Monday, July 25, 2011

Native vs Grapevine Reputation on MTurk

The Mechanical Turk blog has a new entry today, by Sharon (Chiarella), titled "Cooking with Sharon" & Tip #3 Manage Your Reputation.

In the article, Sharon encourages requesters to do the following:
  • Pay well - Don’t be fooled into underpaying Workers by comparing your HITs to low priced HITs that aren’t being completed.
  • Pay fairly – Don’t reject an Assignment unless you’re SURE it’s the Worker who is wrong.
  • Pay quickly – If you approve or reject Assignments once a week, Workers may do a few HITs and then wait to see if they are paid before doing more. This is especially true if you’re a new Requester and haven’t established your reputation yet.
Sharon then explains that workers do talk with each other in the forums, on Turkopticon, and so on, and collectively establish the reputation of the requester based on these factors. While there is nothing wrong with this "grapevine"-based reputation, it also illustrates some obvious features that the Mechanical Turk platform is missing.

Instead of outsourcing the task to third-party forums, Amazon should provide features that make the reputation of the requester more transparent, visible, and objective.

For example, each requester could have a profile, in which the workers can see:
  • The total number of HITs and rewards posted by the requester
  • The rejection rate for the requester
  • The distribution of working time for the HITs of the requester
  • The effective hourly wage for the tasks completed for the requester
  • The payment lag from completion of the task until payment
These are all elements that workers would find useful. They are statistics that contribute to the transparency of the market, and their objective nature makes the establishment of reputation much faster. Such objective characteristics complement the more subjective features used in the grapevine-based reputation systems (Turker Nation, Turkopticon, etc.), where only a subset of workers contribute and measure personal perceptions (e.g., was this task "well-paid" or not?). Of course, subjective reputation systems will continue to play their role, providing information that cannot be easily quantified. But they should not be the only reputation signal for the market.
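To make the list above concrete, here is a minimal sketch of how such a profile could be computed from a requester's assignment history. Everything in it is an assumption for illustration: the `Assignment` record, its field names, and the `requester_profile` function are hypothetical and do not correspond to any actual Mechanical Turk API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Assignment:
    """One submitted assignment; all field names are hypothetical."""
    reward_usd: float
    work_seconds: float
    submitted_at: datetime
    decided_at: datetime  # when the requester approved or rejected it
    approved: bool


def requester_profile(assignments):
    """Summarize a requester's history into the statistics listed above."""
    if not assignments:
        raise ValueError("need at least one assignment")
    approved = [a for a in assignments if a.approved]
    paid = sum(a.reward_usd for a in approved)
    # Time spent on rejected work still counts as time worked.
    hours = sum(a.work_seconds for a in assignments) / 3600.0
    lags = [(a.decided_at - a.submitted_at).total_seconds() / 3600.0
            for a in approved]
    return {
        "assignments": len(assignments),
        "total_rewards_usd": paid,
        "rejection_rate": 1 - len(approved) / len(assignments),
        "median_work_seconds": median(a.work_seconds for a in assignments),
        "effective_hourly_wage_usd": paid / hours if hours else 0.0,
        "median_payment_lag_hours": median(lags) if lags else None,
    }
```

One design choice worth noting: the effective hourly wage divides the money actually paid by all the time worked, including time spent on rejected assignments, since that is the wage a worker experiences.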

Could there be side-effects if such a system is deployed? Yes. I can see some cases where this profile can introduce strange incentives in the market. (For example, it may be good to have a few of my tasks spammed and still pay immediately for the results, so that I can have a high acceptance rate, HITs that require only a little bit of time to be completed, and show a high hourly wage.) But these are just details that can be addressed. There is no way that overall the market could suffer when such statistics become publicly available. (Sorry Mr $0.23/hr-requester, you are not that valuable.)

Markets operate based on trust and are better with increased information efficiency. Any step in this direction is a good step for the market participants and, by extension, for the market owner.

Friday, July 22, 2011

A tale about parking

The media attention to my prior blog post was really not something that I enjoyed. Not so much for the attention itself but for focusing on exactly the wrong issues. That post was NOT about me and my evaluation. This is not the main point. I thought that the salary issue was worth mentioning (apparently, it was not) but it was, indeed, a MINOR part of the issue.

In fact, after reflecting on this point, I realized the following: Even if I had received a $1M bonus from NYU for my efforts, the basic problem would still be there: the teaching experience would degenerate into a witch hunt, focusing on cheating, instead of being about learning. And yes, I would still write the same blog post even if I were fully satisfied with my annual evaluation. In fact, the blog post was in my folder of draft posts for a few months now, long before receiving my annual evaluation.

If you want a parallel, consider this hypothetical story:



A tale about parking

Suppose that you live in a city with a huge traffic problem, and a resulting huge parking problem. Too many cars on the street.

People try to find parking and they drive around, drive around. A lot. Some drivers get frustrated and they double park. Some drivers are stupid enough to double park during rush hour, block the traffic, and leave the car unattended. As expected, the police arrive and assign a ticket to the offender, sometimes taking the car as well. However, during quiet hours, when there is no traffic, many drivers double park, but they do not block the traffic, and nobody gives them a ticket.

Suddenly, in one neighborhood only, call it Redwich Village, a lone policeman starts assigning tickets for every parking violation. No matter if it is minor or major. No matter if the driver just stepped out, or if it is the first time that the driver double parked. Zero-tolerance policy.

By doing that, and being more vigilant, our lone policeman assigns 10 times more tickets than before. By doing that, he also loses countless hours fighting with the offenders. This continuous fight also annoys some other residents of the neighborhood, who want the policeman to focus on policing the neighborhood and not spend all his time giving parking tickets.

But even our lone policeman gets frustrated: he realizes that he did not become a policeman to give parking tickets. While it is part of his duties, he feels that it is just better not to be so aggressive. His boss also gets a report that many neighborhood residents are annoyed. His boss knows that the complaints are due to the zero-tolerance policy on parking tickets. So he says that he would like our lone policeman to both continue this idiosyncratic zero-tolerance policy enforced just by our lone policeman, and be as diligent with his other duties as before.

Our lone policeman goes on and reflects on the overall experience. He realizes that he is fighting a losing battle. As the number of cars increases in the city, there will be more people parking illegally.

So, our lone policeman suggests that we need to do something more fundamental about the parking problem: He suggests that people could carpool, use bicycles, mass transit, or simply walk. And he asks for people to think of more such alternatives. If there are fewer cars in the city, the problem will be resolved.

He describes all his thoughts in his blog, in a long post, titled "Why I will never give parking tickets again." He describes the futility of parking tickets to fight the underlying problem, and vows never to be so vigilant about parking tickets. He will be as vigilant as all the other policemen, which is as vigilant as he was before.

His blog post goes viral. Media pick up fragments, everyone reads whatever they want to read. Some headlines:
  • "Parking tickets in Redwich Village increase by 1000%. Is it impossible to park your car in Redwich?"
  • "Parking-related violations skyrocket in Redwich Village. Policeman punished for enforcing the rules."
  • "Redwich Village sucks. Only scumbags live in Redwich Village, what did you expect? Any lawful behavior?"
  • "Stupid city residents: We know that all people that live in cities are cheaters and park illegally"
  • "Why does the government not reward this honest policeman?"
  • "Why is this policeman vowing not to obey the law? Oh the society..."
Now, some of the business owners of Redwich Village are annoyed because people may not drive to Redwich, if they think it is impossible to find parking. Some residents are also annoyed because real estate prices may go down if people believe that Redwich is a place where you cannot park your car. After all, it is all a matter of reputation.

And in this brouhaha, nobody pays any attention to the underlying problem. Is increased vigilance the solution to the parking problem? Should we give more tickets? Should we install cameras? Or should we try to follow the suggestions of our lone policeman and think of other ways to reduce traffic, and therefore resolve the parking problem on a more fundamental level?

The blog post of our lone policeman is neither about the policeman nor about Redwich. It is about the fact that there is too much traffic in the whole city, which in turn causes the parking problem. Parking scarcity is the symptom, not the real problem. And while he wrote about the traffic problem and suggested solutions, 99% of the coverage was about Redwich and about his own evaluation.



This is exactly how the discussion about cheating evolved in the media. Instead of focusing on how to make student evaluation objective and cheating-proof, the discussion focused on whether my salary went sufficiently up or not. This is not the main point. It is not even a minor point, on reflection. The real question is how we can best evaluate our students and which evaluation strategies are robust to cheating, encourage creativity, and evaluate true learning.

And this is not a discussion that can be done while screaming.

Sunday, July 17, 2011

Why I will never pursue cheating again

Update

You can read my commentary in my new blog post: A tale about parking.

The discussion on Hacker News was good as well. Also see the response I posted on the Business Insider website and the coverage in Inside Higher Education.


============================================================
TL;DR: Cheating is not a 'bad apple' problem when incentives and assessment design make it cheap and low-risk. Detection tools help, but the scalable fix is redesign.



Last Fall, it was my first semester of teaching as a tenured professor. It was also the semester that I realized how pervasive cheating is in our courses. After spending a tremendous amount of time fighting and pursuing all the cheating cases, I decided that it makes no sense to fight it. The incentive structures simply do not reward such efforts. The Nash equilibrium is to let the students cheat and "perform well"; in exchange, I get back outstanding evaluations. Fighting cheating cannot happen through policing. We need to consider alternative approaches to evaluating students that are structurally cheating-proof.

But let me give you the complete story, as it contains tidbits that I found, in retrospect, highly entertaining.

Clarification 1: Before you jump to conclusions, though, that I just gave up, please go to the end of the article, at the "Future" section, and read my final thoughts.

Clarification 2: The point of this blog post is not to show that "business students cheat" or that our own university is in any way different from others. I have no reason to believe that other institutions suffer any less from cheating. If you want to think that NYU, Stern, or business schools are the only places where cheating happens, then you are turning a blind eye to the problem. The fact that nobody is putting significant effort into detecting or combating cheating does not mean cheating does not exist. The main point I want to make is that cheating happens because we, structurally, put the right incentives in place for it to occur. I propose a pedagogically correct solution. And I welcome comments and feedback.



How it all started: Tenure and Turnitin Integration

There were two new things in the Fall 2010 semester:

First, it was my first semester teaching as a tenured faculty member. This allowed me to be more relaxed and stricter on things related to cheating.

Second, for the first time, our Blackboard installation had full integration with Turnitin. For those unfamiliar with these, Blackboard is a course management system, and Turnitin is plagiarism-detection software. The integration meant that when students submitted assignments, the uploaded documents were automatically processed by Turnitin to produce originality reports.

Turnitin has a vast database of assignments (all submitted assignments are added to its database) and also checks the Internet to identify parts of the assignment that may be copied from a website. For those curious about the technicalities, detection occurs by checking for unusual n-grams that appear in two or more documents. For essay-based assignments, you can be assured that Turnitin will detect most cases of plagiarism.
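For the curious, the n-gram idea can be sketched in a few lines. This is only an illustration of the general technique and not Turnitin's actual algorithm, which I have not seen; the function names, the choice of 5-word n-grams, and the `max_doc_freq` cutoff for what counts as "unusual" are all assumptions.

```python
import re
from collections import Counter


def ngrams(text, n=5):
    """Return the set of word n-grams (as tuples) found in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_report(submission, corpus, n=5, max_doc_freq=2):
    """For each corpus document, return the fraction of the submission's
    n-grams that it shares, counting only "unusual" n-grams, i.e. those
    appearing in at most `max_doc_freq` corpus documents.

    `corpus` maps a document id to its text.
    """
    sub = ngrams(submission, n)
    if not sub:
        return {}
    doc_grams = {doc_id: ngrams(text, n) for doc_id, text in corpus.items()}
    # Document frequency: in how many corpus documents each n-gram occurs.
    doc_freq = Counter(g for grams in doc_grams.values() for g in grams)
    report = {}
    for doc_id, grams in doc_grams.items():
        shared = {g for g in sub & grams if doc_freq[g] <= max_doc_freq}
        report[doc_id] = len(shared) / len(sub)
    return report
```

Ignoring n-grams that occur in many documents is what keeps common phrases (or the assignment's own question text, which every student repeats) from triggering false positives.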

So, given the ease of deployment, I decided to use Turnitin for the first time. I uploaded all my past assignments to Turnitin from prior semesters and configured Blackboard to automatically submit all new assignments through Turnitin.



First assignment out: Essay about WiMax, LTE, and the future of wireless communications

The first assignment of the semester asked students to study the technologies for "4G" wireless data transfer and to understand how the wireless carriers' choice of underlying technologies can affect their strategies. To make the assignment different from the one distributed last year, I also added LTE questions, in addition to the WiMax questions we were using before.

The assignments came back, and here is how the Turnitin report looked:


Yep. 20 assignments appeared to have more than 20% plagiarized content. Some were false positives, but most actually contained plagiarized content.

Trying to understand what is going on, I studied the reports in detail. Here is how one assignment looked, with the highlighted parts indicating parts that have been copied from other Internet sources (e.g., bbtantenna.com, moopz.com, and so on):



This student created a report by using three buttons on his keyboard: Find site on Internet, copy, paste; Find site on Internet, copy, paste; Find site on Internet, copy, paste. Although it was not a blatant case of cheating, it demonstrated an alarming practice. Students get used to preparing reports by simply looking things up on the Internet and then just pasting everything together, with minimal further editing. Even more alarming: no citations to the original sources.

I decided not to punish the students who engaged in this practice, but I had to discuss in class at length why it is a nasty habit. This is not "research" as some students call it. Plagiarism is habitual and can have dire consequences for one's professional life. A quick check on the news of that week revealed two articles about such a type of plagiarism:


Not sure if the message came across, but I tried to educate the class about what plagiarism is. Some of the students actually protested that I did not punish this behavior (they felt they had been educated enough about it in the past). Still, I decided to be lenient, since it was just a couple of cases like that. In retrospect, I was being stupid. At least one of these students cheated again in a later assignment.



The blatant cheaters

But what I considered a deep problem was not this copy-and-paste behavior. At least these students were learning how to find information online, which was admittedly relevant. With a bit of practice in properly citing their sources and some effort, these issues could be resolved. The deep problem was with students who were really cheating.

Here is the report for one offender, with 95% of the content copied from a student who took the class in Fall 2009:


There were other similar cases, but this was the most extensive. 95% of the assignment was copied word-for-word.

The student, after receiving the notification that the assignment was processed by Turnitin (but without knowing whether it was marked as plagiarized), sent me the following, highly entertaining email (emphasis is mine):

Sorry for the confusion but the assignment which I handed in online was not the correct assignment. I was away for the weekend and wrote my homework on a different lab top not my own and when finished emailed it to myself. Yesterday after class i heard the news that my best friends grandmother had a stroke and was in the hospital and i went there to help out. Then i remembered that i had the homework to hand in. I asked my roommate to turn in the work for me. Since it was not written in correct program he had to transfer into a word documentI asked someone who had already done the assignment to send him theirs to he could format my answers into the correct format. In this process he accidentally copied the other persons work into document and not mine. The only way i realized is when i looked at the Turnitin receipt and saw it was not mine. Attached is my correct work and i am sorry for the confusion.

You cannot blame the student for lack of creativity in the excuse, can you?

What are the chances of the given excuse being true? Well, let's see another page of the report:


Everything was indeed cut and pasted from an old homework, but (surprise!) the number "2009" from the old assignment was changed to "2010". I am wondering what the OS was on his "laptop" that could do such a bright copy and paste.



Blatant cheating, attempt #2

I could not really believe that he would try cheating again. But what the heck, I decided to run the newly submitted assignment through Turnitin anyway. Here is what came back:


Yep, the "revised" assignment was actually 57% copied from a Fall 2009 assignment. And from which one? From the very same assignment from which the student copied to start with! You cannot make this stuff up.

At that point, I had to suspend him from the class and refer him to the honorary council for further punishment. If not for plagiarism, the student should have been penalized just for being stupid.



The class announcement: "Who cheated?"

For processing the remaining cases, I decided not to confront students directly: the case above took about 3 hours of my time to get the student to admit what he had done, despite the overwhelming evidence.

Instead, I sent an email to the class. I just said that plagiarism was detected, and whoever cheated could come find me. For the rest, I would report the case to the Dean's office, provide the evidence, and let them decide what to do and whether to pursue the case.

The result? Many more students than I was expecting were waiting outside my office during office hours. While nobody was willing to admit wrongdoing, most of them readily accepted that they "took a look at an assignment of my roommate," or "got some help from my fraternity brothers," and so on. Of course, Turnitin allowed me to easily find the name of the person who "helped" them. At that point, most students just gave up and admitted that they copied.

One interesting observation: Cheating clustered in tight social networks. Not just among international students "borrowing" from their compatriots (we do not have that many in the undergrad program), but also among US-born students. A result of socializing in similar student groups? Same fraternities and sororities? I do not have enough data points to make statistical claims, but the pattern seemed very strong.



Excel-based assignment: The party continues

A few weeks later, I posted an assignment requiring students to perform Excel-based analysis. To make it easier to detect cheaters, I added some extra features that would make it difficult to just copy and paste from another assignment. (Font choice, re-sizing random cells in non-visible parts of Excel, defining variables with slightly different names, and many other small tricks.) I also modified past assignments by slightly changing the required formulas, and by adjusting the parameter values in very slight ways (e.g., from price = 10.467, I used price = 10.468, and in Excel, rounded to 2 decimal places, both showed up as 10.47).
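The parameter trick works like a watermark: two versions of the assignment look identical on screen, but the raw cell value reveals which version a spreadsheet came from. Here is a minimal sketch of the idea. The two price values are the ones mentioned above; the version labels, the `WATERMARKS` table, and the function names are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical mapping from raw parameter value to assignment version.
# The values come from the text; the labels are assumptions.
WATERMARKS = {
    Decimal("10.467"): "past semester",
    Decimal("10.468"): "current semester",
}


def displayed(value, places=2):
    """Value as a cell formatted to `places` decimals would show it."""
    quantum = Decimal(1).scaleb(-places)  # e.g. Decimal("0.01")
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


def trace_version(cell_value):
    """Map the raw (unrounded) cell value back to its assignment version."""
    return WATERMARKS.get(Decimal(str(cell_value)), "unknown")
```

Both `displayed("10.467")` and `displayed("10.468")` come out as 10.47, so a student looking at the sheet sees nothing unusual, while `trace_version` on the stored value tells the two versions apart.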

When the results came back, it was a big mess. First, students submitted Excel spreadsheets containing their classmates' names. Or the names of past PhD students, who prepared solution keys in 2006, as the file authors. (And those keys have the incorrect solution as well.) It was also easy to detect students who used layouts from past solutions, as some of them did not even remove the border formatting from the Excel cells. (Yes, if you double underline cells E5 to E9, and use a Garamond font just for that part of the assignment, there is a strong suspicion that you copied and pasted the solution from 2008, which had exactly these characteristics.)

One of the offenders was actually a repeat offender from the prior assignment and was also dismissed from the class.

Another student had a nervous breakdown in my office, crying loudly and uncontrollably for 2 hours. It was awkward. On the one hand, I wanted to prevent the student from being embarrassed, and I tried to close my office door. On the other hand, I did not even want to think of being in my office behind closed doors with an undergraduate student who is crying loudly.

A complete and utter mess...



The wasted time

By the end of the semester, 22 students admitted to cheating out of the 108 enrolled in the class.

The process of discussing all the detected cases was not only painful but also extremely time-consuming.

Students would come to my office and deny everything. Then I would present the evidence to them. They would soften but continue to deny it. Only when I said, "Enough, I will just give the case to the honorary council, who will decide," did most students admit wrongdoing. But every case was at least 2 hours of wasted time.

With 22 cases, that was a lot of time devoted to cheating: More than 45 hours in completely unproductive discussions, when the total lecture time for the course was just 32 hours. This is simply too much time.



The overall experience

When 1 out of 5 students in the class is involved in a cheating case, the lectures and class discussions become awkward. For the rest of the semester, there was a palpable sense of anxiety in class. Instead of having friendly talks, the discussions became contentious. Not a pleasant environment.

This, of course, had a direct effect on my teaching evaluations. Instead of the usual assessments in the range of 6.0 to 6.5 out of 7, this time my ratings went down by almost a point: 5.3 out of 7.0. Instead of being in the upper percentiles as a teacher, I was now below average.




Will I do it again in the future?

Was it worth it? Absolutely not. Emotionally, pedagogically, and financially, it was a bad decision to be so vigilant.

The usual mode of catching and punishing only the egregious cases was much better. Why pursue every case where there is evidence of cheating? Yes, I was able to generate proofs for all the instances that I sought, but at what cost!

I also did not like the overall teaching experience, and this was the most important thing for me. Teaching became annoying and tiring. There was a very different dynamic in class, which I did not particularly enjoy. It was a feeling of "me-against-them," rather than the much more pleasant "these things that we are learning are really cool!"

Adding insult to injury, my yearly evaluation came back with my first "average" rating for my annual performance as a professor. And, together with the "average" rating, I received the lowest salary increase I have ever received. (It was actually below inflation, so effectively it was a salary decrease.) Not that it would matter much if my salary increase were higher, but it would signify that there is some incentive to actively pursue cheating cases. The lousy increase was the nail in the coffin.

Will I pursue cheating cases in the future? Never, ever again!




The future: How to deal with cheating?

So, how to deal with cheating in the future?

I doubt that I will be checking again for cheaters. First, this is a losing battle: as I use more advanced cheating detection schemes, the cheaters will adapt. I am not a policeman fighting crime. My role is to educate, not to enforce honest behavior. This is a university, not a kindergarten. Second, when a couple of students cheat, they have a problem. When 22 students cheat, well, the problem is mine!

Suggestions to completely change the assignments from year to year are appealing at first glance, but they create other problems: it is tough to know in advance whether an assignment will be too easy, too hard, or too ambiguous. Even small-scale testing with TAs and other faculty does not help. You need to "test" the new assignment by giving it to students. If it is a good one, you want to keep it. If it is a bad one, you just gave the students a useless exercise.

What I came to realize is that the assignments' style was an inherent part of the problem. The solution is not to detect cheating. The solution is to create assignments that are inherently not amenable to cheating:
  • Public projects: The database projects that use NYC Data Mine data (see the projects from 2009 and 2010) are one approach: they are public, and it would be meaningless to copy a project from a past semester. The risk of public embarrassment is a significant deterrent.
  • Peer reviewing: The other successful project is one in which students research a new technology and present their findings in class; the only grade they receive comes from their peers. The social pressure is so high that most of the presentations are of excellent quality. This year, the student presentation on augmented reality was so impressive that, for an MBA class, we simply showed the recorded presentation to the MBA students.
  • Competitions: To teach students how the web works, I ask them to create a website and attract at least 100 unique visitors. The student with the most visitors at the end of the semester receives an award (most often an iPod). I had some great results with this project (e.g., one student created a website on "How to Kill Nefarian" and got 150,000 visitors over 8 weeks) and some highly entertaining incidents.

These types of assignments work well for specific types of problems. I am still not sure how I can teach students, for example, to write database queries, without some "boring," well-defined assignments with pre-determined "correct" outcomes.

In other words, my theory is that cheating (on a systematic level) happens because students try to gain an edge over their peers/competitors. Even top-notch students cheat to ensure a perfect grade. Fighting cheating is not something that professors can do well in the long run, and it is counterproductive by itself. By channeling this competitive energy into creative activities, in which you cannot cheat, everyone is better off.

Any other suggestions are greatly appreciated. I am interested in what others are doing to deal with the problem.