Showing posts with label incentives. Show all posts

Friday, July 22, 2011

A tale about parking

The media attention to my prior blog post was really not something that I enjoyed. Not so much for the attention itself but for focusing on exactly the wrong issues. That post was NOT about me and my evaluation. This is not the main point. I thought that the salary issue was worth mentioning (apparently, it was not) but it was, indeed, a MINOR part of the issue.

In fact, after reflecting on this point, I realized the following: Even if I had received a $1M bonus from NYU for my efforts, the basic problem would still be there: the teaching experience would degenerate into a witch hunt, focusing on cheating, instead of being about learning. And yes, I would still write the same blog post even if I were fully satisfied with my annual evaluation. In fact, the blog post was in my folder of draft posts for a few months now, long before receiving my annual evaluation.

If you want a parallel, consider this hypothetical story:



A tale about parking

Suppose that you live in a city with a huge traffic problem, and a resulting huge parking problem. Too many cars on the street.

People try to find parking and they drive around, and around. A lot. Some drivers get frustrated and they double park. Some drivers are stupid enough to double park during rush hour, block the traffic, and leave the car unattended. As expected, the police arrive and issue a ticket to the offender, sometimes taking the car as well. However, during quiet hours, when there is no traffic, many drivers double park, but they do not block the traffic, and nobody gives them a ticket.

Suddenly, in one neighborhood only, call it Redwich Village, a lone policeman starts assigning tickets for every parking violation. No matter if it is minor or major. No matter if the driver just stepped out, or if it is the first time that the driver double parked. Zero-tolerance policy.

By doing that, and being more vigilant, our lone policeman issues 10 times more tickets than before. By doing that, he also loses countless hours fighting with the offenders. This continuous fight also annoys some other residents of the neighborhood, who want the policeman to focus on policing the neighborhood and not spend all his time giving parking tickets.

But even our lone policeman gets frustrated: he realizes that he did not become a policeman to give parking tickets. While it is part of his duties, he feels that it is just better not to be so aggressive. His boss also gets a report that many neighborhood residents are annoyed. His boss knows that the complaints are due to the zero-tolerance policy on parking tickets. So he says that he would like our lone policeman to both continue this idiosyncratic zero-tolerance policy enforced just by our lone policeman, and be as diligent with his other duties as before.

Our lone policeman goes on and reflects on the overall experience. He realizes that he is fighting a losing battle. As the number of cars in the city increases, there will be more people parking illegally.

So, our lone policeman suggests that we need to do something more fundamental about the parking problem: He suggests that people could carpool, use bicycles, mass transit, or simply walk. And he asks for people to think of more such alternatives. If there are fewer cars in the city, the problem will be resolved.

He describes all his thoughts in his blog, in a long post, titled "Why I will never give parking tickets again." He describes the futility of parking tickets to fight the underlying problem, and vows never to be so vigilant about parking tickets. He will be as vigilant as all the other policemen, which is as vigilant as he was before.

His blog post goes viral. Media pick up fragments, everyone reads whatever they want to read. Some headlines:
  • "Parking tickets in Redwich Village increase by 1000%. Is it impossible to park your car in Redwich?"
  • "Parking-related violations skyrocket in Redwich Village. Policeman punished for enforcing the rules."
  • "RedWich Village sucks. Only scumbags live in RedWich Village, what did you expect? Any lawful behavior?"
  • "Stupid city residents: We know that all people that live in cities are cheaters and park illegally"
  • "Why the government does not reward this honest policeman?"
  • "Why this policeman is vowing not to obey the law? Oh the society..."
Now, some of the business owners of Redwich Village are annoyed because people may not drive to Redwich, if they think it is impossible to find parking. Some residents are also annoyed because real estate prices may go down if people believe that Redwich is a place where you cannot park your car. After all, it is all a matter of reputation.

And in this brouhaha, nobody pays any attention to the underlying problem. Is increased vigilance the solution to the parking problem? Should we give more tickets? Should we install cameras? Or should we try to follow the suggestions of our lone policeman and think of other ways to reduce traffic, and therefore resolve the parking problem on a more fundamental level?

The blog post of our lone policeman is neither about the policeman nor about Redwich. It is about the fact that there is too much traffic in the whole city, which in turn causes the parking problem. Parking scarcity is the symptom, not the real problem. And while he wrote about the traffic problem and suggested solutions, 99% of the coverage was about Redwich and about his own evaluation.



This is exactly how the discussion about cheating evolved in the media. Instead of focusing on how to make student evaluation objective and cheating-proof, the discussion focused on whether my salary went up sufficiently or not. This is not the main point. It is not even a minor point, on reflection. The real question is how we can best evaluate our students, and which evaluation strategies are robust to cheating, encourage creativity, and evaluate true learning.

And this is not a discussion that can be done while screaming.

Sunday, July 17, 2011

Why I will never pursue cheating again

The post is temporarily removed. I will restore it after ensuring that there are no legal liabilities for myself or my employer.

Until then, you can read my commentary in my new blog post: A tale about parking.

The discussion on Hacker News was good as well. Also see the response that I posted at the Business Insider website and the coverage at Inside Higher Education.

Friday, May 13, 2011

Pay Enough or Don't Pay at All

No good deed goes unpunished

A while back, we were working with Dahn Tamir on identifying spam tasks and requesters on the Mechanical Turk platform. Dahn took the lead and built a task on MTurk in which Turkers could see the (other) newly posted tasks on MTurk, and flag the obvious spam ones. Since this was not a task from which he could benefit, he asked workers to rate as many tasks as possible without submitting the task as "completed", to keep the costs down. Workers were happy to collaborate, and effectively work for free, in order to clean up the market. We were collecting data nicely.

And then, I received some minimal funding for the project ($1,000 to be exact). At that point, I thought that it would be a nice gesture to actually start paying the workers. So, we created a new task, we calibrated for time to pay around 7 dollars an hour, and we posted the task. We were expecting workers to be happy. They were doing the work for free before; now they would not only help clean the market, but they would also get paid for this!

The result? A few positive messages with a thank-you note. But also a big backlash: "You, fat cat academic, with all the grants, you want us to work for peanuts?". "Hey, big prof, would you like to be paid minimum wage for your work?". "Yeah, we should be the slaves doing all the grunge work for your research, so that you can get the fame."

I was shocked. What happened? I tried to remind the workers that they were doing the same task for free before, but it did not really make a difference. Actually, it made things worse.



Market norms vs. social norms

Then, I remembered. Dan Ariely, in his book "Predictably Irrational", has warned about this. There are social norms and market norms. When no money is involved, exchanges operate using social norms. Once you put a price on a task, it falls under market norms. It can be measured and compared.

When the workers were not getting paid, they were working towards a noble goal: Clean the market from the spammers. By putting a price on the task of classifying spam tasks, we essentially told the workers how much we value their work: minimum wage. Instead of offering their priceless help, they were being valued as unskilled workers, like every other worker in the market. Money and altruism do not mix.



Somebody must have studied that before

Needless to say, examining the influence of money on performance and motivation is not a new topic. A wonderful paper that deals with it is "Pay Enough or Don't Pay at All" by Gneezy and Rustichini, published back in 2000 (625 citations so far, according to Google Scholar). Instead of trying to describe the paper myself, I will just list here the succinct abstract:

Economists usually assume that monetary incentives improve performance, and psychologists claim that the opposite may happen. We present and discuss a set of experiments designed to test these contrasting claims. We found that the effect of monetary compensation on performance was not monotonic. In the treatments in which money was offered, a larger amount yielded a higher performance. However, offering money did not always produce an improvement: subjects who were offered monetary incentives performed more poorly than those who were offered no compensation. Several possible interpretations of the results are discussed.

I would encourage anyone to read the paper, as it contains extensive discussion of various models and explanations. I would definitely not do it justice if I claimed to have fully covered its content here. However, I would like to highlight some parts below.

Gneezy and Rustichini extended research in psychology from the 1970's, which examined the difference between intrinsic and extrinsic motivation.

Psychologists study behavior modification through conditioning (in the case of the behaviorist school) or learning (for the cognitive school). We do not. To illustrate the difference, we may consider the classic experiment reported in Deci [1971]. He had college students play with a puzzle in three successive sessions. In the first session participants were left to play freely. In the second session subjects in one group received payment if they solved the puzzle, while the control group did not. In a third session the subjects were again left to play freely. The amount of time spent on free activity in the first and third session was taken as a measure of intrinsic motivation. Deci found that in the third session the experimental group spent less time than the control group playing with the puzzle, and he concluded that the reward offered had decreased the intrinsic motivation of subjects in the first group over the three sessions.

That was a result from research in the 1970's. Gneezy and Rustichini wanted to also examine the effect of money in non-sequential environments. So, they conducted the following experiments:

Effect of additional payment on a paid task

At the beginning of the experiment, each student was promised a fixed payment of NIS 60 for participation. (NIS = New Israeli Shekel, at the time of the experiment, 3.5 NIS = $1.) They were then told that the experiment would take 45 minutes, and they would be asked to answer a quiz consisting of 50 problems taken out of a psychometric test used to scan applicants to the university. [...] In the four different treatments subjects were promised different additional payments for each correct answer.
  • In the first group no mention was made of any additional payment.
  • In the second group subjects were promised an additional payment of 0.1 NIS per question answered correctly.
  • In the third group subjects were promised an additional payment of 1 NIS per question answered correctly and
  • In the fourth group subjects were promised an additional payment of 3 NIS per question answered correctly
[...] The average number of correct answers was:
  • 28.4 in the first group (no additional payment)
  • 23.1 in the second group (additional 0.1 NIS per correct answer).
  • 34.7 in the third group, (additional 1 NIS per correct answer). 
  • 34.1 in the fourth group, (additional 3 NIS per correct answer). 
In other words, performance-based payment improved performance. But offering just a small additional financial incentive actually decreased performance compared to the case of providing no financial incentives.

Effect of payment on unpaid tasks

We had 180 high-school students around the age of 16 participating with three treatment levels [collecting money for a charitable purpose]....
  • In the first treatment, the students were told about the importance of collecting money for the society, that the results of the collection would be published, so that the amount collected by each pair would become public knowledge. 
  • In the second treatment, after the same speech, each pair was promised 1 percent of the amount that the two of them collected. 
  • In the third treatment, each pair was promised 10 percent of the amount they collected.
In the second and third treatments it was made clear that the payment was made from funds additional to the donation, provided by the researchers. The average amount collected was:
  • 238.67 for groups in the first treatment (with no payment).
  • 153.67 in the second group (1 percent of the collected amount).
  • 219.33 in the third group (10 percent of the collected amount).
In this case, where there was no initial payment and the task had an altruistic purpose, providing financial incentives actually reduced performance.
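To put the two sets of numbers on a common scale, here is a minimal sketch that computes the percent change of each treatment relative to its no-payment baseline. The figures are the averages quoted above; the class and method names are my own, chosen just for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IncentiveEffect {
    /** Percent change of a treatment average relative to the no-payment baseline. */
    static double percentChange(double baseline, double treatment) {
        return 100.0 * (treatment - baseline) / baseline;
    }

    /** Prints one line per treatment with its change versus the baseline. */
    static void report(String title, double baseline, Map<String, Double> treatments) {
        System.out.println(title + " (baseline " + baseline + ")");
        for (Map.Entry<String, Double> e : treatments.entrySet()) {
            System.out.printf("  %-28s %+.1f%%%n",
                    e.getKey(), percentChange(baseline, e.getValue()));
        }
    }

    public static void main(String[] args) {
        // Quiz experiment: average number of correct answers per group.
        Map<String, Double> quiz = new LinkedHashMap<>();
        quiz.put("0.1 NIS per correct answer", 23.1);
        quiz.put("1 NIS per correct answer", 34.7);
        quiz.put("3 NIS per correct answer", 34.1);
        report("Quiz: average correct answers", 28.4, quiz);

        // Charity experiment: average amount collected per treatment.
        Map<String, Double> charity = new LinkedHashMap<>();
        charity.put("1 percent of collection", 153.67);
        charity.put("10 percent of collection", 219.33);
        report("Charity: average amount collected", 238.67, charity);
    }
}
```

Run this way, the small payment comes out roughly 19% below the baseline in the quiz and roughly 36% below it in the charity collection, while the larger quiz payments land about 20% to 22% above the baseline.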


Additional literature

There is significant literature for anyone interested (thanks Sonny!). A few pointers to start:





Conclusions

Essentially, Gneezy and Rustichini found that:
  • Paying more indeed increases performance, compared to paying less. 
  • However, paying nothing may actually be better than paying!
Section IV of the paper has a very nice discussion on how to interpret and model the process. Here are a few explanations, in increasing order of explanatory power for the observed phenomenon:
  • Paying something removes the intrinsic motivation for a task, and replaces it with the external motivation for money.
  • Incomplete contract: the piece-wise or performance-based payment changes the original meaning of the contract, which implied that high-performance is part of the task.
  • Paying small amounts compared to the originally implied value of the task devalues the task (e.g., take back a glass bottle to help recycling vs. for getting 5 cents back)


Relevance to crowdsourcing 

I found the results pretty interesting, with significant implications for micro-crowdsourcing. While volunteers may be great for various tasks (e.g., in citizen science applications, such as the Galaxy Zoo), migrating such applications to a paid crowdsourcing application may have a significant downside. Paying small rewards to workers will be counterproductive. The work of volunteers is, indeed, priceless.

Furthermore, with the low level of payments on Mechanical Turk, we are stuck at the worst possible status. We pay, and we do not pay enough. But how can we pay more, when every attempt to increase the price to reasonable levels is followed by attempts of scammers to game the system and get paid for doing nothing?

Monday, April 20, 2009

Google App Engine and Java: First Impressions

Over the last few days I have been playing with Google App Engine, the infrastructure provided by Google for building applications in the cloud. To give some context, I tried to build a crawler that will retrieve and store historic information from a marketplace. 

I have already built this application, and it was running on my local Linux machine, storing information in a SQL database. However, I was getting uncomfortable seeing the database grow significantly, and running big queries was interfering with other users who were using the database machine for their own projects. So, I decided to see how easy it is to port such a vanilla project into the cloud.

My impressions so far:

Ease of programming

It was pretty easy to follow the provided tutorials and get a basic application up and running pretty soon. It may be a good way to introduce students to (web) programming. The Eclipse plugin hides a very significant fraction of the complexity, and allows the programmer to focus on the application development.

Database support

No SQL database anymore. While we still save "entities" and there is support for relationships across entities, Google's data storage is based on BigTable, not on a SQL database. This means no joins. You can always implement your own version of a join, but this is not how the Google datastore is supposed to be used. Slowly you realize that denormalization is desirable and often absolutely necessary. For someone like me, who likes a fully normalized schema that guarantees we do not have inconsistencies anywhere, it felt almost too messy: too much information replicated everywhere, the need to be extra careful not to introduce anomalies, and so on. I can see a significant learning curve for migrating databases into such an environment. Giving up joins is not easy... (Our MBA students, who keep all their data in a single spreadsheet, will feel right at home ;-) ) But it is not that bad. Personally, it helped me to consider the entities in the Google datastore as materialized views of some underlying relations, and to use lazy updating techniques to keep the data consistent.
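To make the "materialized view with lazy updating" idea a bit more concrete, here is a minimal sketch in plain Java. It does not use the App Engine datastore API at all: the in-memory maps stand in for stored entities, and the listing/seller names are hypothetical, chosen only for illustration. The denormalized copy is marked stale when its source changes, and it is refreshed only when somebody reads it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LazyDenormalizedView {
    // "Underlying relations" (hypothetical): sellerId -> name, listingId -> sellerId.
    private final Map<String, String> sellerNames = new HashMap<>();
    private final Map<String, String> listingSeller = new HashMap<>();

    // Denormalized entity: each listing carries its own copy of the seller name,
    // so that reading a listing never needs a join.
    private final Map<String, String> listingView = new HashMap<>();

    // Listings whose copied seller name may be out of date.
    private final Set<String> stale = new HashSet<>();

    void putSeller(String sellerId, String name) {
        sellerNames.put(sellerId, name);
        // Lazy update: only mark dependent listings as stale, do not rewrite them now.
        // (A real store would look the dependents up through an index, not a scan.)
        for (Map.Entry<String, String> e : listingSeller.entrySet()) {
            if (e.getValue().equals(sellerId)) {
                stale.add(e.getKey());
            }
        }
    }

    void putListing(String listingId, String sellerId) {
        listingSeller.put(listingId, sellerId);
        stale.add(listingId);
    }

    /** Reads the denormalized copy, refreshing it first if it was marked stale. */
    String sellerNameForListing(String listingId) {
        if (stale.remove(listingId)) {
            listingView.put(listingId, sellerNames.get(listingSeller.get(listingId)));
        }
        return listingView.get(listingId);
    }

    public static void main(String[] args) {
        LazyDenormalizedView view = new LazyDenormalizedView();
        view.putSeller("s1", "Alice");
        view.putListing("l1", "s1");
        System.out.println(view.sellerNameForListing("l1"));
        view.putSeller("s1", "Alicia"); // the copy in l1 is now stale
        System.out.println(view.sellerNameForListing("l1")); // refreshed on read
    }
}
```

The trade-off is the usual one: writes stay cheap, and the cost of keeping the replicated data consistent is paid only for the entities that are actually read.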

30-second limit

By far the most annoying aspect of the Google App Engine is the limit of 30 seconds execution time for any process. Nothing can run for more than 30 seconds. Since I wanted to build a crawler, I had to re-think the infrastructure. It was necessary to break the task into smaller chunks that can be completed within the 30 second limit. 

To achieve this, I built a "task queue" structure that kept track of the pages that needed to be fetched, and this queue was stored as a persistent structure in the datastore. Then, the "crawler" process picked URLs from the queue and fetched whatever pages could be fetched within the 30-second limit, storing the retrieved pages in the Google datastore. It is pretty annoying that the 30-second limit also includes the time to fetch the page. Often, I was timing out just because the remote server was slow to send the requested page.

Finally, to get the crawler running "all the time", I scheduled a cron job that started the "30-second crawler" process every minute. It is almost like trying to take a trip in a car that can run for only 30 seconds at a time and can be restarted once a minute. Not very elegant, not the most efficient, but it works for lightweight tasks.
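The queue-plus-deadline structure described above can be sketched roughly as follows. This is plain Java, not my actual App Engine code: the in-memory queue stands in for the persistent queue in the datastore, the `fetcher` function stands in for the HTTP fetch, and all names (including the safety margin parameter) are assumptions made for illustration. Each call to `runOnce` corresponds to one cron-triggered invocation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

class BudgetedCrawler {
    // Stand-in for the persistent "task queue" of URLs still to be fetched.
    private final Deque<String> queue = new ArrayDeque<>();
    // Stand-in for the pages stored in the datastore.
    private final List<String> storedPages = new ArrayList<>();

    void enqueue(String url) {
        queue.addLast(url);
    }

    int pending() {
        return queue.size();
    }

    /**
     * One invocation of the crawler: fetch pages until the queue is empty or the
     * time budget (minus a safety margin) is used up. Returns the number fetched.
     * The check happens only between fetches, so a single slow fetch can still
     * overrun the budget; a real fetch would also need its own timeout.
     */
    int runOnce(long budgetMillis, long safetyMarginMillis, Function<String, String> fetcher) {
        long allowedNanos = (budgetMillis - safetyMarginMillis) * 1_000_000L;
        long start = System.nanoTime();
        int fetched = 0;
        while (!queue.isEmpty() && System.nanoTime() - start < allowedNanos) {
            String url = queue.pollFirst();
            storedPages.add(fetcher.apply(url));
            fetched++;
        }
        return fetched; // whatever is left in the queue waits for the next invocation
    }

    public static void main(String[] args) {
        BudgetedCrawler crawler = new BudgetedCrawler();
        crawler.enqueue("http://example.com/a");
        crawler.enqueue("http://example.com/b");
        // 30-second budget with a 5-second margin; the fetcher here is a fake.
        int done = crawler.runOnce(30_000, 5_000, url -> "contents of " + url);
        System.out.println("fetched " + done + ", pending " + crawler.pending());
    }
}
```

The safety margin is there because of the problem mentioned above: the fetch time counts against the limit, so stopping a few seconds early leaves some room for a slow remote server.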

Quota system

Google App Engine allows applications to run for free, as long as they stay below some usage quota. Once the app exceeds its daily allocated free quota, it gets billed, up to a maximum specified limit. 

In other words, you pay for CPU usage. This is in direct contrast to Amazon EC2, which charges by the "wall time" a virtual machine is running. Since Google App Engine charges only for the resources actually consumed, it encourages code that is as efficient as possible and spends as few resources as possible.

Artists say that the limitations of a medium are a major force for creativity. I have to say that the quota system has the same effect. I found myself thinking and rethinking how I can make the process as efficient as possible. Since I constantly see the exact amount of resources spent by each process, I am compelled to make the processes as efficient as possible. This is not the case for regular desktop programming. OK, it takes 2 seconds instead of 0.1. So what? I have plenty of resources, and I can afford to be sloppy. When I am being billed for the consumed resources, I have a pretty immediate incentive to write the best code possible.

I may be overreaching here, but I see the concept of being billed according to CPU usage as a force that will encourage deeper learning in Computer Science. The effect of optimization is immediate and measurable, and it is often necessary to optimize just to get your process running.

I remember the stories of the old-timers and how they were trying to super-optimize their code, so that the mainframe can execute the code overnight and they can get the results back. Well, the mainframe is back!


Tuesday, February 5, 2008

Mechanical Turk, Charity, and Incentives

We were discussing in class today how to provide the appropriate incentives for users to participate in systems that rely on the "wisdom of the crowds". We have seen that small monetary incentives are not really a good motivation for people to join, but fun, or a sense of contribution are better sources of motivation.

Then, the topic of Mechanical Turk came up, and there was some discussion on whether the existing monetary incentives are the best possible. One idea, by Amichai Lesser, was to allow Turkers to send their rewards directly to a charity, instead of accumulating insignificant amounts of money. Given that a large number of people on Mechanical Turk complete tasks for fun and not for money, the idea of allowing users to convert their participation into direct contributions to a charity seemed like a no-brainer.

Amazon, will you implement this? :-)