Friday, May 13, 2011

Pay Enough or Don't Pay at All

No good deed goes unpunished

A while back, we were working with Dahn Tamir on identifying spam tasks and requesters on the Mechanical Turk platform. Dahn took the lead and built a task on MTurk in which Turkers could see the (other) newly posted tasks on MTurk and flag the obvious spam ones. Since this was not a task from which he could benefit, he asked workers to rate as many tasks as possible without submitting the task as "completed", to keep the costs down. Workers were happy to collaborate, and effectively work for free, in order to clean up the market. We were collecting data nicely.

And then, I received some minimal funding for the project ($1,000 to be exact). At that point, I thought that it would be a nice gesture to actually start paying the workers. So, we created a new task, calibrated the reward so that it paid around 7 dollars an hour, and posted it. We were expecting workers to be happy. They were doing the work for free before; now they would not only help clean the market, but they would also get paid for this!

The result? A few positive messages with a thank-you note. But also a big backlash: "You, fat cat academic, with all the grants, you want us to work for peanuts?". "Hey, big prof, would you like to be paid minimum wage for your work?". "Yeah, we should be the slaves doing all the grunge work for your research, so that you can get the fame."

I was shocked. What happened? I tried to remind the workers that they were doing the same task for free before, but it did not really make a difference. Actually, it made things worse.



Market norms vs. social norms

Then, I remembered. Dan Ariely, in his book "Predictably Irrational", warned about this. There are social norms and market norms. When no money is involved, exchanges operate under social norms. Once you put a price on a task, it falls under market norms. It can be measured and compared.

When the workers were not getting paid, they were working towards a noble goal: cleaning the spammers out of the market. By putting a price on the task of classifying spam tasks, we essentially told the workers how much we valued their work: minimum wage. Instead of offering their priceless help, they were being valued as unskilled workers, like every other worker in the market. Money and altruism do not mix.



Somebody must have studied that before

Needless to say, examining the influence of money on performance and motivation is not a new topic. A wonderful paper that deals with this question is "Pay Enough or Don't Pay at All" by Gneezy and Rustichini, published back in 2000 (625 citations so far, according to Google Scholar). Instead of trying to describe the paper myself, I will just quote its succinct abstract:

Economists usually assume that monetary incentives improve performance, and psychologists claim that the opposite may happen. We present and discuss a set of experiments designed to test these contrasting claims. We found that the effect of monetary compensation on performance was not monotonic. In the treatments in which money was offered, a larger amount yielded a higher performance. However, offering money did not always produce an improvement: subjects who were offered monetary incentives performed more poorly than those who were offered no compensation. Several possible interpretations of the results are discussed.

I would encourage anyone to read the paper, as it contains an extensive discussion of various models and explanations. I would not do it justice if I claimed to have fully covered its content here. However, I would like to highlight some parts below.

Gneezy and Rustichini extended research in psychology from the 1970s, which examined the difference between intrinsic and extrinsic motivation.

Psychologists study behavior modification through conditioning (in the case of the behaviorist school) or learning (for the cognitive school). We do not. To illustrate the difference, we may consider the classic experiment reported in Deci [1971]. He had college students play with a puzzle in three successive sessions. In the first session participants were left to play freely. In the second session subjects in one group received payment if they solved the puzzle, while the control group did not. In a third session the subjects were again left to play freely. The amount of time spent on free activity in the first and third session was taken as a measure of intrinsic motivation. Deci found that in the third session the experimental group spent less time than the control group playing with the puzzle, and he concluded that the reward offered had decreased the intrinsic motivation of subjects in the first group over the three sessions.

That was a result from research in the 1970s. Gneezy and Rustichini also wanted to examine the effect of money in non-sequential environments. So, they conducted the following experiments:

Effect of additional payment on a paid task

At the beginning of the experiment, each student was promised a fixed payment of NIS 60 for participation. (NIS = New Israeli Shekel, at the time of the experiment, 3.5 NIS = $1.) They were then told that the experiment would take 45 minutes, and they would be asked to answer a quiz consisting of 50 problems taken out of a psychometric test used to scan applicants to the university. [...] In the four different treatments subjects were promised different additional payments for each correct answer.
  • In the first group no mention was made of any additional payment.
  • In the second group subjects were promised an additional payment of 0.1 NIS per question answered correctly.
  • In the third group subjects were promised an additional payment of 1 NIS per question answered correctly and
  • In the fourth group subjects were promised an additional payment of 3 NIS per question answered correctly
[...] The average number of correct answers was:
  • 28.4 in the first group (no additional payment).
  • 23.1 in the second group (additional 0.1 NIS per correct answer).
  • 34.7 in the third group (additional 1 NIS per correct answer).
  • 34.1 in the fourth group (additional 3 NIS per correct answer).
In other words, a sufficiently large performance-based payment improved performance. But offering just a small additional financial incentive actually decreased performance, compared to offering no additional incentive at all.
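
To make the size of the effect concrete, here is a small back-of-the-envelope sketch that computes the change in each group relative to the group that was offered no additional payment. The numbers are the averages quoted above; the dictionary keys and the function name `relative_change` are just labels I made up for illustration.

```python
# Average number of correct answers (out of 50 questions), as quoted above.
quiz_results = {
    "no additional payment": 28.4,
    "0.1 NIS per correct answer": 23.1,
    "1 NIS per correct answer": 34.7,
    "3 NIS per correct answer": 34.1,
}


def relative_change(results, baseline_key):
    """Return the percent change of every entry relative to the baseline entry."""
    baseline = results[baseline_key]
    return {
        label: 100.0 * (value - baseline) / baseline
        for label, value in results.items()
    }


quiz_changes = relative_change(quiz_results, "no additional payment")
for label, pct in quiz_changes.items():
    print(f"{label}: {pct:+.1f}%")
```

The 0.1 NIS group comes out roughly 19% below the no-payment group, while the 1 NIS and 3 NIS groups come out roughly 20-22% above it. This is only arithmetic on the reported averages; it says nothing about statistical significance, for which you should consult the paper.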

Effect of payment on unpaid tasks

We had 180 high-school students around the age of 16 participating with three treatment levels [collecting money for a charitable purpose]....
  • In the first treatment, the students were told about the importance of collecting money for the society, that the results of the collection would be published, so that the amount collected by each pair would become public knowledge. 
  • In the second treatment, after the same speech, each pair was promised 1 percent of the amount that the two of them collected. 
  • In the third treatment, each pair was promised 10 percent of the amount they collected.
In the second and third treatments it was made clear that the payment was made from funds additional to the donation, provided by the researchers. The average amount collected was:
  • 238.67 for groups in the first treatment (with no payment).
  • 153.67 in the second group (1 percent of the collected amount).
  • 219.33 in the third group (10 percent of the collected amount).
In this case, where there was no initial payment and the task had an altruistic purpose, providing financial incentives actually reduced performance.
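
The same quick calculation can be done for the donation experiment, again using only the averages quoted above and comparing each paid treatment to the unpaid one. The names `donation_results` and `percent_vs_baseline` are illustrative labels of mine, not anything from the paper.

```python
# Average amount collected per pair, as quoted above.
donation_results = {
    "no payment": 238.67,
    "1 percent of collected amount": 153.67,
    "10 percent of collected amount": 219.33,
}


def percent_vs_baseline(results, baseline_key):
    """Return the percent change of every entry relative to the baseline entry."""
    baseline = results[baseline_key]
    return {
        label: 100.0 * (value - baseline) / baseline
        for label, value in results.items()
    }


donation_changes = percent_vs_baseline(donation_results, "no payment")
for label, pct in donation_changes.items():
    print(f"{label}: {pct:+.1f}%")
```

On these averages, the 1 percent treatment collected roughly 36% less than the unpaid treatment, and even the 10 percent treatment collected roughly 8% less. As before, this is plain arithmetic on the reported means, not a significance test.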


Additional literature

There is significant literature for anyone interested (thanks Sonny!). A few pointers to start:





Conclusions

Essentially, Gneezy and Rustichini found that:
  • Paying more indeed increases performance, compared to paying less. 
  • However, paying nothing may actually be better than paying!
Section IV of the paper has a very nice discussion of how to interpret and model the process. Here are a few explanations, in increasing order of explanatory power:
  • Paying something removes the intrinsic motivation for a task, and replaces it with the external motivation for money.
  • Incomplete contract: the piece-rate or performance-based payment changes the original meaning of the contract, which implied that high performance is part of the task.
  • Paying small amounts compared to the originally implied value of the task devalues the task (e.g., returning a glass bottle to help recycling vs. returning it to get 5 cents back).


Relevance to crowdsourcing 

I found the results pretty interesting, with significant implications for micro-crowdsourcing. While volunteers may be great for various tasks (e.g., in citizen science applications, such as the Galaxy Zoo), migrating such applications to a paid crowdsourcing platform may have a significant downside. Paying small rewards to workers can be counterproductive. The work of volunteers is, indeed, priceless.

Furthermore, with the low level of payments on Mechanical Turk, we are stuck in the worst possible state. We pay, and we do not pay enough. But how can we pay more, when every attempt to increase the price to reasonable levels is followed by attempts of scammers to game the system and get paid for doing nothing?