Tuesday, July 27, 2010

Mechanical Turk, Low Wages, and the Market for Lemons

At HCOMP this year, one of the most memorable and most discussed presentations (although highly unconventional) was by M. Six Silberman, who discussed the "Sellers' problems in human computation markets". The basic question: can we protect the workers there from exploitation and from sweatshop salaries?

Luis von Ahn wrote a similar post on his blog. In the comments of that post, someone suggested that the low wages on Mechanical Turk are simply the result of a high supply of workers and low demand for their work. As there is more supply, the salaries drop. And imposing minimum wages would interfere with the free market.

I actually disagree with this interpretation. First of all, there is no oversupply of labor on Mechanical Turk. The distribution of completion times, which follows a power law, suggests that the market operates at maximum capacity. My gut instinct tells me that there are not enough workers available for the posted work, not the other way around.

I can hear the protests: If there is not enough supply of workers, why don't requesters simply increase the offered prices?

My explanation: The requesters already pay minimum wages for work that is worth minimum wage. How is that possible given the effective hourly rate of \$2/hour?

The basic problem: Spammers. Given that many large tasks attract spammers, most requesters rely on redundancy to ensure quality. So instead of having a single worker do a task, they get 5 workers to work on it. This increases the effective cost to the requester from \$2/hr to \$10/hr.
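
The arithmetic behind that claim can be sketched in a few lines of Python. This is a minimal illustration, not anything Amazon or requesters actually run: the function names are made up here, and the majority-vote calculation assumes a binary task where workers err independently with the same accuracy, which real workers may not. The 70% accuracy figure is a hypothetical value chosen only for the example.

```python
from math import comb


def effective_hourly_cost(wage_per_worker: float, redundancy: int) -> float:
    """Hourly cost to the requester when every task is done `redundancy` times."""
    return wage_per_worker * redundancy


def majority_vote_accuracy(worker_accuracy: float, redundancy: int) -> float:
    """Probability that a strict majority of workers gives the right answer.

    Assumes a binary task, an odd number of workers, and independent errors
    (all illustrative assumptions, not properties of Mechanical Turk).
    """
    needed = redundancy // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(redundancy, k)
        * worker_accuracy ** k
        * (1 - worker_accuracy) ** (redundancy - k)
        for k in range(needed, redundancy + 1)
    )


# The post's numbers: 2 dollars/hour per worker, 5 workers per task.
print(effective_hourly_cost(2.0, 5))
# A hypothetical pool of 70%-accurate workers, voted 5 ways.
print(round(majority_vote_accuracy(0.7, 5), 3))
```

Under these assumptions, redundancy buys accuracy, but the requester pays for every redundant answer, which is why the per-worker wage understates what the work actually costs.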

Effectively, what Amazon Mechanical Turk is today is a market for lemons, following the terminology of Akerlof's famous paper, for which he got the 2001 Nobel prize.

A market for lemons is a market where the buyers cannot evaluate beforehand the quality of the goods that they are buying. So, if you have two types of products (say good workers and low quality workers) and cannot tell which is which, the price that the buyer is willing to pay will be proportional to the average quality of the workers. So the offered price will be between the price of a good worker and that of a low quality worker. What would a good worker do? Given that good workers will not get paid enough for their true quality, they leave the market. This leads the buyer to lower the price even further, towards the price for low quality workers. In the end, we only have low quality workers in the market (or workers willing to work for similar wages) and the offered price reflects that.
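
The unraveling described above can be sketched as a toy simulation. The model is deliberately crude and entirely an assumption of this example: each worker's quality is represented by the wage they require, the buyer offers the average of the workers still in the pool, and anyone who requires more than the offer leaves. The function name `unravel` and the wage values are hypothetical.

```python
def unravel(required_wages):
    """Simulate adverse selection in a toy labor market.

    required_wages: the wage each worker needs to stay (a stand-in for quality).
    Returns the workers left at the end and the sequence of offered prices.
    """
    pool = sorted(required_wages)
    offers = []
    while pool:
        # The buyer cannot tell workers apart, so the offer is the pool average.
        offer = sum(pool) / len(pool)
        offers.append(offer)
        # Workers who need more than the offer leave the market.
        stayers = [wage for wage in pool if wage <= offer]
        if len(stayers) == len(pool):
            break  # nobody left this round, so the market has settled
        pool = stayers
    return pool, offers


# Two low quality workers (worth 2/hour) and two good workers (worth 10/hour).
remaining, offers = unravel([2, 2, 10, 10])
print(remaining)  # [2, 2]
print(offers)     # [6.0, 2.0]
```

In this toy run the first offer sits between the two worker types, the good workers walk away, and the next offer drops to the low quality price.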

This is exactly what is happening on Mechanical Turk today. Requesters pay everyone as if they are low quality workers, assuming that extra quality assurance techniques will be required on top of Mechanical Turk.

So, how can someone resolve such issues? The basic solution is the concept of signalling. Good workers need a method to signal to the buyer their higher quality. In this way, they can differentiate themselves from low quality workers. Unfortunately, Amazon has not implemented a good reputation mechanism. The "number of HITs worked" and the "acceptance percentage" are simply not sufficient signalling mechanisms.

Here are some ideas:

  • Allowing workers to get endorsements from reputable requesters (to avoid scam rings like on eBay)
  • Allowing requesters to post machine readable feedback on the performance of the workers, disconnecting evaluation from the approval rate.
  • Certifications and qualification tests that indeed measure ability on different tasks (e.g., language abilities, reading comprehension tests, etc)
  • Publishing the reputation history of the workers, so that requesters can evaluate the quality of the worker.

Of course, similar measures can be adopted for requesters! There is a symmetric market for lemons on that side! Scam requesters post HITs, behave badly, and cause good workers to avoid any newcomer. New requesters then get only low quality workers, are disappointed with the quality of the results, and leave the market.

In other words, Amazon can only gain by taking the time to build a more robust reputation system on top of Mechanical Turk. Trust is at the very core of marketplaces. If Mechanical Turk wants to "grow up", then a good reputation system for both sides of the market is grossly overdue.

15 comments:

  1. Building a good reputation system is itself a big challenge. For example, allowing negative feedback can lead to extortion: http://www.schneier.com/blog/archives/2009/11/virtual_mafia_i.html

  2. There are countless lessons learnt from the deployment of real-life reputation systems over the last decade (eBay, Amazon, yelp, ...).

    Amazon already has a reputation system in place for its secondary marketplace. I am sure they have the experience to do something better than the current setup.

  3. Great post. I've found that the HITs I've built as a requester have improved in quality as I've worked to make them easier to understand based on analyzing how people were stumbling with earlier versions. In cases like this, I blame myself for not being clear enough about what I was looking for.

    Mturk does allow for qualification tests. And workers can be moved into pools as high quality workers are discovered. However, the higher levels of worker grouping and management require use of their API rather than their web interface, which probably is too high of a bar for many requesters.

    Also, creating a pool of workers is only helpful if you can create a large enough pool to get the turnaround times you're looking for on tasks.

  4. One other thing. The requester review process is complete crap. There are really poor filters for workers to help them block spammy requesters, filter for common spammy assignments (fill out this survey), etc.

    If it was easier for workers to find HITs worth doing, the quality of the workforce would likely improve.

  5. Great post. Sounds like biz school's turning you into an economist!

    1. I found (with relatively small jobs), that while there was a power-law completion time, the constant was basically anything I wanted it to be. It just always took the tail longer to finish. Part of that was fast spammers.

    2. Part of the problem with reputation for workers is that most requesters aren't sophisticated enough to ding workers for bad work, because they can't evaluate it well enough in a timely fashion.

    3. Companies like Microtask provide an alternative to Amazon with service level agreements for time and quality.

    4. In the tasks we've done and others we've evaluated, there's a huge distribution in worker ability. It's not just spammer/non-spammer -- there's a huge difference in utility between someone with an 80% accuracy and a 95% accuracy on a task.

  6. Panos,

    Some of those many lessons you mentioned are now written down in one place: http://buildingreputation.com and http://buildingreptuation.com/dokuwiki.php

    [It's available in book form as well: Building Web Reputation Systems]

    I've seen many "tests" done on several cheap crowdsourcing tools - you're right about the quality of the pool vs. pricing for those without robust reputation systems behind them.

    Randy Farmer
    Co-Author Building Web Reputation Systems

  7. Randy, great to see you dropping by this blog. I have your book in my office. :-)

  8. Why ignore the obvious? Workers are working for so little in a market with a glut of employers because even if and when there is higher paying work available, the workers have no way of finding it.

    As Chilton et al. point out, the interface is absolutely terrible, leading not only to a 30x difference in completion times, but a 5x difference in reward for the same tasks posted at the same time.

  9. Nice post! I object to the oversupply argument for another reason: money isn't the only motivation for many Turkers. It's probably not even the primary motivation for a sizable chunk. It's hard to understand wages without weighing them against things like fun, interest, killing time, providing a sense of purpose, etc., especially when the wages can be so small.

    I also wanted to point out that Turk middle-ware companies like Dolores Labs/CrowdFlower are already working with reputation-based solutions. Although I believe it's mostly internal reputation, not public-facing...

    --Judd

  10. Very nice post! Speaking of reputation mechanisms, the Stack Overflow website (www.stackoverflow.com) may be a nice example to examine. With the many different badges in the Stack Overflow system, they seem to have a pretty good reputation system.

  11. This is an excellent article. I started "turking" about a week ago in my spare time and have made ~25 dollars so far. A better designed reputation/qualification system is exactly what is needed. If requesters knew they could get high quality work done they could easily pay more for it.

  12. Speaking of signalling, one of my freelancer friends did so by refusing to work for anything less than x$ (about twice what other people with similar portfolios were charging). When he was questioned about this by a curious client, he promised that he would deliver a certain quality above that of the average low-quality worker on the website, above and beyond the reqs. He got the job and, after satisfying his client, another year of work after that at the high rates. In such a scenario, strangely, charging high prices might work better!

  13. Couldn't it also be that the "turk" workforce is partially made up of people from low-wage countries, who accept working for less and thus push the prices down? I wrote about it in a blog post.
    http://thebigsoftwareblog.blogspot.com/2010/09/interesting-post-on-aws-mechanical-turk.html

  14. You can see the demographics of the workers at http://behind-the-enemy-lines.blogspot.com/2010/03/new-demographics-of-mechanical-turk.html

  15. > allowing negative feedback can lead to extortion

    If it were easy everyone (including Amazon) would already have found a way.
