Tuesday, November 15, 2011

Does lack of reputation help the crowdsourcing industry?

Can the lack of a public reputation system on Amazon Mechanical Turk be the reason behind the success of current crowdsourcing companies? I present an analysis that points in this direction. Unfortunately, this "feature" also leads to a stagnating crowdsourcing market with limited potential for growth.

Low salaries and market for lemons

A contentious issue about crowdsourcing, and specifically about Amazon Mechanical Turk, is that wages are very low. It is not uncommon to see effective wages of \$1/hr, or even lower. Why is that?

I have argued in the past that Mechanical Turk is an example of a "market for lemons". Good workers are drowning in the anonymity of the crowd. Since the good workers cannot differentiate themselves from bad workers before working on a task, they are doomed to receive the same level of compensation as the bad workers.

This is not the fault of the employers: when a new employer joins the market, it is almost a necessity to test the incoming workers to ensure the quality of the work. During this testing period, high-quality workers complete the tasks side-by-side with low-quality workers, and everyone receives a low salary.

The counterargument that I often hear is: "But in the long run, the market should see an increase in salaries, as good workers demonstrate their quality to employers." Of course, in the long run we are all dead. But even in the long run, and even after we are all dead, the market does not seem to be on a path toward fair salaries.

Why? Here is the brief summary:

  • High-quality workers are much more valuable than low-quality ones
  • Lack of a shared reputation system depresses salaries, pushing all of them close to the level of low-quality workers
  • Employers build their own, private reputation systems, learning the quality of the workers
  • With the private quality information, employers can retain good workers by paying them higher wages than low-quality workers receive, but still lower than their "fair" quality-adjusted wage
  • New employers cannot compete with incumbents since they do not have access to the privately built reputation systems and have to face the cost of learning the quality of the workers, while incumbents enjoy their advantage of already knowing who the good workers are
  • Incumbents can enjoy a strong cost advantage, effectively blocking newcomers from entering the industry
Below I expand on these arguments in a bit more detail.

Quality equivalence of low- and high-quality workers

First, let's examine the differences in payment between high- and low-quality workers. Let's take a very simple setting: suppose that workers perform a task with two possible answers, Yes or No. The low-quality workers are accurate $lq$% of the time, and the high-quality workers are accurate $hq$% of the time. How many low-quality workers do we need to emulate one high-quality worker?

Working in the simplest possible case, assume that we have $k$ low-quality workers, each of whom gives the correct answer with probability $q$. We take the majority vote as the aggregate answer. What is the probability $P(q,k)$ that the majority will be correct? We have that:

$P(q,k) = \sum_{i = \lceil \frac{k+1}{2} \rceil}^k \binom{k}{i} \cdot q^i \cdot(1-q)^{k-i}$

(Assume, for the sake of simplicity, that $k$ is odd. Otherwise, we need to add the term
$\frac{1}{2}\cdot \left( \lceil \frac{k+1}{2} \rceil - \lceil \frac{k}{2} \rceil \right) \cdot \binom{k}{k/2}\cdot q^{k/2}\cdot (1-q)^{k/2}$ to the above equation, to allocate ties appropriately: the factor in parentheses equals 1 for even $k$ and 0 for odd $k$, and the $\frac{1}{2}$ credits half of the tied cases as correct.)
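To make the formula concrete, here is a minimal Python sketch of $P(q,k)$, assuming independent workers who all share the same accuracy $q$; the function name `majority_accuracy` is hypothetical. It sums the strict-majority terms and, for even $k$, adds half of the tie probability, which is what the extra term above does.

```python
from math import comb


def majority_accuracy(q: float, k: int) -> float:
    """P(q, k): probability that the majority vote of k independent workers,
    each correct with probability q, gives the correct Yes/No answer.
    Ties (possible only for even k) are broken by a fair coin flip."""
    # Strict majorities: i runs from ceil((k+1)/2) = k // 2 + 1 up to k.
    p = sum(comb(k, i) * q**i * (1 - q) ** (k - i) for i in range(k // 2 + 1, k + 1))
    if k % 2 == 0:
        # Half of the tied outcomes are resolved in favor of the correct answer.
        half = k // 2
        p += 0.5 * comb(k, half) * q**half * (1 - q) ** half
    return p


print(majority_accuracy(0.9, 3))
print(majority_accuracy(0.8, 5), majority_accuracy(0.8, 6))
```

For instance, `majority_accuracy(0.9, 3)` gives 0.972 (up to floating-point rounding), and `majority_accuracy(0.8, 6)` equals `majority_accuracy(0.8, 5)`: with fair tie-breaking, an even crowd is no better than the odd crowd one worker smaller, so assuming odd $k$ loses nothing.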

Given the above, we can find how many low-quality workers of quality $lq$ we need to emulate a single high-quality worker of quality $hq$: we need the smallest $k$ that satisfies

$P(lq, k) \geq P(hq, 1)$
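As a quick worked check (assuming the odd-$k$ formula above), three 90%-accurate workers are enough to match a single 95%-accurate worker:

$P(0.9, 3) = \binom{3}{2} \cdot 0.9^2 \cdot 0.1 + \binom{3}{3} \cdot 0.9^3 = 0.243 + 0.729 = 0.972 \geq 0.95 = P(0.95, 1)$

A single 90% worker gives only $P(0.9, 1) = 0.9$, so $k = 3$ is the smallest odd crowd that reaches the 95% level.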

Here are a few indicative pairs: To reach the 95% quality level we need:
  • 3 workers of quality 90%.
  • 7 workers of quality 80%.
  • 9 workers of quality 75%.
  • 15 workers of quality 70%.
  • 67 workers of quality 60%.
  • 269 workers of quality 55%.
If our goal is to reach the 99% quality level, we need:
  • 3 workers of quality 95%
  • 5 workers of quality 90%
  • 13 workers of quality 80%
  • 31 workers of quality 70%
This means that the fair wage of a worker who is 95% accurate should be ~9 times higher than the wage of a worker who is 75% accurate. A worker who is 99% accurate should demand a salary 13x higher than someone who is 80% accurate. Notice that as the accuracy of the low-quality workers drops, the gap in fair wages between high-quality and low-quality workers grows very quickly.
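The lists above can be reproduced with a short search. The sketch below, assuming the same independent-worker model and odd $k$, looks for the smallest $k$ with $P(lq, k) \geq hq$; since $k$ low-quality workers deliver what one high-quality worker delivers, $k$ doubles as the fair wage ratio. The function names are hypothetical, and `max_k` is an arbitrary cap assumed here to keep the binomial coefficients within floating-point range.

```python
from math import comb


def majority_accuracy(q: float, k: int) -> float:
    """P(q, k) for odd k: probability that a strict majority of k workers is correct."""
    return sum(comb(k, i) * q**i * (1 - q) ** (k - i) for i in range(k // 2 + 1, k + 1))


def workers_needed(lq: float, hq: float, max_k: int = 1001) -> int:
    """Smallest odd k such that k workers of accuracy lq match one worker of accuracy hq.

    Only odd k are tried: with fair tie-breaking, an even crowd is never better
    than the odd crowd one worker smaller. max_k keeps comb(k, i) small enough
    to convert to a float.
    """
    if lq <= 0.5:
        raise ValueError("majority voting cannot improve workers at or below 50% accuracy")
    for k in range(1, max_k + 1, 2):
        if majority_accuracy(lq, k) >= hq:  # P(hq, 1) = hq
            return k
    raise ValueError(f"no odd k <= {max_k} reaches accuracy {hq}")


for target, pool in [(0.95, [0.90, 0.80, 0.75]), (0.99, [0.95, 0.90, 0.80])]:
    for lq in pool:
        k = workers_needed(lq, target)
        print(f"{target:.0%} level: {k} workers at {lq:.0%} (fair wage ratio ~{k}x)")
```

Because the comparison is strict, borderline cases can land one step above the rounded figures in the lists: for instance, 15 workers at 70% reach about 94.999%, just under 95%, so `workers_needed(0.7, 0.95)` returns 17.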

Employers learning the quality of workers

Suppose that we have an employer called PanosLabs that has worked with many workers over a long period of time. At this point, PanosLabs has a long track record for many workers, and its quality estimates for each worker are pretty solid.

Now, this knowledge of worker quality allows PanosLabs to pay the good workers higher salaries. Let's assume that PanosLabs decides to be very "generous": it quadruples the salary of the high-quality, 99%-accurate workers compared to the general pool, and it triples the salary of the 95%-accurate workers.

Assuming that the general pool of workers is at the 80% accuracy level, PanosLabs gets the following bargain: It is now possible to cut costs significantly, while maintaining the same quality level.

Initially, PanosLabs was hiring 13 workers per case, paying each \$1/hr; this is an effective cost of \$13/hr for reaching the 99% quality level. Now, PanosLabs can reach the 99% quality level by employing just a single 99% worker, at a cost of \$4/hr. This is a cost reduction of roughly 70%!
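The arithmetic behind this bargain is easy to spell out. The sketch below uses only the example's fictional numbers (a \$1/hr pool wage, 3x and 4x raises) and the worker counts from the lists above to compute the hourly cost of reaching the 99% level under each staffing choice; `POOL_WAGE` and `cost_per_item` are hypothetical names.

```python
POOL_WAGE = 1.0  # $/hr for the undifferentiated 80%-accurate pool (fictional example value)


def cost_per_item(workers_per_item: int, wage_multiplier: float) -> float:
    """Hourly labor cost of assigning `workers_per_item` workers, each paid
    `wage_multiplier` times the pool wage, to every item."""
    return workers_per_item * wage_multiplier * POOL_WAGE


# Three ways to reach the 99% quality level (worker counts from the lists above).
blind_pool = cost_per_item(13, 1.0)  # 13 unscreened 80% workers at the pool wage
known_95 = cost_per_item(3, 3.0)     # 3 known 95% workers at 3x the pool wage
known_99 = cost_per_item(1, 4.0)     # 1 known 99% worker at 4x the pool wage

for label, cost in [("blind pool", blind_pool), ("known 95%", known_95), ("known 99%", known_99)]:
    saving = 1 - cost / blind_pool
    print(f"{label:>10}: ${cost:.0f}/hr ({saving:.0%} cheaper than the blind pool)")
```

The known-99% line comes out at about a 69% saving, the "roughly 70%" quoted above; even the tripled 95% workers undercut the blind pool by about 31%.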

Great bargain, eh? This is the benefit of knowing thy worker...

Increasing the barriers to entry

Now let's assume that a new employer, called RotisLabs, arrives in the market. The high-quality workers are now happily employed at PanosLabs, receiving a salary that is 4x the going market rate for their task.

RotisLabs, coming to the crowdsourcing market, is in a pickle: it has no way of identifying and attracting the high-quality workers without first getting them to work for RotisLabs. Why?
  • There is no history of employment. In the "real world", knowing that an engineer worked at, say, Google gives some signal of quality. In our setting, RotisLabs cannot check whether a worker has worked for PanosLabs.
  • It is not possible to check how much the workers get paid for other tasks. In the "real world", prices serve as signals: an employee who gets a high salary also signals to other employers that they are a high performer. However, RotisLabs cannot check the prices that workers receive.
Now consider the situation of RotisLabs: its competitor, PanosLabs, generates 99%-accurate work at a cost of \$4/hr. What are the options for RotisLabs?
  • First option: RotisLabs can pay \$1/hr. This option attracts the following workers: the low-quality, 80%-accurate workers who did not get raises from PanosLabs and, if lucky, some new 99%-accurate workers who just arrived in the market. However, this pay rate does not attract the high-quality workers who stick with PanosLabs, severely limiting the pool of good workers accessible to RotisLabs. Notice that, at this pay level, RotisLabs has a cost of \$13/hr to reach the 99% quality level, while competing with PanosLabs, which has a 70% lower cost of production, i.e., \$4/hr. If RotisLabs has enough cash and patience, it can stay in the market until it learns the quality of the workers. In most cases, though, RotisLabs will just realize that it is not possible to compete.
  • Second option: RotisLabs can pay \$4/hr. This option may attract the 99%-accurate workers who work for PanosLabs. But it will also attract the 80% workers! Our dear friend, RotisLabs, cannot separate the two. Therefore, to ensure the 99% quality level, RotisLabs still needs to hire 13 workers per case, to account for the cases where many 80% workers work on the same example. This increases the overall cost of production to \$52/hr. Oops! PanosLabs can reach the same level of quality at a cost of just \$4/hr.
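The squeeze on RotisLabs can be tabulated the same way. This sketch, again using only the example's fictional numbers, compares the incumbent's cost with the entrant's two options; the key assumption, taken from the text, is that without a reputation signal the entrant must keep the 13-worker redundancy no matter what it pays. The `Strategy` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    workers_per_item: int  # redundancy needed to reach the 99% level
    wage: float            # $/hr paid to each worker

    @property
    def cost(self) -> float:
        """Hourly labor cost per item."""
        return self.workers_per_item * self.wage


incumbent = Strategy("PanosLabs (knows its 99% workers)", 1, 4.0)
options = [
    Strategy("RotisLabs option 1: pay the pool wage", 13, 1.0),
    # Paying more attracts the 99% workers *and* the 80% workers; without a
    # reputation signal the two are indistinguishable, so redundancy stays at 13.
    Strategy("RotisLabs option 2: match the 4x wage", 13, 4.0),
]

print(f"{incumbent.name}: ${incumbent.cost:.0f}/hr")
for s in options:
    print(f"{s.name}: ${s.cost:.0f}/hr, {s.cost / incumbent.cost:.2f}x the incumbent")
```

Raising the wage only multiplies the entrant's cost (\$13/hr becomes \$52/hr) because it buys no information about who the good workers are; the gap closes only once RotisLabs has built its own record of worker quality.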
You can see that knowing the quality of the workers gives a tremendous advantage to the incumbent players that invest in learning it.

Interestingly enough, due to the depressed salaries that are a direct consequence of the lack of reputation systems, the established employer effectively passes the search costs on to the employees: while learning the quality of the workers, the employer pays salaries corresponding to the lowest expected level of quality. It is up to the workers to carry the burden of low salaries until they prove themselves (again and again, for every single employer...)

Lack of shared reputation system: The foundation of the crowdsourcing industry?

The lack of a (shared) reputation system is a godsend for companies that enjoy a first-mover advantage. They can keep their costs down, while keeping their own workers happy (in a relative sense: "can't you see how much better I am paying you compared to the general pool?").

The anonymity generates the conditions for "market for lemons" salaries, which keep costs down. At the same time, the smart and established employers can find and reach out to the high-quality workers. By paying these workers "generously", the smart employers can lock the workers into "golden cages": they offer salaries that are higher than those of the general population, but still much lower than the fair wages for the quality levels produced.

When even the 3x and 4x (unrealistic and fictional) salary increases mentioned in the example above are great bargains, you can imagine the margins that crowdsourcing companies can command.

In a very perverse manner, the anonymity imposed by Mechanical Turk is now effectively serving as the foundation of the current crowdsourcing industry. The anonymity keeps worker costs down, allowing most companies to offer solutions that are very cost competitive compared to alternatives. At the same time, this policy is hurting the Amazon MTurk marketplace by effectively generating huge barriers to entry for newcomer employers, and depressing the salaries of newcomer employees. (The Masters qualification is a step in the right direction, but too crude to serve as an effective signalling mechanism.)

The future?

Let's see who will manage to create the appropriate market for crowdsourcing, one that resolves these issues. One thing is clear: improving crowdsourcing markets requires salaries to increase significantly. Interestingly enough, this is expected to lower the overall cost of production as well, since the cost of quality control will be significantly lower.

As I said in the crowdsourcing panel at the WWW2011 conference last spring:
  • It is not about the cost!
  • It is not about the crowd!
  • It is not about simple tasks!
  • Crowdsourcing is best for “parallel, scalable, automatic interviews” and for quickly finding good workers
  • Find the best trained workers, fast, pay them well, and keep them!