In online settings, such inequalities are often amplified. For Wikipedia we have the 1% rule, where 1% of the contributors (that is, 0.003% of the users) contribute two thirds of the content. In the Causes application on Facebook, there are 25 million users, but only 1% of them make a donation.
So, adapting this question for Mechanical Turk, we want to see: What is the distribution of activity across requesters?
The Activity Distribution: The (Insignificance of the) Long Tail of Requesters
To analyze the level of participation, for the XRDS paper, we took the requesters that posted a task on Mechanical Turk from January 2009 until April 2010, and we ranked them according to the total reward amount of the HITs they posted. Then, we measured what percentage of the rewards comes from the top requesters in the market. Here is the resulting plot:
Indeed, the result shows that Mechanical Turk is closer to the "1% rule" of Wikipedia than to the general 80-20 principle. As in Wikipedia, the top 1% of the requesters contribute roughly two thirds of the activity in the market.
By reading the graph, we see the following:
- Castingwords, the top requester across the 10K requesters in the dataset, accounts for 10% of the dollar-weighted activity (!).
- The top 0.1% of the requesters (i.e., the top-10 requesters) account for 30% of the dollar-weighted activity.
- The top 1% of the requesters account for 60% of the dollar-weighted activity.
- The top 10% of the requesters account for 90% of the dollar-weighted activity.
- The long tail, made up of the remaining 90% of the requesters, is effectively insignificant.
- The average level of posted rewards is $58. Over the 16 months covered, this corresponds to an average level of activity of roughly four dollars per month.
- The median is just $1.60. Yes, this is not a typo: 1.6 dollars. In other words, 50% of the requesters never post more than a couple of dollars worth of tasks.
- Only a small fraction of requesters (less than 1%) posted $1,000 worth of tasks or more over the period from January 2009 through April 2010.
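For readers who want to repeat this kind of analysis on their own data, here is a minimal sketch of the computation described above: rank requesters by total posted rewards, then measure the share of the total that comes from the top fraction. The function name `top_share` and the `totals` list are hypothetical and used only for illustration; the numbers are made up and are not the Mechanical Turk data.

```python
from statistics import mean, median


def top_share(totals, fraction):
    """Share of all posted rewards that comes from the top `fraction` of requesters."""
    ranked = sorted(totals, reverse=True)          # largest requesters first
    k = max(1, round(len(ranked) * fraction))      # always include at least one requester
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical per-requester reward totals in dollars (illustrative only).
totals = [500.0, 200.0, 100.0, 50.0, 50.0, 40.0, 30.0, 20.0, 6.0, 4.0]

share_top_10pct = top_share(totals, 0.10)   # share posted by the single largest requester
share_top_20pct = top_share(totals, 0.20)   # share posted by the two largest requesters
average_total = mean(totals)                # pulled upward by the few large requesters
median_total = median(totals)               # what the "typical" requester posts

print(f"top 10%: {share_top_10pct:.0%}, top 20%: {share_top_20pct:.0%}")
print(f"mean: ${average_total:.2f}, median: ${median_total:.2f}")
```

Even in this toy example, the mean is more than double the median, which is the same pattern (in much milder form) as the $58 average versus the $1.60 median in the real data.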
What I would like to try next is to check whether this model indeed corresponds to reality. Do we see a geometric growth in activity as the requester stays in the market for longer? Do we observe "deaths" of requesters? (The Fader-Hardie model may be a nice, simple model to try.) What is the expected future activity of a requester?
Such questions may be useful for guiding workers when they decide whether to invest time and effort in building a good reputation with a given requester (e.g., by completing qualification tests or completing the basic HITs that unlock access to the "protected" HITs).