Wednesday, March 31, 2010

Getting High Quality Results on MTurk

In a previous post, an anonymous commenter left a pretty interesting comment, which I think is worth repeating:

On the subject of higher pay not rendering better work, there are some AMT requesters who pay well, and in the instructions for the task, put in bold print a note that says something along the lines of:

"We pay very well for these tasks, because we expect perfect, error-free work. You don't have to rush through these tasks to make money doing them, so slow down and take your time. Poor quality work WILL be rejected and you WILL be blocked from working for us in the future if your work is poor."

My gut feeling, as a task worker at AMT, is that this is an effective method for increasing the quality of the work (high pay, plus a message reminding the worker to take their time because they don't HAVE to rush through these tasks to make money as an incentive, plus the motivating threat of having work rejected and getting blocked as a disincentive to rushing) but it would be neat to see it officially demonstrated.

Anyone tried this or willing to try? I would be very interested in hearing the results.
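For anyone willing to try, one simple way to check whether the "high pay plus warning" message helps is to post the same task under two conditions and compare the fraction of acceptable submissions. Below is a minimal sketch of a two-proportion z-test for that comparison. The function name and the counts are placeholders that I made up for illustration; they are not results from any real experiment.

```python
import math

def two_proportion_z(good_a, total_a, good_b, total_b):
    """Return (z, two-sided p-value) for H0: both conditions have the same acceptance rate."""
    p_a = good_a / total_a
    p_b = good_b / total_b
    # Pooled acceptance rate under the null hypothesis of no difference.
    pooled = (good_a + good_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Placeholder counts (hypothetical): condition A shows the bold warning, condition B does not.
z, p = two_proportion_z(good_a=180, total_a=200, good_b=160, total_b=200)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value would suggest that the difference in quality between the two conditions is unlikely to be due to chance alone. Assigning workers to conditions at random matters here; otherwise the comparison may simply reflect which workers picked which task.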

Monday, March 15, 2010

Citation Tracker: Now with an API

A few months back I announced the availability of Citation Tracker, a tool that allows monitoring of your publications for incoming citations and mentions on the web. We received plenty of suggestions and we have been trying to implement them, with the hope of moving out of the alpha version by the end of the year.

Today, although we are still in alpha, I am happy to announce that we reached a major milestone: Citation Tracker now has a public API available!

We have posted documentation online and a sample PHP client for those who want to experiment. For details you can see the online documentation, but here is a brief description of the things that you can do through the API:
  • Publications: Add, remove, get, and edit publications. Pretty self-explanatory.
    • Add publications
    • Delete publication
    • Edit publication
    • Get publications
  • Monitoring: Add, remove, get, and edit monitoring channels. A monitoring channel is a citation-oriented site (such as Google Scholar, Libra, SSRN), or a general web search engine (Google, Bing, Ask, Yahoo). We monitor these sites for new results that match the publications and return the new citations (or mentions, if we refer to general web results).
    • Add monitoring channel
    • Delete monitoring channel
    • Get active monitoring channels
    • Update monitoring channel
  • Citations: Get, update, and edit citations or web mentions, as returned by the different monitoring channels.
    • Get citations
    • Update citation state (new, accepted, discarded, review-later)
    • Update citation
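To make the citation states above a bit more concrete, here is a minimal sketch of how a client program might model them locally. This is not the Citation Tracker API: the class names, field names, and the `update_state` helper are all hypothetical, and only the four state values come from the list above. For the real calls, see the online documentation.

```python
from dataclasses import dataclass, replace
from enum import Enum

class CitationState(Enum):
    # The four states listed above; the enum itself is a hypothetical client-side model.
    NEW = "new"
    ACCEPTED = "accepted"
    DISCARDED = "discarded"
    REVIEW_LATER = "review-later"

@dataclass(frozen=True)
class Citation:
    # Hypothetical fields, chosen only for illustration.
    publication_id: int
    channel: str
    title: str
    state: CitationState = CitationState.NEW

def update_state(citation, new_state):
    """Return a copy of the citation with a new state.

    Accepts a CitationState or its string value; an unknown string raises ValueError.
    """
    return replace(citation, state=CitationState(new_state))

found = Citation(publication_id=1, channel="Google Scholar", title="A citing paper")
later = update_state(found, "review-later")
print(found.state.value, "->", later.state.value)
```

Keeping the states in an enum means that a typo such as "reviewlater" fails immediately on the client side instead of being sent to the server.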

Although we are still in alpha, NYU Library started using the tool and the API for creating a new service on top. I am sure that other people will have ideas of what they can do with the API. Enjoy and let me know if you have any feedback.

Tuesday, March 9, 2010

The New Demographics of Mechanical Turk

Past surveys on the demographics of Mechanical Turk users indicated that most of the workers come from the US, are younger and more educated than the general population, and work on MTurk as a way to get some spare cash.

Since the last survey, a few things have changed. First, Amazon now allows workers in India to get paid in cash in rupees, essentially encouraging many people from India to start using Mechanical Turk as workers. Second, the recession has affected many households, leaving many people at home looking for cash to cover their needs. These two forces have changed the demographics of the participants, so a new survey was needed to capture the new demographics of the Mechanical Turk workers.

So, in February 2010, I conducted a new survey on Mechanical Turk, paying the workers 10 cents for participating.

The first major change was the country of origin. In the past, 70%-80% of the workers came from the US, but now the percentage is closer to 50% and it may decrease even more. India is now a major contributor of workers, with almost 35% of the workers coming from the subcontinent. The remaining workers come from 66 different countries. The exact numbers in the survey:

  • United States: 46.80%
  • India: 34.00%
  • Miscellaneous: 19.20%

The analysis also indicated that the profile of the Indian workers is quite different from the profile of the US-based workers. So, below I present the results broken down by country.

Gender Breakdown

The first analysis focuses on the gender breakdown. Across US-based workers, there are significantly more females than males, while the situation is reversed for Indian workers.

The main reason for the overrepresentation of females in the US-based workforce is the nature of the tasks and work on Mechanical Turk. Most participants in the US use Mechanical Turk as a supplementary source of income, and often Mechanical Turk is used by stay-at-home parents, unemployed and underemployed workers, and so on. Since females are more likely to fit into these categories, there is a corresponding increase in representation. In contrast, more Indian workers treat Mechanical Turk as a primary (or at least significant) source of income, and we see more males working on Mechanical Turk.

Age Distribution

In terms of age distribution, there is definitely an overrepresentation of younger workers, compared to the general population of Internet users. While this holds both for the US and for India, we see an even higher skew towards younger workers among Indians.

Educational Level

We also asked the Mechanical Turk workers to declare their educational level. In general, the (self-declared) educational level of the workers is higher than that of the general US and Indian population. There are two points worth noting here. First, many of the workers are younger than the overall population and, ceteris paribus, this leads to a higher educational level. Second, while we cannot entirely discount the possibility of false disclosure, there are no incentives that would bias workers towards lying in this survey.

Income Level

We were also interested in examining the income level of the workers on Mechanical Turk. In the US, the shape of the distribution roughly matches the income distribution in the general US population. However, it is noticeable that the income level of US workers on Mechanical Turk is shifted towards lower income levels. For example, while 45% of the US Internet population has income below $60K/yr, the corresponding percentage across US-based Mechanical Turk workers is 66.7%. (This finding is consistent with the earlier surveys that compared the income levels of MTurk workers with the income levels of the general US population of Internet users.) The picture is drastically different when comparing US-based and Indian workers. Workers based in India have significantly lower incomes, as expected, and more than 55% of them declared an income of less than $10K/year.

Marital Status, Children, and Household Size

In terms of marital status and household size, the answers tend to match the age demographic of the workers reported earlier. The majority of the workers, both in India and in the US, do not have children, and a significant fraction of them are single. An interesting contrast is the household size, which seems to reflect cultural norms more than anything specific to Mechanical Turk: while more Indian workers are single and without children, they tend to live in larger households than US workers, either staying with their family or having a comparatively larger number of roommates.

Level of Engagement on Mechanical Turk

We also asked a set of questions for evaluating the level of engagement of Mechanical Turk workers on the marketplace. Since we did not detect significant deviations across countries, we report the results in aggregate form, without separating by country of origin of the worker. In general, most workers spend a day or less per week working on Mechanical Turk, and tend to complete 20-100 HITs per week. Correspondingly, this generates a relatively low income stream from Mechanical Turk work, which is often less than $20 per week. Of course, there are a few workers that devote a significant amount of time and effort, completing thousands of HITs, and generating a respectable income of more than $1000/month. For these workers, Mechanical Turk tends to be the primary source of income. For India-based workers, such salary levels are typically satisfactory for the type of work that is available on Mechanical Turk (i.e., tedious tasks that do not require significant specialized skills).

Motivations for Participating on Mechanical Turk

To understand better why people participate on Mechanical Turk, we asked both qualitative (i.e., free-text) questions and a set of structured questions. The main structured question that we asked was the following:

Why do you complete tasks in Mechanical Turk? Please check any of the following that applies:
  • Fruitful way to spend free time and get some cash (e.g., instead of watching TV)
  • For "primary" income purposes (e.g., gas, bills, groceries, credit cards)
  • For "secondary" income purposes, pocket change (for hobbies, gadgets, going out)
  • To kill time
  • I find the tasks to be fun
  • I am currently unemployed, or have only a part time job

The answers were quite different across Indian and US workers. Very few Indian workers participate on MTurk for "killing time", and significantly more Indians treat MTurk as a primary source of income. (Not surprising given the average income level of an Indian worker vs the income level of the US workers.)

While these graphs are ok, I would actually encourage everyone to go through the textual responses of the workers. Below you can find the data embedded in a Google Spreadsheet. Go through the column "EngagementQ1" and I am sure that you will enjoy reading all the answers given by the workers.
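If you prefer to read the answers outside the spreadsheet, here is a minimal sketch of pulling out the "EngagementQ1" column, assuming you have exported the spreadsheet as a CSV file. Only the column name "EngagementQ1" comes from the data; the "Country" column, the `read_column` helper, and the rows in the in-memory sample are placeholders that I made up so that the example runs on its own.

```python
import csv
import io

# Stand-in for a CSV export of the spreadsheet; these rows are invented placeholders.
sample_csv = io.StringIO(
    "Country,EngagementQ1\n"
    "US,placeholder answer one\n"
    "India,placeholder answer two\n"
    "US,\n"
)

def read_column(handle, column):
    """Return the non-empty values of one column from an open CSV file handle."""
    reader = csv.DictReader(handle)
    values = []
    for row in reader:
        text = (row.get(column) or "").strip()
        if text:  # skip workers who left the free-text answer blank
            values.append(text)
    return values

answers = read_column(sample_csv, "EngagementQ1")
for answer in answers:
    print("-", answer)
```

With a real export, you would replace `sample_csv` with something like `open("export.csv", newline="", encoding="utf-8")`, where the file name is whatever you chose when saving the export.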

More Details

The set of blog posts about the demographics of Mechanical Turk has been a little bit too popular, reaching a point where people were asking me how to cite these surveys. While I thought that pointing to the blog would be enough, I was surprised to find out that many people consider a blog post to be too "informal" to cite.

Although I find this academic conservatism kind of funny, I am not sure whether this type of work should appear in an academic conference, journal, or magazine. Maybe a magazine-style journal would be ok but I am still not 100% convinced.

Anyway, as a compromise, I have now created a working paper with the results of these demographic studies, so whoever is interested can cite the "official" working paper with the results on the demographics of Mechanical Turk. (As you will notice, it is essentially this blog post, pasted in a PDF file.) As an added value, you can also find there the Excel spreadsheet with all the results and perform your own analyses and studies.

I would like to consider this working paper as a true working paper, i.e., update it over time with the most current results, if I consider this necessary. Until then, enjoy the current results and let me know if there are other questions that you would like to see answered.