Tuesday, September 30, 2008

How good are n-gram Markov models for language modeling?

Apparently pretty good for modeling the responses of Sarah Palin during her last couple of interviews! Check them out:

Friday, September 12, 2008

After reporting the results about "why Turkers Turk," I received a set of questions about further things that people would like to know about the Turkers. One of the most common questions was about the compensation of Turkers: "How much do they make by Turking?"

Well, there is no question about Mechanical Turk that Mechanical Turk cannot answer, so here we go. I posted the very same question on MTurk, asking people about their average compensation per week. Without further ado, here are the results:

A small number of people make more than $100 per week, about 20% make more than $20 per week, and the majority make less than $20 per week. So indeed it seems unlikely that people work on MTurk for a living.

It is much more likely that people actually enjoy what they are doing, and getting some cash is a nice side-effect. Furthermore, it is work that can be done even while at a regular job, and doing the tasks on MTurk helps other people. My own gut feeling is that the research about the motivations that drive people to contribute to open source projects can also be applied here to explain why Turkers Turk.

(Info: The current survey paid 5 cents per HIT, and received responses from 200 Turkers. I will keep running the survey to collect 1,000 responses and will report if I see any significant changes. But so far the results seem remarkably stable.)

Thursday, September 11, 2008

Why People Participate on Mechanical Turk, Now Tabulated

A few months back, I decided to ask Turkers about their motivation for participating on Mechanical Turk. I found their responses quite fascinating, so I decided to list them in their raw format, without any further tabulation and processing.

However, as time passed, I realized that I wanted to have the results in a more summarized and accessible format. Therefore, I bit the bullet and organized the results. Of course, I had no time for such a big task. So, what to do? First, I hired two coders using RentACoder.com, to read and identify the main reasons listed in the responses. The two coders agreed on 9 broad categories:

A. To Kill Time
B. Fruitful way to spend free time (Instead of watching TV, Not to waste time, Rather than playing video games/online games, Sense of purpose when watching TV, Something to do during downtime in work)
C. Income purposes (Gas, Bills, Make money, Credit card, Groceries, School, Help family)
D. Pocket change/extra cash (Hobbies, Mad money, Buy personal stuff)
E. For entertainment, for fun, interesting, addiction
F. Challenge, self-competition
G. Unemployed, no regular job, as part-time job
H. To sharpen/To keep mind sharp
I. Learn English

Then, I simply listed the responses on Mechanical Turk, and asked (new) Turkers to identify the category (or categories) for each response. Here are the percentages for each category (note that one response can be classified into multiple categories):

A.	20.50%
B.	14.00%
C.	49.00%
D.	34.00%
E.	42.00%
F.	5.50%
G.	3.50%
H.	3.50%
I.	4.00%
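
Since one response can fall into multiple categories, the percentages above are not expected to sum to 100%. A minimal sketch of that sanity check follows; the variable names are made up for illustration, and the numbers are copied from the table above:

```python
# Percentages per category, copied from the table above.
percentages = {
    "A": 20.5, "B": 14.0, "C": 49.0, "D": 34.0, "E": 42.0,
    "F": 5.5, "G": 3.5, "H": 3.5, "I": 4.0,
}

# Each response may be counted in several categories, so the total
# can exceed 100%.
total = sum(percentages.values())

# Dividing by 100 gives the average number of categories per response.
avg_categories = total / 100

# Categories ordered from most to least frequently assigned.
ranked = sorted(percentages, key=percentages.get, reverse=True)

print(f"total = {total}%, average categories per response = {avg_categories:.2f}")
print("top three:", ranked[:3])
```

The total comes out to 176%, that is, roughly 1.76 categories per response on average, with C (income), E (fun), and D (pocket change) leading the list.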

So, we can see that many Turkers complete such tasks to get some extra cash and pay for gas (maybe we should wish for high oil prices :-) but there is a significant fraction that does it for fun, because they consider Turking interesting, and sometimes even addicting!

I still consider the responses themselves more interesting than the tabulated version, so go and take a look yourself!

Tuesday, September 9, 2008

Links to Mechanical Turk Articles

I came back after the summer break, and I found a long list of articles regarding Mechanical Turk. So, let's give a list of links with small commentary, to start the new blogging season:

First, David Pennock points out that Mechanical Turk can be used for dubious activities as well. Interestingly enough, in the comments, the creator of some of the suspicious HITs mentioned in the article replies and argues that his actions are legitimate marketing. You may disagree, but when an alleged "spammer" shows up to defend his actions, he is unlikely to be a true black-hat spammer.
From the comments, we get another great pointer to the Floozy Speak blog, presenting a survey of why Turkers participate and complete tasks on Mechanical Turk. Nicely organized, it confirms my earlier post on the motivating factors for Turkers to participate (I have to finally tabulate these responses and post the summary... long overdue...).
To my great disappointment, I realized that ReadWriteWeb has a sensationalistic article titled "Amazon's Mechanical Turk Used for Fraudulent Activities." In contrast to David Pennock's post, it portrays Mechanical Turk as a marketplace of spammers. Even the last "positive" paragraph is phrased in a way that reinforces the negative message. Not sure if this is what Brynn Evans was thinking about Mechanical Turk when she collected the screenshots of the spammers! My own experience indicates that Turkers have very good judgment and they avoid the tasks that look spammy (Lukas provides further evidence about that). I would put this article in the same category as all those articles in the mass media that portray the Internet as a place for kidnappers, pedophiles, scammers, and gamblers. Yes, in any system where you have human participation, you will find such people! Are we going to run away?
Bob Carpenter briefly describes his approach for modeling annotators' accuracy. His hierarchical Bayesian model builds on previous research in epidemiology, and allows us to estimate the quality characteristics of each annotator, and estimate the most likely "correct" responses for questions that get multiple answers. The model generates: (a) a set of labelers with quality characteristics (specificity and sensitivity), and (b) a dataset with a class balance. Matching the two, we get the final annotated dataset. I found the model very inspiring!
Finally, Brendan posts a description of their EMNLP'08 paper on using Mechanical Turk for completing a variety of NLP tasks. As far as I can tell, this is the first published study that examines the accuracy of Turkers in performing various annotation tasks, and provides plenty of information for people who are looking for some "wetlab statistics." I truly wish that I had had this reference to add to our KDD'08 paper, where we demonstrated the value of using multiple noisy annotations. While we had results indicating performance characteristics under different levels of noise, reviewers were curious to find out about the actual noise level on systems like Mechanical Turk. This EMNLP'08 paper provides plenty of such data!
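
To make the generative story in Bob Carpenter's item a bit more concrete, here is a minimal simulation sketch: annotators with a given sensitivity and specificity label a dataset with a given class balance. This covers only the forward (data-generating) direction, not the hierarchical Bayesian inference itself, and it combines the labels with a plain majority vote as a simple stand-in. All function names, parameters, and numbers below are assumptions made up for illustration, not taken from any of the linked posts or papers.

```python
import random


def simulate_annotations(n_items, prevalence, annotators, seed=0):
    """Draw true binary labels from the class balance, then noisy labels.

    annotators is a list of (sensitivity, specificity) pairs; both the
    name and the format are hypothetical.
    """
    rng = random.Random(seed)
    truth = [1 if rng.random() < prevalence else 0 for _ in range(n_items)]
    labels = []
    for t in truth:
        row = []
        for sensitivity, specificity in annotators:
            if t == 1:
                # A positive item is labeled 1 with probability = sensitivity.
                row.append(1 if rng.random() < sensitivity else 0)
            else:
                # A negative item is labeled 0 with probability = specificity.
                row.append(0 if rng.random() < specificity else 1)
        labels.append(row)
    return truth, labels


def majority_vote(row):
    """Return 1 if strictly more than half of the labels are 1, else 0."""
    return 1 if 2 * sum(row) > len(row) else 0


def accuracy(truth, predicted):
    """Fraction of items where the predicted label matches the true label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


if __name__ == "__main__":
    # Five equally noisy annotators (80% sensitivity and specificity),
    # 30% positive items; these values are illustrative only.
    truth, labels = simulate_annotations(2000, 0.3, [(0.8, 0.8)] * 5, seed=1)
    single = accuracy(truth, [row[0] for row in labels])
    voted = accuracy(truth, [majority_vote(row) for row in labels])
    print(f"single annotator: {single:.3f}, majority of five: {voted:.3f}")
```

With settings like these, the majority vote should be noticeably more accurate than any single annotator, which is the intuition behind collecting multiple noisy annotations. A model that estimates each annotator's sensitivity and specificity can, in principle, weight the labels more cleverly than a simple vote.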

Lots of great stuff!

A Computer Scientist in a Business School