Links to Mechanical Turk Articles

Tuesday, September 9, 2008

I came back from the summer break and found a long list of articles about Mechanical Turk. So, to start the new blogging season, here is a list of links with brief commentary:

First, David Pennock points out that Mechanical Turk can be used for dubious activities as well. Interestingly enough, in the comments, the creator of some of the suspicious HITs mentioned in the article replies and argues that his actions are legitimate marketing. You may disagree, but when an alleged "spammer" shows up to defend his actions, he is unlikely to be a true black-hat spammer.
From the comments, we get another great pointer to the Floozy Speak blog, presenting a survey of why Turkers participate and complete tasks on Mechanical Turk. Nicely organized, it confirms my earlier post on the motivating factors for Turkers to participate (I have to finally tabulate these responses and post the summary... long overdue...).
To my great disappointment, I realized that ReadWriteWeb has a sensationalistic article titled "Amazon's Mechanical Turk Used for Fraudulent Activities." In contrast to David Pennock's post, it portrays Mechanical Turk as a marketplace of spammers. Even the last "positive" paragraph is phrased in a way that reinforces the negative message. Not sure if this is what Brynn Evans was thinking about Mechanical Turk when she collected the screenshots of the spammers! My own experience indicates that Turkers have very good judgment and they avoid the tasks that look spammy (Lukas provides further evidence about that). I would put this article in the same category as all those articles in the mass media that portray the Internet as a place for kidnappers, pedophiles, scammers, and gamblers. Yes, in any system where you have human participation, you will find such people! Are we going to run away?
Bob Carpenter briefly describes his approach to modeling annotators' accuracy. His hierarchical Bayesian model builds on previous research in epidemiology. It allows us to estimate the quality characteristics of each annotator, and to estimate the most likely "correct" response for each question that gets multiple answers. The model assumes a generative process with two ingredients: (a) a set of labelers, each with their own quality characteristics (sensitivity and specificity), and (b) a dataset with a given class balance. Matching the two, we get the final annotated dataset. I found the model very inspiring!
Finally, Brendan posts a description of their EMNLP'08 paper on using Mechanical Turk for a variety of NLP tasks. As far as I can tell, this is the first published study that examines the accuracy of Turkers on various annotation tasks, and it provides plenty of information for people who are looking for some "wetlab statistics." I truly wish I had had this reference to cite in our KDD'08 paper, where we demonstrated the value of using multiple noisy annotations. While we had results indicating performance characteristics under different levels of noise, reviewers were curious about the actual noise level on systems like Mechanical Turk. This EMNLP'08 paper provides plenty of such data!
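To make the generative story in Bob Carpenter's model a bit more concrete, here is a minimal Python sketch of the forward direction only: draw true labels from a class balance, then let each labeler corrupt them according to a sensitivity and a specificity. This is not his hierarchical Bayesian inference, and it is not the method from our KDD'08 paper either. The function names, the parameter values, and the use of a plain majority vote to combine the noisy labels are all my own assumptions, picked just for illustration.

```python
import random


def simulate_annotations(n_items, prevalence, annotators, rng):
    """Forward-simulate binary labels from noisy annotators.

    prevalence: assumed probability that an item's true label is 1.
    annotators: list of (sensitivity, specificity) pairs, one per labeler.
    Returns (truth, labels), where labels[i][j] is labeler j's answer on item i.
    """
    truth = [1 if rng.random() < prevalence else 0 for _ in range(n_items)]
    labels = []
    for t in truth:
        row = []
        for sensitivity, specificity in annotators:
            if t == 1:
                # A positive item is labeled 1 with probability = sensitivity.
                row.append(1 if rng.random() < sensitivity else 0)
            else:
                # A negative item is labeled 0 with probability = specificity.
                row.append(0 if rng.random() < specificity else 1)
        labels.append(row)
    return truth, labels


def majority_vote(labels):
    """Combine each item's labels by simple majority (ties go to 0)."""
    return [1 if 2 * sum(row) > len(row) else 0 for row in labels]


def accuracy(predicted, truth):
    """Fraction of items where the predicted label matches the true one."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setup: five labelers, each 80% sensitive and 80% specific.
    annotators = [(0.8, 0.8)] * 5
    truth, labels = simulate_annotations(5000, 0.3, annotators, rng)
    single = [row[0] for row in labels]
    print("single labeler accuracy:", accuracy(single, truth))
    print("majority vote accuracy: ", accuracy(majority_vote(labels), truth))
```

With five independent labelers who are each right 80% of the time, a majority vote should be right about 94% of the time in expectation, which is the basic intuition behind collecting multiple noisy annotations. A model like Bob's goes further by estimating each labeler's sensitivity and specificity from the data, instead of assuming that everyone is equally reliable.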

Lots of great stuff!