A while back, I wrote about my experiences using Amazon Mechanical Turk for conducting experiments that require input from users. I am now happy to announce :-) that the first paper that uses this methodology has been accepted at IEEE ICDE 2008 and is now available online.
The paper, co-authored with Wisam Dakka, describes a simple empirical technique for automatically extracting from a text database a set of facet hierarchies that are useful for browsing the contents of the database. We used Mechanical Turk in this context to evaluate the "precision" and "recall" of the generated hierarchies. This experiment would have been almost impossible to conduct without Mechanical Turk, as it would have required multiple users to read and annotate thousands of news articles. Using Mechanical Turk, the experiments were done in less than three days.
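For readers curious about the mechanics of that evaluation, here is a minimal sketch of how per-term judgments collected from Mechanical Turk workers could be aggregated by majority vote into precision and recall figures. The function names, labels, and sample data below are illustrative assumptions, not the actual scripts behind the paper's numbers.

    from collections import Counter

    def majority_label(labels):
        # Majority vote over the labels that different workers gave to one item.
        return Counter(labels).most_common(1)[0][0]

    def precision_recall(worker_judgments, hierarchy_terms):
        # worker_judgments: {candidate term: ['relevant', 'irrelevant', ...]},
        #   one list of labels per term judged through Mechanical Turk HITs.
        # hierarchy_terms: set of terms that the automatic extraction placed
        #   in the generated facet hierarchy.
        gold = {term for term, labels in worker_judgments.items()
                if majority_label(labels) == 'relevant'}
        hits = hierarchy_terms & gold
        precision = len(hits) / len(hierarchy_terms) if hierarchy_terms else 0.0
        recall = len(hits) / len(gold) if gold else 0.0
        return precision, recall

    # Toy example with made-up terms and labels:
    judgments = {
        'Politics': ['relevant', 'relevant', 'irrelevant'],
        'Sports':   ['relevant', 'relevant', 'relevant'],
        'Gadgets':  ['irrelevant', 'irrelevant', 'relevant'],
    }
    print(precision_recall(judgments, {'Politics', 'Sports', 'Gadgets'}))
    # -> (0.666..., 1.0)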
For the final experiment of the paper, though, where we needed to interview and time the users while they were using the facet hierarchies to complete various tasks, we resorted to the traditional, lab-based setting. However, during the summer, we only managed to recruit five users who expressed interest in participating. We observed them in the lab while they performed their tasks and recorded their reactions and impressions. (Fortunately, the results were statistically significant.)
Next time, we will attempt to use Mechanical Turk for such "interview+timing" experiments as well. However, I will need to talk more with people who often perform such experiments to see how they would react to such approaches, where the human subjects are completely disconnected from the researcher. Even though simple timing experiments can be easily performed using MTurk, I am a little uncomfortable about the reliability of such experiments.
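To make the timing idea concrete, here is a rough sketch of how completion times pulled from MTurk assignments (for example, the difference between each assignment's AcceptTime and SubmitTime in the downloaded results) could be compared across two interface conditions. The choice of a t-test and all the numbers are assumptions for illustration only, not measurements from the paper.

    from statistics import mean
    from scipy.stats import ttest_ind  # assumes SciPy is available

    # Hypothetical completion times in seconds for the same tasks,
    # performed with and without the facet hierarchies.
    times_with_facets = [95, 110, 88, 130, 102, 97, 121]
    times_baseline    = [150, 141, 170, 155, 162, 149, 158]

    t_stat, p_value = ttest_ind(times_with_facets, times_baseline)
    print(f"facets: {mean(times_with_facets):.1f}s, "
          f"baseline: {mean(times_baseline):.1f}s, p = {p_value:.4f}")

Of course, the sketch says nothing about whether the recorded times are trustworthy when nobody is watching the worker, which is exactly the reliability question that makes me hesitant.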