A Computer Scientist in a Business School

Thursday, August 27, 2009

Workshop on Information in Networks (WIN)

For those of you interested in the study of networked data, I would like to draw your attention to the "Workshop on Information in Networks (WIN)", a workshop organized by my colleagues Sinan Aral, Foster Provost, and Arun Sundararajan. It will take place on September 25-26, 2009. From the description:

The purpose of WIN is to bring together leading researchers studying ‘information in networks’ – its distribution, its diffusion, its value, and its influence on social and economic outcomes – in order to lay the foundation for ongoing relationships and to build a lasting multidisciplinary research community.

I should emphasize that the phrase "bring together leading researchers" is not the standard boilerplate used in many calls for papers. The lineup of speakers is truly outstanding! I would be hard pressed to find any other conference with such a lineup of invited speakers:

Lada Adamic, University of Michigan
Albert-Laszlo Barabasi, University of Notre Dame, Northeastern University
Ronald Burt, University of Chicago
Damon Centola, MIT
Pedro Domingos, University of Washington
Christos Faloutsos, Carnegie Mellon
James Fowler, University of California, San Diego
Sanjeev Goyal, University of Cambridge
Bernardo Huberman, HP Labs
Matthew Jackson, Stanford University
Michael Kearns, University of Pennsylvania
Jon Kleinberg, Cornell University
Rachel Kranton, Duke University
David Lazer, Harvard University
Jure Leskovec, Stanford
Michael Macy, Cornell University
Alex (Sandy) Pentland, MIT
Duncan Watts, Yahoo! Research

It is really as good as it gets. If you are interested in networked data and can be in New York on September 25-26, then this is an event that you must attend!

Tuesday, August 11, 2009

Get a Consent Form (for IRB) on MTurk using Qualification Tests

I was browsing through the various qualification tests on Mechanical Turk, checking what requesters ask and how they structure the tests. The one test that caught my eye was designed by Daniel Velleman and David Beaver from the Linguistics department of The University of Texas at Austin.

Here is the test:

"Which sentence do you prefer?" eligibility form
This qualification will allow you to participate in our English language research HIT, "Which sentence do you prefer?"
Is English your first language?
Yes
No
Do you (or did you) have at least one parent or caregiver whose first language was English?
Yes
No

Please read this information

You are invited to participate in a survey, entitled "Which sentence do you prefer?" The study is being conducted by Daniel Velleman and David Beaver in the Linguistics department of The University of Texas at Austin.

Calhoun 501
1 University Station B5100
Austin, TX 78712-0198
(512) 471-1701

The purpose of this study is to examine English speakers' preferences about the order in which written information is presented. Your participation in the survey will contribute to a better understanding of the English language. We estimate that it will take about a minute of your time to complete each question. You are free to contact the investigator at the above address and phone number to discuss the survey.

Risks to participants are considered minimal. There will be no costs for participating. You will be paid for each HIT you complete, but will not otherwise benefit from participating. Your Amazon account information will be kept while we collect data for tracking purposes only. A limited number of research team members will have access to the data during data collection. This information will be stripped from the final dataset.

Your participation in this survey is voluntary. You may decline to answer any question and you have the right to withdraw from participation at any time without penalty. If you wish to withdraw from the study or have any questions, contact the investigator listed above.

If you have any questions, please email Daniel Velleman at ut.linguistics.mturk@gmail.com. You may also request a hard copy of the survey from the contact information above.

This study has been reviewed and approved by The University of Texas at Austin Institutional Review Board. If you have questions about your rights as a study participant, or are dissatisfied at any time with any aspect of this study, you may contact - anonymously, if you wish - the Institutional Review Board by phone at (512) 471-8871 or email at orsc@uts.cc.utexas.edu.

IRB Approval Number: 2009-03-0123
I understand and want to participate in this study.

It is indeed a very clever idea to use a qualification test to get workers to fill in a consent form, satisfying the requirements of the Institutional Review Board at the same time.
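The post does not show how the test itself was built. Here is a minimal sketch, assuming the test is written in MTurk's QuestionForm XML format. The Python below builds a form that mirrors the one above: two eligibility questions, then the consent text, then a consent option. The question identifiers (english_first, english_parent, consent), the helper names, and the schema version in NS are all illustrative assumptions, not taken from the original test.

```python
import xml.etree.ElementTree as ET

# Assumption: the 2005-10-01 QuestionForm schema namespace; verify the version
# against the current MTurk documentation before using it.
NS = "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd"
ET.register_namespace("", NS)  # serialize NS as the default xmlns


def _el(parent, tag, text=None):
    """Append a child element in the QuestionForm namespace."""
    child = ET.SubElement(parent, f"{{{NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def add_choice_question(form, qid, prompt, options):
    """Add a required radio-button question; options are (identifier, label) pairs."""
    q = _el(form, "Question")
    _el(q, "QuestionIdentifier", qid)
    _el(q, "IsRequired", "true")
    _el(_el(q, "QuestionContent"), "Text", prompt)
    answer = _el(_el(q, "AnswerSpecification"), "SelectionAnswer")
    _el(answer, "StyleSuggestion", "radiobutton")
    selections = _el(answer, "Selections")
    for ident, label in options:
        sel = _el(selections, "Selection")
        _el(sel, "SelectionIdentifier", ident)
        _el(sel, "Text", label)


def consent_test(consent_text):
    """Return QuestionForm XML: eligibility questions, consent text, consent option."""
    yes_no = [("yes", "Yes"), ("no", "No")]
    form = ET.Element(f"{{{NS}}}QuestionForm")
    add_choice_question(form, "english_first",
                        "Is English your first language?", yes_no)
    add_choice_question(form, "english_parent",
                        "Do you (or did you) have at least one parent or "
                        "caregiver whose first language was English?", yes_no)
    # Overview blocks may sit between questions, so the consent text can follow them.
    overview = _el(form, "Overview")
    _el(overview, "Title", "Please read this information")
    _el(overview, "Text", consent_text)
    add_choice_question(form, "consent",
                        "I understand and want to participate in this study.",
                        [("agree", "I agree")])
    return ET.tostring(form, encoding="unicode")
```

The resulting string could be passed as the `Test` argument when creating a qualification type (for example, through boto3's MTurk `create_qualification_type`). You could also supply an `AnswerKey` if you want the test graded automatically. Check the current MTurk API reference for the exact parameters.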

Perhaps the trick will be useful to other researchers who want to run human-subject studies on Mechanical Turk. (I still believe that this study does not require IRB approval, but that is not the point of this post.)

Wednesday, August 5, 2009

Top Requesters on Mechanical Turk

Today I had a chat with Dahn Tamir about all things MTurk. He was particularly interested in the archive of all requesters that I have collected over the last 7 months. So, I queried the database, computed some basic statistics and sent him the results.

Then I thought: why not export the live results as well? A few lines of PHP later, the leaderboard of the top Mechanical Turk requesters was born; it is now available at http://mturk-tracker.com/top_requesters/

For each requester, you can see the total number of projects they have posted on Mechanical Turk since January 2009, the total number of HITs, and the total value of the posted HITs. If you are also interested in whether a requester is still active, you can see when they last posted a HIT.
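The leaderboard itself was written in PHP on top of the tracker's database, and its code is not shown here. As a rough sketch of the aggregation it describes, assume a hypothetical `Posting` record per crawled project. The Python below computes, per requester, the number of projects, the number of HITs, the total value, and the last posting time, then sorts by total value. The record fields and the sort key are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Posting:
    """Hypothetical crawl record: one project (HIT group) as first seen on MTurk."""
    requester_id: str
    requester_name: str
    hits: int            # number of HITs in the project
    reward_cents: int    # reward per HIT, in cents to avoid float rounding
    posted_at: datetime


def top_requesters(postings):
    """Aggregate per-requester totals, sorted by total posted value (descending)."""
    stats = {}
    for p in postings:
        s = stats.setdefault(p.requester_id, {
            "name": p.requester_name, "projects": 0, "hits": 0,
            "value_cents": 0, "last_posted": p.posted_at})
        s["projects"] += 1
        s["hits"] += p.hits
        s["value_cents"] += p.hits * p.reward_cents
        s["last_posted"] = max(s["last_posted"], p.posted_at)  # activity indicator
    return sorted(stats.items(), key=lambda kv: kv[1]["value_cents"], reverse=True)
```

Sorting by total value is one choice. Sorting by HIT count or by number of projects would produce a different ranking from the same aggregates.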

By clicking on a requester's name, you can see the archive of the last 100 tasks they have posted. By clicking on the requester ID, you go to Amazon, where you can see the tasks they have available right now.

Enjoy!

Tuesday, August 4, 2009

When to Post Tasks on Mechanical Turk?

People who have experience with Mechanical Turk know that getting long tasks done there is tricky. While it is relatively easy to get small tasks done quickly, it is much more difficult to estimate how long a big task will take. The "estimated time" given by the Mechanical Turk interface is really crappy and provides pretty much no guidance if you expect your task to last longer than a day.

Naturally, questions like these arise: When is it best to post a task? How can I minimize my waiting time?

To better understand how tasks get completed on Mechanical Turk, I started crawling the site every few minutes, collecting data about the HITs, the requesters, how long each HIT remains available, and so on.

Queue

The first outcome of this effort was the Mechanical Turk Monitor, a visualization tool that shows how many projects and HITs are available at any given time, along with the available rewards (see the old post).

Arrival process and Serving process

This tool effectively showed the size of the "queue". However, it revealed neither how many tasks arrive on MTurk per day nor how much work gets done on MTurk every day. So, last week I decided to display this information, showing the daily activity of the requesters and the corresponding daily activity of the Turkers:

Activity of requesters: How many tasks are posted per day
Activity of Turkers: How many tasks are completed per day
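The post does not say exactly how these two daily series are derived from the crawl. One plausible approach, offered here only as an assumption, is to compare consecutive snapshots of the queue. HITs that appear, or are added to an existing project, count as posted. HITs that disappear count as completed. In the sketch below, each snapshot maps a hypothetical HIT-group id to (HITs available, reward per HIT in cents). A vanished group is assumed to have been worked off, even though it may actually have expired or been withdrawn.

```python
from collections import defaultdict
from datetime import datetime


def flows(prev, curr):
    """Value (in cents) posted and completed between two consecutive snapshots.

    Each snapshot maps a hypothetical HIT-group id to (hits_available, reward_cents).
    """
    posted = completed = 0
    for gid, (hits, reward) in curr.items():
        before = prev[gid][0] if gid in prev else 0
        if hits > before:   # new group, or HITs added to an existing one
            posted += (hits - before) * reward
        else:               # HITs no longer available, assumed taken by workers
            completed += (before - hits) * reward
    for gid, (hits, reward) in prev.items():
        if gid not in curr:
            # Assumption: a vanished group was worked off; it may also have
            # expired or been withdrawn by the requester.
            completed += hits * reward
    return posted, completed


def daily_activity(snapshots: list[tuple[datetime, dict]]) -> dict:
    """Sum flows per calendar day; snapshots must be ordered by time."""
    per_day = defaultdict(lambda: [0, 0])
    for (_, prev), (t, curr) in zip(snapshots, snapshots[1:]):
        posted, completed = flows(prev, curr)
        per_day[t.date()][0] += posted
        per_day[t.date()][1] += completed
    return {day: tuple(v) for day, v in per_day.items()}
```

Projects already in the queue at the first snapshot are not counted as posted, because they arrived before the crawl began.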

Posting Activity

Now we can start scratching the surface of how things get done on Mechanical Turk. A first pass is to look at the statistics for what is being posted over time:

The x-axis shows time, and the y-axis shows the value of the HITs posted each day. The blue line depicts the total value of the HITs posted. One immediate observation is that the series shows significant periodicity. Taking the 7-day average (red line) smooths the curve considerably, which indicates a strong weekly periodicity.
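Why does smoothing point to a weekly cycle? Any 7 consecutive days contain each day of the week exactly once. As a result, a pattern that repeats every week averages out to a constant over a 7-day window. The sketch below uses a trailing window. The post does not say whether the red line is trailing or centered, so that detail is an assumption.

```python
def moving_average(values, window=7):
    """Trailing moving average; entry i averages values[i .. i + window - 1]."""
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]   # slide the window by one day
        averages.append(total / window)
    return averages
```

A series made only of a repeating weekly pattern, such as `[1, 2, 3, 4, 5, 6, 7] * 3`, becomes a flat line at 4.0. Whatever variation survives the smoothing in the real data is therefore not explained by the weekly cycle.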

Let's take a look at the distribution of posting activity over the days of the week:

The plot shows the distribution of activity for each day of the week, where activity is defined as the total value of the HITs posted on that day. As we can see, weekends tend to be significantly quieter than weekdays. In fact, even Mondays tend to be relatively quiet, perhaps because requesters spend Monday preparing the HITs that they then post on Tuesday :-)
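To get a per-weekday distribution like the one in the plot, group the daily totals by day of the week. The sketch below does that and summarizes each group by its median. The input format (a mapping from calendar date to that day's posted value) and the choice of the median as the summary are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def by_weekday(daily_value: dict[date, float]) -> dict[str, list[float]]:
    """Group daily totals by day of the week (date.weekday(): Monday == 0)."""
    groups = defaultdict(list)
    for day, value in daily_value.items():
        groups[WEEKDAYS[day.weekday()]].append(value)
    return {name: groups[name] for name in WEEKDAYS if name in groups}


def weekday_medians(daily_value: dict[date, float]) -> dict[str, float]:
    """One robust summary per day of the week of its daily totals."""
    return {name: median(vals) for name, vals in by_weekday(daily_value).items()}
```

The same grouping also works for the completion series, which lets you compare posting and worker activity weekday by weekday.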

Well, the plot is not very surprising. Lots of activity during the workdays, less activity over the weekends.

Worker Activity

The interesting result, though, comes when we look at the activity of the Turkers:

It seems that Turkers are not in sync with the requesters. In fact, the activity on Saturdays is comparable to the activity during the weekdays. Surprisingly, Mondays tend to see significantly less activity. (Perhaps due to the small number of tasks posted over the weekend?)

Conclusion

What is clear is that there is a relative lag between the activity of requesters and that of workers. Although it is hard to infer causality from these figures, it seems that Fridays and Saturdays are good days to post tasks on Mechanical Turk: there is relatively low competition for the attention of workers, and worker activity remains significant on Saturday.

So, now you know: Post your HITs on Friday and go away...