Monday, March 14, 2011

Do Mechanical Turk workers lie about their location?

A few weeks back, Dahn Tamir graciously allowed me to take a peek at the data that he has been gathering about his workers on Mechanical Turk. Over time, he has assigned tasks to more than 50,000 workers, so I consider his data to be one of the most representative samples of workers.

One of the nice tasks that he has been running is a simple HIT in which he asks workers to report their location. At the same time, the task recorded the IP address of the worker. Why was the task nice? Because there is absolutely no incentive for the workers to be truthful: the submission is accepted and paid no matter what. In a sense, it is a test that checks whether workers will be truthful in cases where it is not possible to verify their answers.

So, we used this test to check how sincere the workers are: we can simply geocode the IP address and find out the actual location of the worker. (With some degree of error, but good enough for approximation purposes.) For the workers who reported being based in the US (approximately 22,000 workers), the HIT asked for the worker's zip code, making it easy to assign an approximate latitude/longitude location.

To measure how accurately the workers report their location, we measured the distance between the location of the IP address and the location of the zip code. The plot below shows the distribution of the differences:

As you can see, most of the workers were pretty truthful about their location. The distance was less than 10 miles for more than 60% of the workers: a gap of this size is easily explained by the limited accuracy of the geocoding APIs and by the approximation of mapping a zip code to a single point.
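For readers who want to reproduce this kind of measurement, here is a minimal sketch of the distance step. It assumes that both the IP address and the zip code have already been turned into latitude/longitude pairs by some geocoding service, and it uses the haversine great-circle formula, which is my choice for illustration and not necessarily what we ran on the original data. The function name and the Earth radius constant are likewise my own.

```python
import math

# Mean Earth radius in miles (an approximation; the Earth is not a perfect sphere).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding near antipodal points.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
```

Calling `haversine_miles(ip_lat, ip_lon, zip_lat, zip_lon)` for each worker gives the per-worker distance. Keep in mind that the geocoding error on both sides is usually larger than any error introduced by the spherical approximation.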

Of course, the flip side of the coin is that a significant fraction of the workers were essentially lying about their location: for 10% of the workers (i.e., ~2250 of them) the IP address was more than 100 miles away from the reported zip code. For 2% of the workers (i.e., ~500 workers) the distance was more than 1000 miles.
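Once the per-worker distances are available, the percentages above are just the share of workers beyond each cutoff. A small sketch of that tally follows; the function name is hypothetical, and the sample distances are made up for illustration and are not values from Dahn's data.

```python
def share_beyond(distances_miles, threshold_miles):
    """Fraction of workers whose IP-to-zip distance exceeds the threshold."""
    if not distances_miles:
        raise ValueError("no distances supplied")
    over = sum(1 for d in distances_miles if d > threshold_miles)
    return over / len(distances_miles)


# Made-up distances for illustration only; not taken from the actual dataset.
sample = [2.0, 5.5, 8.0, 40.0, 150.0, 1200.0, 3.0, 9.9, 60.0, 7.0]

for cutoff in (10, 100, 1000):
    print(f"more than {cutoff} miles: {share_beyond(sample, cutoff):.0%}")
```

Multiplying each share by the number of US workers gives rough head counts like the ones quoted above.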

The biggest liar? A worker from Chennai, India, who reported a zip code corresponding to Tampa, Florida. The IP address was a cool 9500 miles away from the reported location!