Friday, October 15, 2010

Mechanical Turk and Data Driven Journalism: The Case of ProPublica

Last year, at a Mechanical Turk Meetup in New York, I met Amanda Michel of ProPublica, a "non-profit newsroom that produces investigative journalism in the public interest".

ProPublica had a set of very interesting ideas on how to use crowdsourcing to improve their practices and extend their reporting reach. Amanda's ideas ranged from the operational aspects of data-driven journalism to more ambitious goals. Common to all these efforts was a simple goal: find, reveal, and fight corruption. When you meet such people, it is hard not to be inspired. So, over the last year, I kept working with ProPublica on how to use Mechanical Turk for their goals.

Take a simple example. ProPublica was facing a significant data integration problem. For one of their projects, they wanted to extract data from hundreds of different city, county, and state databases. Needless to say, building an integration system at that scale is difficult and beyond the reach of many advanced IT companies. It is definitely not a problem that a journalism organization could solve for the purpose of writing a single story. How could Mechanical Turk help? The Turkers could be the ones interacting with the databases, effectively forming a human-powered hidden-web crawler that was up and running in a couple of days.

Mechanical Turk quickly became an integral part of ProPublica's newsroom operations. It became so valuable that today ProPublica published an article describing how they use Amazon's Mechanical Turk for data-driven reporting, and they released ProPublica's Guide to Mechanical Turk. The guide goes step by step through the challenges that a newcomer to Mechanical Turk may face and shows how best to approach the tool. Unsurprisingly, these links are being passed around on Twitter like crazy.

ProPublica is a great case study, not because they did something artistic or fancy, but because they kept a razor-sharp focus on integrating crowdsourcing into their operations. Artistic projects like the 10,000 sheep of The Sheep Market will be passed around virally and inspire ideas, but mainstream adoption will come after people read success stories like ProPublica's. At the end of the day, people want to know how to get things done.

I saved the best part for the end. From this article:

ProPublica has received a Special Distinction Award from the Knight-Batten Awards for Innovations in Journalism. ProPublica's Distributed Reporting Project was honored for "systematizing the process of crowdsourcing, conducting experiments, polishing their process and tasking citizens with serious assignments." The judges called it "a major step forward with how we understand crowdsourcing."