Tuesday, June 18, 2013

Project Troia: Quality Assurance in Crowdsourcing

One of the key problems in crowdsourcing is quality control. Over the last few years, a large number of methods have been proposed for estimating the quality of workers and the quality of the generated data. A few years back, we released the Get Another Label (GAL) toolkit, which allowed people to run their data through a command-line interface and get back estimates of worker quality, estimates of how well the data have been labeled, and the data points that have high uncertainty and may therefore require additional attention.
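To make "estimating worker quality" concrete, here is a simplified sketch of one well-known method of this kind, the expectation-maximization algorithm of Dawid and Skene, which jointly estimates each item's true label and each worker's confusion matrix. It is an illustration under simplifying assumptions (majority-vote initialization, additive smoothing of 0.01, a fixed number of iterations), not the GAL code; the function name and the `(worker, item, label)` data layout are hypothetical.

```python
from collections import defaultdict


def dawid_skene(labels, categories, iterations=20, smoothing=0.01):
    """EM estimate of item posteriors and worker confusion matrices.

    labels: iterable of (worker, item, label) triples.
    Returns (posterior[item][category], confusion[worker][true][given]).
    """
    labels = list(labels)
    by_item = defaultdict(list)
    for worker, item, label in labels:
        by_item[item].append((worker, label))
    workers = {worker for worker, _, _ in labels}

    # Initialise each item's posterior with its majority-vote proportions.
    posterior = {
        item: {c: sum(l == c for _, l in votes) / len(votes) for c in categories}
        for item, votes in by_item.items()
    }

    confusion = {}
    for _ in range(iterations):
        # M-step: class priors and smoothed per-worker confusion matrices.
        prior = {c: sum(p[c] for p in posterior.values()) / len(posterior) for c in categories}
        confusion = {w: {t: {g: smoothing for g in categories} for t in categories} for w in workers}
        for worker, item, label in labels:
            for t in categories:
                confusion[worker][t][label] += posterior[item][t]
        for rows in confusion.values():
            for row in rows.values():
                total = sum(row.values())
                for g in row:
                    row[g] /= total

        # E-step: posterior over each item's true class given all of its labels.
        for item, votes in by_item.items():
            score = dict(prior)
            for worker, label in votes:
                for t in categories:
                    score[t] *= confusion[worker][t][label]
            z = sum(score.values())
            posterior[item] = {t: s / z for t, s in score.items()}

    return posterior, confusion
```

A worker who always answers "spam" ends up with a confusion matrix showing that they map true non-spam items to "spam", so their votes get discounted instead of simply counted.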

The next step for Get Another Label was to get it ready to work in more practical settings. The GAL toolkit assumed that all the labels assigned by the workers were available up front: we process them and get the results. In reality, though, most tasks run in an incremental mode: the task runs over time, new data arrive, new workers arrive, and the "load-analyze-output" process is not a good fit. We wanted something that returns estimates of worker quality on the fly and, again on the fly, identifies the data points that need the most attention.
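A minimal sketch of what "on the fly" can mean: an estimator whose answers are recomputed from running counts every time a label or a gold answer arrives, instead of after a complete batch. Everything here is a hypothetical simplification (worker accuracy measured only on gold items with Laplace smoothing, a symmetric error model, a uniform prior over categories), not the algorithm used by Project Troia.

```python
from collections import defaultdict


class OnlineQualityTracker:
    """Hypothetical incremental estimator: labels and gold answers arrive one at a time."""

    def __init__(self, categories):
        self.categories = list(categories)
        self.gold = {}                   # item -> known correct label
        self.labels = defaultdict(list)  # item -> [(worker, label), ...]
        self.correct = defaultdict(int)  # worker -> correct answers on gold items
        self.seen = defaultdict(int)     # worker -> answers given on gold items

    def add_label(self, worker, item, label):
        self.labels[item].append((worker, label))
        if item in self.gold:
            self._score(worker, label == self.gold[item])

    def add_gold(self, item, label):
        if item in self.gold:
            return  # ignore repeated gold answers for the same item
        self.gold[item] = label
        for worker, given in self.labels[item]:  # score labels that arrived earlier
            self._score(worker, given == label)

    def _score(self, worker, is_correct):
        self.seen[worker] += 1
        self.correct[worker] += int(is_correct)

    def worker_accuracy(self, worker):
        # Laplace-smoothed accuracy on gold items; unseen workers start at 0.5.
        return (self.correct[worker] + 1) / (self.seen[worker] + 2)

    def item_posterior(self, item):
        if item in self.gold:
            return {c: float(c == self.gold[item]) for c in self.categories}
        k = len(self.categories)
        score = {c: 1.0 for c in self.categories}  # uniform prior over categories
        for worker, given in self.labels[item]:
            acc = self.worker_accuracy(worker)
            for c in self.categories:
                # Symmetric error model: errors spread evenly over the other k-1 labels.
                score[c] *= acc if given == c else (1 - acc) / (k - 1)
        z = sum(score.values())
        return {c: s / z for c, s in score.items()}

    def uncertain_items(self, threshold=0.9):
        """Items whose most likely label is still below the target confidence."""
        return [i for i in self.labels if max(self.item_posterior(i).values()) < threshold]
```

Because every estimate is derived from counts that are updated per event, a service built this way can answer "how good is this worker?" or "which items need more labels?" at any moment while the task is still running.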

Towards this goal, over the last few months we have been porting the GAL code into a web service called Project Troia. You can load the data as the crowdsourced project runs and get back the results immediately. This allows for very fast estimation of worker quality, as well as quick identification of the data points that either meet the target quality or require additional labeling effort. In particular, Project Troia:
  • Supports labeling with any number of discrete categories, not just binary.
  • Supports labeling with continuous variables.
  • Allows the specification of arbitrary misclassification costs (e.g., "marking spam as legitimate has cost 1, marking legitimate content as spam has cost 5").
  • Allows for seamless mixing of gold labels and redundant labels for quality control.
  • Estimates the quality of the workers that participate in the task and returns the estimates on-the-fly.
  • Estimates the quality of the data returned by the algorithm and returns the estimated labeling accuracy on-the-fly.
  • Estimates a quality-sensitive payment for every worker, based on the quality of the work done so far.
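As a sketch of how a cost matrix like the spam example above changes decisions, the snippet below picks, for each item, the label with the lowest expected cost under its posterior. This is standard Bayesian decision logic, assumed here for illustration; the names `COST`, `expected_cost`, and `min_cost_decision` are hypothetical and not part of the Troia API.

```python
# COST[true][decision]: cost of deciding `decision` when the true class is `true`.
# Values follow the example in the list: spam marked legitimate costs 1,
# legitimate content marked spam costs 5.
COST = {
    "spam": {"spam": 0.0, "legitimate": 1.0},
    "legitimate": {"spam": 5.0, "legitimate": 0.0},
}


def expected_cost(posterior, decision, cost=COST):
    """Expected misclassification cost of assigning `decision` to an item."""
    return sum(p * cost[true][decision] for true, p in posterior.items())


def min_cost_decision(posterior, cost=COST):
    """Label with the lowest expected cost (not necessarily the most likely one)."""
    return min(posterior, key=lambda d: expected_cost(posterior, d, cost))
```

With a posterior of 0.7 spam and 0.3 legitimate, deciding "spam" costs 0.3 × 5 = 1.5 in expectation, while deciding "legitimate" costs 0.7 × 1 = 0.7, so the sketch labels the item legitimate. With these costs, an item is marked spam only when P(spam) exceeds 5/6.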
If you are interested in a description of the methods implemented in the toolkit, please take a look at the paper "Quality-based Pricing for Crowdsourced Workers". Our experiments indicate that when labels are allocated following the suggestions of Project Troia, we achieve the target data quality with a near-optimal budget, and workers are fairly compensated for their effort. (For details, see the paper :-)

Special thanks to Tagasauris, oDesk, and Google for supporting the development of the software. Needless to say, the API is free to use, and the source code is available on GitHub. We hope that you will find it useful.
 

Tuesday, April 2, 2013

Intrade Archive: Data for Posterity

A few years back, I did some work on prediction markets. For this line of research, we collected data from Intrade to perform our experimental analysis. Some of the data is available through the Intrade Archive, a web app that I wrote in order to familiarize myself with the Google App Engine.

In the last few weeks, though, after the effective shutdown of Intrade, I started receiving requests for access to the data stored in the Intrade Archive. So, by popular demand, I gathered all the data from the Intrade Archive, along with all the past data that I had about all the Intrade contracts going back to 2003, and put everything on GitHub for everyone to access and download. The Excel file contains a description of the contracts, while the zip file contains information about all the individual trades and the daily opening and closing prices.

I deliberately excluded all the Financial contracts, as trading in these contracts has limited research interest. (Plus, there were too many of them.) Data from the "official" stock and options exchanges come with much higher volume and are a better source of information than the comparatively illiquid contracts on Intrade.

The link to the GitHub repository is also now available from the home page of the Intrade Archive. I hope that the resource-hungry crawlers can now be put to sleep, never to come back again :-)

Enjoy!

Monday, February 25, 2013

WikiSynonyms: Find synonyms using Wikipedia redirects

Many, many years back, I worked with Wisam Dakka on a paper on creating faceted interfaces for text collections. One of the requirements for that project was to discover synonyms for named entities. While we explored a variety of directions, the one that I liked most was Wisam's idea to use the Wikipedia redirects to discover terms that are mostly synonymous.

Did you know, for example, that ISO/IEC 14882:2003 and X3J16 are synonyms of C++? No? Me neither. However, Wikipedia reveals that through its redirect structure.

The WikiSynonyms web service

What do we mean by redirects? Well, if you try to visit the Wikipedia page for President Obama, you will be redirected to the canonical page Barack Obama. Effectively, Wikipedians deem "President Obama" to be a close synonym of "Barack Obama", hence the redirect. Similarly, the term "Obama" is also a redirect, and so on. (You can check the full list of redirects here.)

While I was visiting oDesk, I felt that this service could be useful for a variety of purposes, so, following the oDesk model, we hired a contractor to implement this synonym extraction as a web API and service. If you want to try it out, please go to:


The API is very simple. Just issue a GET request like this:

curl 'http://wikisynonyms.ipeirotis.com/api/{TERM}'
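For programmatic access, here is a minimal Python sketch of the same GET request. The function names are hypothetical, and two details are assumptions rather than documented behavior: that the service accepts percent-encoded terms (e.g., `Hillary%20Clinton`), and that it answers with a JSON body.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://wikisynonyms.ipeirotis.com/api/"


def synonyms_url(term):
    # Percent-encode the whole term (spaces, '+', parentheses) so it forms one path segment.
    return BASE_URL + quote(term, safe="")


def fetch_synonyms(term):
    # Assumes a JSON response. urlopen follows ordinary redirects (301/302/...)
    # but raises urllib.error.HTTPError for error codes and for a 300 response.
    with urllib.request.urlopen(synonyms_url(term)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Encoding the term explicitly matters for entries such as C++, where an unencoded `+` could be read as a space by some servers.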

For example, to find synonyms for Hillary Clinton:


and for Obama



Mashape integration

Since we may change the URL of the service, I recommend registering with Mashape and accessing the WikiSynonyms service through Mashape instead:

curl 'https://wikisynonyms.p.mashape.com/{TERM}' --header 'X-Mashape-Authorization: your_mashape_key'
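The Mashape route differs only in the base URL and the authentication header. Here is a minimal sketch with Python's standard library, where `mashape_request` is a hypothetical helper and `your_mashape_key` is a placeholder for your own key.

```python
import urllib.request
from urllib.parse import quote

MASHAPE_URL = "https://wikisynonyms.p.mashape.com/"


def mashape_request(term, mashape_key):
    # Same GET as the curl example: the key travels in the X-Mashape-Authorization header.
    return urllib.request.Request(
        MASHAPE_URL + quote(term, safe=""),
        headers={"X-Mashape-Authorization": mashape_key},
    )


# Usage (needs a real key and network access):
# with urllib.request.urlopen(mashape_request("Obama", "your_mashape_key")) as resp:
#     print(resp.read().decode("utf-8"))
```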

You can easily download Wikipedia

Interestingly enough, this synonym extraction technique remains little-known, despite how easy it is to extract these synonyms. And whenever I mention Wikipedia, most people worry that they will need to scrape the HTML from Wikipedia, and nobody likes this monkey business.

Strangely, most people are unaware that you can download Wikipedia in a relational form and put it directly in a database. In fact, you can download only the parts that you need. Here are the basic links:
This redirect structure (as opposed, say, to the normal link structure and the related anchor text) is highly precise. By eyeballing the results, I would guess that precision is around 97% to 99%.
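For readers who go the database route, here is a minimal sketch of the redirect-based synonym lookup, using an in-memory SQLite database with a few columns of MediaWiki's `page` and `redirect` tables (the real dumps are MySQL and contain more columns). The toy rows mirror the Obama example above; the function names are hypothetical, and this illustrates the idea rather than the WikiSynonyms implementation.

```python
import sqlite3


def build_toy_db():
    # Toy stand-ins for a few columns of MediaWiki's `page` and `redirect` tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                           page_title TEXT, page_is_redirect INTEGER);
        CREATE TABLE redirect (rd_from INTEGER, rd_namespace INTEGER, rd_title TEXT);
        INSERT INTO page VALUES (1, 0, 'Barack_Obama', 0),
                                (2, 0, 'President_Obama', 1),
                                (3, 0, 'Obama', 1);
        INSERT INTO redirect VALUES (2, 0, 'Barack_Obama'), (3, 0, 'Barack_Obama');
    """)
    return conn


# Every article (namespace 0) that redirects to the canonical title is a synonym candidate.
SYNONYM_QUERY = """
    SELECT p.page_title
    FROM redirect AS r JOIN page AS p ON p.page_id = r.rd_from
    WHERE r.rd_namespace = 0 AND p.page_namespace = 0 AND r.rd_title = ?
    ORDER BY p.page_title
"""


def synonyms(conn, canonical_title):
    # MediaWiki stores titles with underscores instead of spaces.
    rows = conn.execute(SYNONYM_QUERY, (canonical_title.replace(" ", "_"),))
    return [title.replace("_", " ") for (title,) in rows]
```

Against the toy data, `synonyms(build_toy_db(), "Barack Obama")` returns the two redirect titles, "Obama" and "President Obama".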

Application: Extracting synonyms of oDesk skills

One application for which we used the service was extracting synonyms for the set of skills used to annotate the jobs posted on oDesk. For example, you can find the synonyms for C++:



Or you can find the synonyms for Python:



Oops, as you can see, the term Python is actually ambiguous, and Wikipedia has a disambiguation page with the different 'senses' of the term. Since we do not perform any automatic disambiguation, we return a 300 (Multiple Choices) HTTP response and ask the user to select one of the applicable terms. So, if we now query with the term 'Python (programming language)', we get:
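A client therefore has to handle two outcomes: 200 with synonyms, or 300 with a list of candidate senses to pick from and query again. The sketch below assumes a hypothetical `fetch` wrapper that returns the status code together with a list of terms (the actual response format is not specified here), and a `choose` callback standing in for the user's selection.

```python
def resolve_synonyms(term, fetch, choose, max_hops=3):
    """Follow the 300 (Multiple Choices) protocol until a 200 answer arrives.

    fetch(term) -> (status, terms): hypothetical wrapper around the HTTP call;
        on 300, `terms` lists the candidate senses, on 200 the synonyms.
    choose(options) -> str: picks a sense (a user prompt, or some heuristic).
    """
    for _ in range(max_hops):
        status, terms = fetch(term)
        if status == 200:
            return terms
        if status == 300:
            term = choose(terms)  # e.g. 'Python (programming language)'
            continue
        raise ValueError(f"unexpected HTTP status {status} for {term!r}")
    raise ValueError(f"{term!r} still ambiguous after {max_hops} lookups")
```

The `max_hops` bound keeps a client from looping forever if a chosen sense turns out to be ambiguous as well.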


Open source and waiting for feedback

The source code, together with the installation instructions for the service, is available on GitHub. Feel free to point out any problems or suggest improvements. And thanks to oDesk Research for all the support in creating the service and making it open source for everyone to use.

Monday, January 28, 2013

Towards a Market for Intelligence

Last September, I was visiting CMU and a student asked me a question: "Do you know any crowdsourcing market where we can assign tasks to people, as opposed to waiting for the workers to pick the tasks they want to work on?"

Most crowdsourcing services do not satisfy this requirement. Mechanical Turk, oDesk, eLance, and all the others typically expect the workers to express interest in a task. At most, you may be able to invite workers to participate in a task, but you cannot really assign a task to a worker.

A notable exception is Fiverr, which approaches the market from the supply side; however, Fiverr has a different limitation: the tasks that can be performed are posted by the workers, and the employers pick from a set of existing tasks.

After thinking for a while, I realized that there are more such markets in which you can assign tasks to "workers". The main difference is that the "workers" are not necessarily humans, but APIs. Enter the world of Mashape, an API marketplace. Are you looking for someone to classify tweets according to their sentiment? Query the Mashape marketplace for sentiment analysis APIs and then assign the task to the intelligence units of your choice.

With the advent of API marketplaces, we see the emergence of marketplaces for "intelligence units". All these intelligence units (APIs or human workers) have different levels of quality, different pricing, and even different levels of capacity and responsiveness.

With the proper abstraction, a task management platform can use and optimize these distributed intelligence units without having to worry about the "implementation details" of this intelligence.
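One way to picture that abstraction is a router that treats every intelligence unit, API or human, through the same small interface: an estimated quality, a price per task, and remaining capacity. The sketch below (hypothetical names, a deliberately naive cheapest-eligible-unit policy, and no modeling of responsiveness) illustrates the idea and does not describe any existing platform.

```python
from dataclasses import dataclass


@dataclass
class IntelligenceUnit:
    name: str
    quality: float  # estimated probability of a correct answer
    price: float    # cost per task
    capacity: int   # tasks it can still accept


def assign(units, min_quality):
    """Pick the cheapest unit that meets the quality bar and has spare capacity."""
    eligible = [u for u in units if u.quality >= min_quality and u.capacity > 0]
    if not eligible:
        return None  # nobody qualifies: raise the price, relax the bar, or wait
    best = min(eligible, key=lambda u: u.price)
    best.capacity -= 1
    return best
```

The platform only sees quality, price, and capacity, so a sentiment API and a human worker are interchangeable from its point of view. A cheap API wins whenever its quality clears the bar, and human workers are used only for the tasks that demand more.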

Wednesday, December 5, 2012

Mechanical Turk changing the defaults: The game has changed

Back in the summer of 2011, Mechanical Turk introduced a new type of qualification, the Mechanical Turk "Masters". The Master qualification was assigned by Amazon to workers who had proven themselves in the marketplace.

What exactly makes someone "proven"? Understandably, Amazon keeps this a well-guarded secret. The opacity of the qualification process annoys many workers: it is hard to prove that you deserve the Master qualification when you do not know how it is granted. Rumor has it that Amazon deploys decoy tasks on Mechanical Turk just to examine the performance of the workers and decide which ones to qualify as Masters. If this is correct, it also explains why Amazon is rather secretive about the exact requirements: workers would try to ace these test tasks and let their guard down on the others.

The existence of Masters was a good development towards creating a true reputation scheme for Mechanical Turk. However, an action Amazon took a month ago changed the dynamics of the market: now the default requirement, for all tasks created through the web UI, is to use Masters workers. The requirement can be removed only through the "advanced" menu, and removing it triggers a warning that you may not get good results if you opt not to use Masters.

Tiny change? No. This is huge. Here are a few of the immediate, positive effects:
  • People who use the web UI are typically newcomers who do not know how (or do not want) to implement sophisticated quality control schemes. They just want to execute some simple tasks. The task templates help a lot to create a usable interface, and the Masters requirement ensures that they are not going to get back crappy results. A happy customer is a long-term customer.
  • Masters will not touch badly designed and ambiguous tasks. This enforces discipline on the requester side to get things designed properly. Otherwise the tasks are left untouched, which is a good signal that something is wrong with the task.
  • Masters will not touch offensively priced tasks that pay less than minimum wage while demanding high-quality work. This (hurray!) removes the impression that Mechanical Turk is about dirt-cheap work and emphasizes what crowdsourcing is about: dynamic allocation of labor to tasks, without the overhead of hiring, negotiations, etc.
There are, of course, a few downsides:
  • There are far fewer Masters workers. A current search reveals 20,744 workers. This is at least an order of magnitude lower than the number of active workers that Amazon used to advertise. Of course, these Masters are much more active than the average worker, but there are still not enough of them for all the tasks that require them.
  • There is now a significant lag before a task gets picked up by workers. Masters are much more careful about the requesters they work with, and a new requester will need to prove that they are not rejecting work unfairly and that they pay on time. Until then, the task will get only a few workers willing to test it.
  • The tasks now take much longer to complete. My current sense is that there is a 10x slowdown (but the improvement in quality is definitely worth it).
  • There is an increased cost. Masters require decent wages (so no more 5 cents for 5 minutes of work), and there is an increased overhead from Amazon (30% overhead for Masters vs 10% for regular workers). My take? You get what you pay for.
  • It is not clear on which tasks the Masters are tested and how a new worker can become a Master. It would be great if Amazon also got quality signals from a few reliable big requesters, but I can see many practical problems in implementing such a solution.
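The cost side of these trade-offs is simple arithmetic. The sketch below uses the fee rates quoted above (30% for Masters, 10% otherwise) and ignores any minimum per-assignment fee; the function names are hypothetical.

```python
def hourly_wage(reward, minutes_per_task):
    """Effective hourly pay for a worker, in the same currency as `reward`."""
    return reward * 60 / minutes_per_task


def requester_cost(reward, n_tasks, fee_rate):
    """Total paid by the requester: rewards plus Amazon's percentage fee."""
    return reward * n_tasks * (1 + fee_rate)


MASTERS_FEE, REGULAR_FEE = 0.30, 0.10  # fee rates as quoted in the post
```

For instance, 5 cents for 5 minutes of work comes to $0.60 an hour, far below the US federal minimum wage of $7.25, and paying $0.50 per task across 1,000 tasks costs $650 with Masters versus $550 without.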
Overall, though, this change in the defaults shows that Amazon has started acting on the criticism. It is clearly a risky move, as a lot of the work posted on Mechanical Turk will not get done due to lack of interest in poorly paying or badly designed tasks.

But on the other hand, it shows that Amazon is looking at the long term: let newcomer requesters get guaranteed results, and if they want to get things done faster, they can focus on pricing and better task design. If they want to go further and engage other Turkers, such requesters will be aware of the risks and benefits of such a move.

So, effectively, we now have the "novice" requesters, who are protected by default through the Masters qualification, and the "advanced" requesters, who can implement their own qualification schemes to replace the Masters qualification. This default level of protection makes the life of wannabe-scammer workers very difficult: there are no obvious victims to attack. Just hunting for a victim requester will become so difficult that it makes sense to give up scamming and either switch to doing real work or abandon the market.

A tiny change in the defaults with short-term problems and many big, long-term benefits. Personally, I find this move exhilarating.