Sunday, October 21, 2012

New version of Get-Another-Label available

I am often asked what type of technique I use for evaluating the quality of the workers on Mechanical Turk (or on oDesk, or ...). Do I use gold tests? Do I use redundancy?

Well, the answer is that I use both. In fact, I use the "Get-Another-Label" code that I have developed together with my PhD students and a few other developers. The code is publicly available on GitHub.

We have recently updated the code to add some useful functionality, such as the ability to pass in (for evaluation purposes) the true answers for the different tasks, and to get back measurements of how accurate the estimates of the different algorithms are.

So, now, if you have a task where the answers are discrete (e.g., "is this comment spam or not?", or "how many people in the photo? (a) none, (b) 1-2, (c) 3-5, (d) more than 5", etc) then you can use the Get-Another-Label code, which supports the following:
  • Allows any number of discrete categories, not just binary
  • Allows the specification of arbitrary misclassification costs (e.g., "marking spam as legitimate has cost 1, marking legitimate content as spam has cost 5")
  • Allows for seamless mixing of gold labels and redundant labels for quality control
  • Estimates the quality of the workers that participate in your tasks. The metric is normalized to be between 0% for a worker that gives completely random labels, and 100% for a perfect worker.
  • Estimates the quality of the data returned by the algorithm. The metric is normalized to be 0% for data that have the same quality as unlabeled data, and 100% for perfectly labeled data.
  • Allows the use of evaluation data to examine the accuracy of the quality control algorithms, both for the data quality and for the worker quality estimates.
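To make the worker-quality normalization above concrete, here is a simplified sketch (not the actual GAL implementation): I assume the worker's confusion matrix and the class priors have already been estimated, and all function names are made up for the example.

```python
# Simplified sketch of the worker-quality metric described above.
# NOT the actual GAL code: assumes a known confusion matrix
# P(assigned | true) for the worker, known class priors, and a
# cost matrix with zeros on the diagonal.

def expected_cost(confusion, costs, priors):
    """Expected misclassification cost of a labeler."""
    k = len(priors)
    total = 0.0
    for true in range(k):
        for assigned in range(k):
            total += priors[true] * confusion[true][assigned] * costs[true][assigned]
    return total

def worker_quality(confusion, costs, priors):
    """Normalized quality: 0.0 for a worker no better than random
    labeling, 1.0 for a perfect worker."""
    k = len(priors)
    random_conf = [[1.0 / k] * k for _ in range(k)]   # random labeler
    rand_cost = expected_cost(random_conf, costs, priors)
    cost = expected_cost(confusion, costs, priors)
    # A perfect worker has zero cost (zero-diagonal cost matrix).
    return 1.0 - cost / rand_cost
```

For a binary task with symmetric unit costs and uniform priors, a perfect worker (identity confusion matrix) scores 1.0 and a coin-flipping worker scores 0.0, matching the 100%/0% endpoints described above.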
Currently, we support vanilla majority voting and the expectation-maximization algorithm for combining the labels assigned by the workers. We also support maximum-likelihood, minimum-cost, and "soft" classification schemes. In most cases, expectation maximization together with the minimum-cost classification approach tends to work best, but you can experiment yourself.
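To illustrate the difference between the maximum-likelihood and minimum-cost schemes, here is a small sketch (not the GAL implementation; names are made up): given the posterior class probabilities estimated by, say, EM, the two schemes can disagree once the misclassification costs are asymmetric.

```python
# Sketch: maximum-likelihood vs. minimum-cost classification,
# given posterior class probabilities for one example.

def ml_label(posterior):
    """Maximum likelihood: pick the most probable class."""
    return max(range(len(posterior)), key=lambda c: posterior[c])

def min_cost_label(posterior, costs):
    """Minimum cost: pick the class minimizing expected misclassification cost."""
    k = len(posterior)
    def exp_cost(assigned):
        return sum(posterior[true] * costs[true][assigned] for true in range(k))
    return min(range(k), key=exp_cost)

# Asymmetric costs, as in the spam example above:
posterior = [0.6, 0.4]   # P(spam) = 0.6, P(legitimate) = 0.4
costs = [[0, 1],         # true spam marked legitimate: cost 1
         [5, 0]]         # true legitimate marked spam: cost 5
```

Here maximum likelihood returns class 0 (spam), while minimum cost returns class 1 (legitimate), because misclassifying legitimate content as spam is five times more expensive.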

An important side-effect of reporting the estimated quality of the data is that you can then allocate further labeling resources to the data points that have the highest expected cost. Jing has done plenty of experiments and has concluded that, in the absence of any other information (e.g., who is the worker that will label the example), it is always best to focus the labeling effort on the examples with the highest expected cost.
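The allocation heuristic can be sketched as follows (a simplified illustration with made-up names, not Jing's experimental code): score each example by the expected cost of the best decision under its current soft label, and spend the next label on the most expensive one.

```python
# Sketch of the allocation heuristic: given soft labels (posterior
# class probabilities) for each example, label next the example
# whose current best decision has the highest expected cost.

def expected_cost(posterior, costs):
    """Cost of the best (minimum-cost) decision under this posterior."""
    k = len(posterior)
    return min(
        sum(posterior[true] * costs[true][assigned] for true in range(k))
        for assigned in range(k)
    )

def next_to_label(posteriors, costs):
    """Index of the example whose current soft label is most expensive."""
    return max(range(len(posteriors)),
               key=lambda i: expected_cost(posteriors[i], costs))

costs = [[0, 1], [1, 0]]
posteriors = [[0.95, 0.05],   # near-certain: cheap
              [0.55, 0.45],   # near coin-flip: expensive
              [0.80, 0.20]]
```

With symmetric unit costs this reduces to picking the most uncertain example; with asymmetric costs it naturally prioritizes examples that risk the expensive kind of mistake.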

I expect this version of the code to be the last iteration of the GAL codebase. In our next step, we will transfer GAL into a web service environment, allowing for streaming, real-time estimation of worker and data quality, and also allowing for continuous labels, supporting quality-sensitive payment estimation, and many other tasks. Stay tuned: Project-Troia is just around the corner.

Saturday, October 20, 2012

Why oDesk has no scammers

So, in my last blog post, I gave a brief outline of how to use oDesk to automatically execute a set of tasks, "Mechanical Turk" style (i.e., no interviews for hiring, and a completely computer-mediated process for posting a job, hiring, and ending a contract).

A legitimate question appeared in the comments:
"Well, the concept is certainly interesting. But is there a compelling reason to do microtasks on oDesk? Is it because oDesk has a rating system?"
So, here is my answer: If you hire contractors on oDesk you will not run into any scammers, even without any quality control. Why is that? Is there a magic ingredient at oDesk? Short answer: Yes, there is an ingredient: Lack of anonymity!

It is a very well-known fact that if a marketplace allows anonymous participants and cheap generation of new identities, the marketplace is going to fall victim to malicious participants. There are many examples of markets that allowed anonymity and cheap generation of pseudonyms, and that ultimately became "markets for lemons". Unfortunately, when you have cheap identity generation, the reputation system of the marketplace becomes extremely easy to manipulate.

So, what is different with oDesk? oDesk contractors are not anonymous, and their userids are tied (strongly) to a real-world identity (onymous?). For example, to withdraw money from oDesk into a bank account, the name on the bank account needs to match the name listed on oDesk. There are other mechanisms as well for verifying the identity of the contractors (e.g., when I listed myself as a contractor, I had to upload copies of my driving license, copies of my bank statements, etc.), but the details of the implementation do not matter. The key element is to make it difficult or costly to create new or false identities.

Strong identity verification pretty much eliminates any type of scam. Why? Because scammers cannot simply shut down their account after being caught scamming and create a new one. Therefore, with 99.9% probability, an oDesk contractor will not try to scam you. Now, do not get me wrong: you are going to run into incompetent contractors. But there is a difference between an incompetent contractor and one that deliberately tries to scam you.

As my colleague John Horton says: "An incompetent worker who puts some effort in the task is like a bad bus driver: Very slow to take you to your destination but at least you are going towards the correct place, albeit slowly. The scammers are like the unlicensed cab drivers that take you to a random place in order to demand arbitrary fare amounts afterwards to take you to your correct destination".

Sunday, October 14, 2012

Using oDesk for microtasks

Quite a few people keep asking me about Mechanical Turk. Truth be told, I have not used MTurk for my own work for quite some time. Instead I use oDesk to get workers for my tasks, and, increasingly, for my microtasks as well.

When I mention that people can use oDesk for micro-tasks, people are often surprised: "oDesk cannot be used through an API, it is designed for human interaction, right?" Oh well, yes and no. Yes, most jobs require some form of interviewing, but there are certainly jobs where you do not need to manually interview a worker before engaging them. In fact, with most crowdsourcing jobs having both the training and the evaluation components built into the work process, the manual interview is often not needed.

For such crowdsourcing-style jobs, you can use the oDesk API to automate the hiring of workers for your tasks. You can find the API on the oDesk developer site. (Saying that the API page is, ahem, badly designed is an understatement. Nevertheless, it is possible to figure out how to use it relatively quickly, so let's move on.)

Here are the typical steps for a crowdsourcing-style contract on oDesk:
  • First, post a job: Use the "Post a Job" call from the Jobs API
  • Once the job is posted, poll the job openings to find who applied: Use the "List all the offers" call from the Offers API
  • Examine the details of the contractors that bid on the job: Use the "Get Offer" call from the Offers API to examine the details of each contractor. For example, for one task we wanted at most 10 people from any given country. So, the first 10 applicants from each country were hired, while subsequent applications from a country that already had 10 hires were declined. Other people may decide not to hire contractors with fewer than 50 hours of prior work. It seems to be an interesting research topic to intelligently decide which aspects of a contractor matter most for a job, and to hire/decline applications based on such info.
  • Make offers to the contractors: [That is the stupid part: Apparently the API does not allow the buyer to simply "accept" the bid by the contractor, although this is trivially possible through the web interface]. Use the "Post a Job" call, and create a new job opening for the contractor. Then use the "Make Offer" call from the Offers API, to generate an offer for the contractor(s) that you want to hire.
    • If you do not want to pay per hour, but rather per task, create an hourly contract but set the maximum working hours per week to zero. (Yes, zero: this is not a mistake.) You will be using the Custom Payments functionality to effectively submit "bonus payments" to the contractor.
    • Typically, it is better to have a mixture of an hourly wage and a fixed-price component. You can have a no-hourly-wage policy, simulating MTurk, by setting the maximum chargeable hours to 0. Or you can specify the hourly wage and set a limit on how many hours can be charged per week.
  • Direct the contractor to the task: Use the Message Center API to send the contractor a message with the URL where you host your task. [Note: oDesk does not provide functionality for handling the task execution, so it is up to you to build that infrastructure. If you have ever built an "external HIT" on MTurk, you are ready to go. The only difference is that now you need to send the oDesk workers a URL where they can log in to your website, together with their username/password. You could go full force and allow oDesk-based authentication, but that seems a little bit too much for me.]
  • Whenever the contractor has completed enough tasks, use the Custom Payment API to submit the payment. Repeat as needed.
  • When the task is done, end the contract using the contracts API.
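As an illustration of the screening step above (the "at most 10 people per country" rule), here is a small sketch. The function name and the data shapes are made up for the example; in practice the decision would be driven by the contractor details returned by the "Get Offer" call.

```python
# Sketch of a screening policy: hire applicants in order of
# application, but cap the number of hires per country.
from collections import Counter

def screen_applicants(applicants, per_country_cap=10):
    """applicants: list of (contractor_id, country), in application order.
    Returns (hired_ids, declined_ids)."""
    hired, declined = [], []
    counts = Counter()  # hires so far, per country
    for contractor_id, country in applicants:
        if counts[country] < per_country_cap:
            counts[country] += 1
            hired.append(contractor_id)
        else:
            declined.append(contractor_id)
    return hired, declined
```

The same skeleton accommodates other screening rules from the step above (e.g., declining contractors with fewer than 50 hours of prior work) by adding conditions inside the loop.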
That's all folks! In the next few weeks, I will try to post the code for some of the crowdsourcing experiments that we conducted with oDesk.