Monday, March 26, 2012

Mechanical Turk: More SETI@Home and less Amazon Web Services

A few days back, I wrote about the requirements that labor markets need to satisfy in order to claim that they offer scalable "cloud labor" services. As a reminder, the characteristics that define cloud services are:
  • on-demand self-service
  • broad access through APIs
  • resource pooling
  • rapid elasticity
  • measured service
I used Amazon Mechanical Turk for a first test of these conditions, and the results were:
  • On-demand self-service: Yes. We can access the labor pool whenever it is needed.
  • Broad access through APIs: Yes. The overall process of posting tasks, assigning work, collecting results, and paying workers can be handled programmatically.
  • Resource pooling: Yes and No. While there is a pool of workers available, the service provider does not assign workers to tasks. This means there may be nobody willing to work on a posted task, and we cannot know this before actually testing the system. It is entirely up to the workers to decide whether they will serve a particular labor request.
  • Rapid elasticity: Yes and No. The scaling-out capability (rapidly increasing the labor pool) is rather limited. We simply cannot suddenly hire hundreds of workers to work on a task in parallel, for a sustained period of time (workers who do 1-2 tasks and then leave cannot be counted for the purposes of elasticity). As in the case of resource pooling, it is up to the workers to decide whether to work on a task, and it is highly unclear what level of pricing would achieve what level of elasticity.
  • Measured service: No. Quality and productivity measurement is left to the employer, and there is no SLA with the paying client that could guarantee a minimum level of performance.
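To make the "broad access through APIs" point concrete, here is a minimal sketch of posting a task programmatically. The XML envelope follows Amazon's published HTMLQuestion schema for MTurk; the helper names and parameter values are my own illustrations, and the actual network call (shown as a comment) would go through an SDK such as boto.

```python
# Sketch of assembling a HIT (Human Intelligence Task) request.
# Helper names and values are illustrative, not a real SDK interface.

HTML_QUESTION_SCHEMA = (
    "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas"
    "/2011-11-11/HTMLQuestion.xsd"
)

def build_html_question(html_body, frame_height=450):
    """Wrap the task's HTML in the XML envelope MTurk expects."""
    return (
        f'<HTMLQuestion xmlns="{HTML_QUESTION_SCHEMA}">'
        f"<HTMLContent><![CDATA[{html_body}]]></HTMLContent>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        f"</HTMLQuestion>"
    )

def build_hit_request(title, description, reward_usd, assignments):
    """Assemble the parameters of a CreateHIT-style request."""
    return {
        "Title": title,
        "Description": description,
        "Reward": f"{reward_usd:.2f}",          # price per assignment, in USD
        "MaxAssignments": assignments,           # redundancy level
        "AssignmentDurationInSeconds": 600,
        "LifetimeInSeconds": 86400,
        "Question": build_html_question(
            "<p>Is this tweet positive or negative?</p>"
        ),
    }

# mturk_client.create_hit(**build_hit_request(...))  # actual call via an SDK
```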

So, why does MTurk fail these tests?

The root cause of failure is the voluntary, market-based mechanism for allocating labor to tasks. (Yes, markets are not necessarily efficient, especially when they are not designed properly.)

The fact that MTurk cannot "forcibly" assign a task to a worker makes it almost impossible to ever satisfy these conditions. If someone wants to hire a large number of workers at once (rapid elasticity), it is not clear that the market will have enough participants to satisfy the need. Even if it does, we do not know the wage that the available workers will require. If, however, there were a guaranteed pool available, with known prices, then MTurk could state the limits of elasticity and how much it would cost to reach them. The same holds for resource pooling.
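The pricing uncertainty can be illustrated with a toy simulation (all numbers here are made up): each worker has a private reservation wage, and the size of the pool willing to work at a given price is unknown to the requester until the task is actually posted.

```python
import random

def available_workers(reservation_wages, offered_wage):
    """Workers accept a task only if the offered wage meets their
    private reservation wage; the buyer cannot observe this in advance."""
    return sum(1 for r in reservation_wages if offered_wage >= r)

random.seed(42)
# 1,000 workers with hidden reservation wages between $0.01 and $0.50 per task
pool = [random.uniform(0.01, 0.50) for _ in range(1000)]

for wage in (0.05, 0.10, 0.25):
    print(f"${wage:.2f}/task -> {available_workers(pool, wage)} willing workers")
```

The supply curve is monotone in price, but its actual shape is hidden: the only way to learn how many workers $0.10 per task buys is to post at $0.10 and wait.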

In a sense, today's Mechanical Turk is more similar to SETI@Home in 1999 than to Amazon's EC2 and S3 in 2009. Here are the similarities:
  • Distributed, voluntarily participating infrastructure
    • With Amazon Web Services (AWS) such as EC2, S3, etc., there is a single provider of hardware infrastructure, who plans for availability, does capacity planning by upgrading the infrastructure when needed, and so on.
    • In SETI@Home, the computation came from volunteers who joined the network of their own will, and who could potentially donate time to other projects beyond SETI (e.g., protein folding and others). There was no single provider of hardware capabilities, as in the Amazon case, but rather a distributed, completely heterogeneous infrastructure.
    • On Mechanical Turk (and in crowdsourcing in general), every person comes and goes at will. There is no single agency that hires all the workers, plans for availability, does capacity planning, etc.
  • Diversity of underlying infrastructure
    • With EC2 and S3, we have an SLA guarantee for the services we are buying. If we buy 3 m1.medium machines, Amazon guarantees the memory, CPU speed, and other characteristics of these machines.
    • In SETI@Home, the computation was split into multiple pieces and distributed to a large number of computers, each with different capabilities. Through testing, SETI built profiles of the different machines in order to allocate data units more efficiently.
    • On Mechanical Turk, we observe the same setting today, but with human tasks. We have no idea what the skills of the underlying "human units" are, unless we probe and test beforehand.
  • No guarantee of "uptime" (task completion)
    • With EC2 and S3, we have a reasonable guarantee of uptime: when a service receives a request, we expect the answer to come back, with a probability that follows the SLA guarantees (which is very high). Very rarely do we need to plan for cases where the system is unavailable; such planning is not a common everyday need.
    • In SETI@Home, there was no guarantee that a data unit would ever be returned by the client. The client might decide to uninstall the application, switch off the computer, or take any other action that interrupted the computation. SETI kept track of the reliability of the machines and how often they returned their data units within a reasonable amount of time.
    • On Mechanical Turk, we also need to handle the fact that a task may not be completed after assignment, may be returned and need to be reposted, etc. MTurk keeps track of such failures and maintains statistics about the tasks that each worker returned or abandoned.
  • Malicious clients
    • With EC2 and S3, we have almost a guarantee that the CPU will not misrepresent its capabilities and will always return correct results. Similarly, for storage we have a 99.99999% guarantee that the data will not be lost. We may maintain multiple servers for a service, mainly to increase reliability and support load balancing, but we start with the understanding that even a single machine will operate on a "best effort" basis and will not behave maliciously.
    • In SETI@Home, there were many attempts by people to game the system and return improperly processed data, just to increase their statistics and their place in the standings. To guard against malicious clients, SETI performed each computation multiple times, effectively sacrificing available computing capacity for reliability.
    • We observe the same thing with Mechanical Turk. Instead of trusting each individual to make an honest effort, we need to resort to redundancy, gold tests, and so on, effectively wasting capacity. The introduction of "trusted" workers (Mechanical Turk Masters) reduces the problem, but the fundamental issue is still there.
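The bookkeeping behind the "no guarantee of uptime" point can be sketched with a per-worker tally of completed versus returned or abandoned assignments. This is a hypothetical sketch of what a requester might maintain, not MTurk's actual interface.

```python
from collections import defaultdict

class WorkerReliability:
    """Track, per worker, how many assignments were completed versus
    returned or abandoned, and derive a simple completion rate."""

    def __init__(self):
        self.completed = defaultdict(int)
        self.failed = defaultdict(int)  # returned or abandoned

    def record(self, worker_id, ok):
        if ok:
            self.completed[worker_id] += 1
        else:
            self.failed[worker_id] += 1

    def completion_rate(self, worker_id):
        done = self.completed[worker_id]
        total = done + self.failed[worker_id]
        return done / total if total else None  # no history yet

stats = WorkerReliability()
for outcome in [True, True, False, True]:
    stats.record("W123", outcome)
print(stats.completion_rate("W123"))  # 0.75
```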
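The redundancy-plus-gold-tests pattern from the last bullet can be sketched as follows: drop workers who fail known-answer (gold) questions, then take a majority vote over the remaining redundant answers. Function and variable names are illustrative, not any particular platform's API.

```python
from collections import Counter

def filter_by_gold(answers, gold, min_accuracy=0.8):
    """Keep only workers whose accuracy on gold (known-answer)
    tasks meets the threshold."""
    trusted = {}
    for worker, worker_answers in answers.items():
        gold_items = [t for t in worker_answers if t in gold]
        if not gold_items:
            continue  # no gold evidence for this worker; skip
        correct = sum(worker_answers[t] == gold[t] for t in gold_items)
        if correct / len(gold_items) >= min_accuracy:
            trusted[worker] = worker_answers
    return trusted

def majority_vote(answers, task):
    """Aggregate redundant answers for one task by plurality."""
    votes = [a[task] for a in answers.values() if task in a]
    return Counter(votes).most_common(1)[0][0] if votes else None

answers = {
    "w1": {"g1": "cat", "t1": "dog"},
    "w2": {"g1": "cat", "t1": "dog"},
    "w3": {"g1": "bird", "t1": "cat"},  # fails the gold question
}
gold = {"g1": "cat"}

trusted = filter_by_gold(answers, gold)
print(majority_vote(trusted, "t1"))  # dog
```

Note the "wasted capacity" in the example: three workers answered task t1, one answer was discarded outright, and two were needed just to agree on a single label.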

So, what is the future? 

The naive solution is to use a "traditional" outsourcing service, sending tasks to a classic BPO company such as Tata Consulting, and relying on their reliability and availability guarantees. (Interestingly enough, many of these BPOs use crowdsourcing-like approaches internally to manage the tens of thousands of employees that handle basic tasks.) While I see the appeal, I do not find this solution satisfactory.

Personally, I expect a supply-side market to emerge, in which workers advertise what they offer and clients place requests against these services. (Fiverr currently offers such a "supply-side" service, which mirrors the "demand-side" service offered by Mechanical Turk.) The service that successfully merges the two sides and efficiently connects supply and demand will be the winner...