Tuesday, April 2, 2013

Intrade Archive: Data for Posterity

A few years back, I did some work on prediction markets. For this line of research, we collected data from Intrade to perform our experimental analysis. Some of the data is available through the Intrade Archive, a web app that I wrote in order to familiarize myself with the Google App Engine.

In the last few weeks, though, after the effective shutdown of Intrade, I started receiving requests for access to the data stored in the Intrade Archive. So, by popular demand, I gathered all the data from the Intrade Archive, together with all the past data that I had about Intrade contracts going back to 2003, and I put everything on GitHub for everyone to access and download. The Excel file contains a description of the contracts, while the zip file contains information about all the individual trades and the daily opening and closing prices.

On purpose, I excluded all the Financial contracts, as the trading of these contracts has limited research interest. (Plus, there were too many of them.) The information from "official" stock and options exchanges has much higher volume and is a better source of information than the comparatively illiquid contracts on Intrade.

The link to the GitHub repository is also now available from the home page of the Intrade Archive. I hope that the resource-hungry crawlers can now be put to sleep, never to come back again :-)

Enjoy!

Monday, February 25, 2013

WikiSynonyms: Find synonyms using Wikipedia redirects

Many, many years back, I worked with Wisam Dakka on a paper about creating faceted interfaces for text collections. One of the requirements for that project was to discover synonyms for named entities. While we explored a variety of directions, the one that I liked most was Wisam's idea to use the Wikipedia redirects to discover terms that are mostly synonymous.

Did you know, for example, that ISO/IEC 14882:2003 and X3J16 are synonyms of C++? Yes, me neither. However, Wikipedia reveals that through its redirect structure.

The WikiSynonyms web service

What do we mean by redirects? Well, if you try to visit the Wikipedia page for President Obama, you will be redirected to the canonical page Barack Obama. Effectively, "President Obama" is deemed by Wikipedians to be a close synonym of "Barack Obama", hence the redirect. Similarly, the term "Obama" is also a redirect, etc. (You can check the full list of redirects here.)

While I was visiting oDesk, I felt that this service could be useful for a variety of purposes. So, following the oDesk model, we hired a contractor to implement this synonym extraction as a web API and service. If you want to try it out, please go to:


The API is very simple. Just issue a GET request like this:

curl 'http://wikisynonyms.ipeirotis.com/api/{TERM}'

For example, to find synonyms for Hillary Clinton:


and for Obama



Mashape integration

Since we may change the URL of the service, I recommend registering with Mashape and accessing the WikiSynonyms service through it instead:

curl 'https://wikisynonyms.p.mashape.com/{TERM}' --header 'X-Mashape-Authorization: your_mashape_key'
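For readers who prefer Python over curl, the direct GET request shown earlier might be issued roughly as follows. This is only a sketch. The base URL is the one from the curl example, while the helper names, the percent-encoding of the term, and the choice to return the raw body without parsing it are all assumptions, since the response format is not described here.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the curl example; the post warns that it may change.
BASE_URL = "http://wikisynonyms.ipeirotis.com/api/"


def build_url(term, base_url=BASE_URL):
    """Return the request URL for a term (hypothetical helper).

    Percent-encoding the whole term is an assumption about what the
    service expects.
    """
    return base_url + quote(term, safe="")


def fetch_synonyms(term, timeout=10):
    """Issue the GET request and return (status_code, raw_body_text)."""
    try:
        with urlopen(build_url(term), timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except HTTPError as err:
        # urllib raises HTTPError for statuses it does not handle itself;
        # hand them back so the caller can inspect the status code.
        return err.code, err.read().decode("utf-8")
```

Calling fetch_synonyms("Hillary Clinton") would then return the status code together with the unparsed response body, leaving the interpretation of both to the caller.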

You can easily download Wikipedia

Interestingly enough, this synonym extraction technique remains little-known, despite how easy it is to extract these synonyms. And whenever I mention Wikipedia, most people worry that they will need to scrape the HTML from Wikipedia, and nobody likes this monkey business.

Strangely, most people are unaware that you can download Wikipedia in a relational form and put it directly in a database. In fact, you can download only the parts that you need. Here are the basic links:

This redirect structure (as opposed, say, to the normal link structure and the related anchor text) is highly precise. By eyeballing the results, I would guess that precision is around 97% to 99%.
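To make the idea concrete, here is a minimal sketch of how redirects turn into synonyms once the data sits in a database. It is not the code behind the service. The column names follow the MediaWiki `page` and `redirect` tables as I understand them (the real tables have more columns), the rows are a tiny hand-made sample based on the examples in this post, and SQLite is used only to keep the example self-contained.

```python
import sqlite3


def build_toy_database():
    """Create an in-memory database holding a few hand-made sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER,
                           page_title TEXT);
        CREATE TABLE redirect (rd_from INTEGER PRIMARY KEY,
                               rd_namespace INTEGER,
                               rd_title TEXT);
        """
    )
    # Titles use underscores instead of spaces, as in the Wikipedia dumps.
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, 0, "Barack_Obama"),
            (2, 0, "President_Obama"),
            (3, 0, "Obama"),
            (4, 0, "C++"),
            (5, 0, "X3J16"),
        ],
    )
    # rd_from is the id of the redirecting page; rd_title is its target.
    conn.executemany(
        "INSERT INTO redirect VALUES (?, ?, ?)",
        [(2, 0, "Barack_Obama"), (3, 0, "Barack_Obama"), (5, 0, "C++")],
    )
    return conn


def redirect_synonyms(conn, canonical_title):
    """Return the titles of article pages that redirect to canonical_title."""
    rows = conn.execute(
        """
        SELECT p.page_title
        FROM redirect AS r
        JOIN page AS p ON p.page_id = r.rd_from
        WHERE r.rd_namespace = 0
          AND p.page_namespace = 0
          AND r.rd_title = ?
        ORDER BY p.page_title
        """,
        (canonical_title,),
    ).fetchall()
    return [title for (title,) in rows]
```

With the toy data, redirect_synonyms(build_toy_database(), "Barack_Obama") lists the two redirecting titles. Against a real dump, the same join would run on the imported tables instead.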

Application: Extracting synonyms of oDesk skills

One application for which we used the service was extracting synonyms for the set of skills that are used to annotate the jobs posted on oDesk. For example, you can find the synonyms for C++:



Or you can find the synonyms for Python:



Oops, as you see, the term Python is actually ambiguous, and Wikipedia has a disambiguation page with the different 'senses' of the term. Since we are not doing any automatic disambiguation, we return an HTTP 300 response and ask the user to select one of the applicable terms. So, if we now query with the term 'Python (programming language)' we get:


Open source and waiting for feedback

The source code, together with the installation instructions for the service, is available on GitHub. Feel free to point out any problems or suggestions for improvement. And thanks to oDesk Research for all the support in creating the service and making it open source for everyone to use.

Monday, January 28, 2013

Towards a Market for Intelligence

Last September, I was visiting CMU and a student asked me a question: "Do you know any crowdsourcing market, where we can assign tasks to people, as opposed to waiting for the workers to pick the tasks they want to work on?"

Most crowdsourcing services do not satisfy this requirement. Mechanical Turk, oDesk, eLance, and all others typically expect the workers to express interest in a task. At most, you may be able to invite workers to participate in a task, but you cannot really assign a task to a worker.

A notable exception is Fiverr, which plays the supply side of the market; however, Fiverr has a different limitation: The tasks that can be performed are posted by the workers, and the employers pick from a set of existing tasks. 

After thinking for a while, I realized that there are more such markets in which you can assign tasks to "workers". The main difference is that the "workers" are not necessarily humans, but APIs. Enter the world of Mashape, an API marketplace. Are you looking for someone to classify tweets according to their sentiment? Query the Mashape marketplace for sentiment analysis APIs and then assign the task to the intelligence units of your choice.

With the advent of the API marketplaces, we see the emergence of marketplaces for "intelligence units". All these intelligence units (APIs or human workers) have different levels of quality, various levels of pricing, and even various levels of capacity and responsiveness.

With the proper abstraction, a task management platform can use and optimize these distributed intelligence units without having to worry about the "implementation details" of this intelligence.

Wednesday, December 5, 2012

Mechanical Turk changing the defaults: The game has changed

Back in the summer of 2011, Mechanical Turk introduced a new type of qualification, the Mechanical Turk "Masters". The Master qualification was assigned by Amazon to workers that have proven themselves in the marketplace.

What exactly makes someone "proven"? This is, understandably, a well-kept secret by Amazon. The opacity of the qualification process annoys many workers: It is hard to prove that you are a Master and qualify for it when you do not know how this qualification is granted. Rumor has it that Amazon deploys decoy tasks on Mechanical Turk just to examine the performance of the workers and decide which ones to qualify as Masters. If this is correct, then it also explains why Amazon is rather secretive about the exact requirements: Workers would try to ace these test tasks, and let their guard down on others.

The existence of Masters was a good development towards creating a true reputation scheme for Mechanical Turk. However, an action taken by Amazon a month back has changed the dynamic of the market: Now the default, for all tasks created through the web UI, is to require Masters workers. Removing the requirement is done only through the "advanced" menu, and is followed by a warning that you may not get good results if you opt not to use Masters.

Tiny change? No. This is huge. Here are a few of the immediate, positive effects:
  • People that use the Web UI are typically the newcomers, who do not know how (or do not want) to implement sophisticated quality control schemes. They just want to execute some simple tasks. The task templates help a lot to create a usable interface, and the Masters requirement ensures that they are not going to get back crappy results. A happy customer is a long-term customer.
  • Masters will not touch badly designed and ambiguous tasks. This enforces discipline from the requester side, to get things designed properly. Otherwise the tasks are left untouched, which is a good signal that something is wrong with the task.
  • Masters will not touch offensively priced tasks, paying less than minimum wage, while demanding high-quality work. This (hurray!) removes the impression that Mechanical Turk is about dirt-cheap work and emphasizes what crowdsourcing is about: Dynamic allocation of labor on tasks, without the overhead of hiring, negotiations, etc.
There are, of course, a few downsides:
  • There are far fewer Masters workers. A current search reveals 20,744 workers. This is at least an order of magnitude lower than the number of active workers that Amazon used to advertise. Of course, these Masters are much more active than the average worker, but still there are not enough of them for all the tasks that require them.
  • There is now a significant lag before a task gets picked up by workers. Masters are much more careful about the requesters they work with, and a new requester will need to prove that they are not rejecting work unfairly, and that they pay on time. Until then, the task will get only a few workers willing to test it.
  • The tasks now take much longer to complete. My current sense is that there is a 10x slowdown (but the improvement in quality is definitely worth it).
  • There is an increased cost. Masters require decent wages (so no more 5 cents for 5 minutes of work), and there is an increased overhead from Amazon (30% overhead for Masters vs 10% for regular workers). My take? You get what you pay for.
  • It is not clear on what tasks the Masters are tested and how a new worker can become a Master. It would be great if Amazon also got quality signals from a few reliable big requesters, but I can see many practical problems in implementing such a solution.
Overall though, this change in the defaults shows that Amazon has started acting on the criticism. It is clear that this is a risky move, as a lot of work posted on Mechanical Turk will not get done due to lack of interest in poorly paying or badly designed tasks.

But on the other hand, it shows that Amazon is looking for the long term: Let newcomer requesters get guaranteed results, and if they want to get things done faster they can focus on pricing and better task design. If they want to get further and engage other Turkers, such requesters will be aware of the risks and benefits of such a move.

So, effectively, we now have the "novice" requesters, who get protected by default through the Masters qualification, and the "advanced" requesters, who can implement their own qualification schemes to replace the Masters qualification. This default level of protection makes the life of wannabe-scammer workers very difficult: no obvious victims to attack. Just hunting for a victim requester will become so difficult that it makes sense to give up scamming and either switch to doing real work, or abandon the market.

A tiny change in the defaults with short-term problems and many big, long-term benefits. Personally, I find this move exhilarating.

Sunday, November 18, 2012

How big is Mechanical Turk?

A question that people ask me very often is about the size of Mechanical Turk. How many tasks are being completed on the marketplace every day? What is the transaction volume? Let me give a quick answer: I have no idea. Since Amazon does not release any statistics about the marketplace, it is pretty much impossible to know for sure.

Mechanical Turk Tracker

However, I do have some estimates, mainly by using the data that I have been collecting through the Amazon Mechanical Turk Tracker. For those not familiar with the site, over the last four years we have been crawling the Mechanical Turk site every few minutes, capturing the complete state of the market: What tasks are available, their prices, the number of HITs available, etc.

One feature that we revamped lately is the ability to see the number of tasks that are posted and completed every day. You can check the "Arrivals" tab to see the details.



Estimating HITs posted and completed

How do we estimate the number of tasks that get posted and completed? The estimation is a little bit tricky and not 100% foolproof but it works reasonably well, based on my current observations.

Since we can keep track of the history of a task over time, we can see the changes in the number of available HITs over time. For example, we may observe a task that has the following number of HITs in sequential crawls, over time:
1000...700...500...2000...1000...100...[disappeared]

For this task, we estimate that we have an initial posting of 1000 HITs. Then, we see 1000-700 = 300 HITs completed between the first and second crawls. Then, 700-500 = 200 HITs completed between the second and third crawls. However, between the third and fourth crawls we see a "refill" of 2000-500 = 1500 HITs, which have been posted. Then we see 2000-1000 = 1000 HITs being completed, then 1000-100 = 900 HITs completed, and finally the task disappears and the last 100 HITs are assumed to be completed. This generates a total of 1000+1500 = 2500 HITs posted, and 300+200+1000+900+100 = 2500 HITs completed.
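The bookkeeping described above can be sketched in a few lines of Python. This is an illustration of the basic rule only, not the tracker's actual code: the function name and the `disappeared` flag are made up for this example, and the extra sanity tests are not included.

```python
def estimate_hits(observed_counts, disappeared=True):
    """Estimate (posted, completed) HITs for one task.

    observed_counts: number of available HITs seen in consecutive crawls.
    disappeared: whether the task vanished after the last crawl, in which
        case the remaining HITs are assumed to have been completed.
    """
    if not observed_counts:
        return 0, 0

    posted = observed_counts[0]  # the initial posting
    completed = 0
    for previous, current in zip(observed_counts, observed_counts[1:]):
        if current > previous:
            # The count went up: treat the difference as a "refill".
            posted += current - previous
        else:
            # The count went down: treat the difference as completed work.
            completed += previous - current

    if disappeared:
        completed += observed_counts[-1]
    return posted, completed


posted, completed = estimate_hits([1000, 700, 500, 2000, 1000, 100])
print(posted, completed)  # 2500 2500
```

Note that a refill and completions happening between the same two crawls cannot be told apart with this rule; only the net change is visible, which is one reason the estimate can undercount.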

We do have some extra sanity tests but let's consider the current description as sufficient. For the record, I have checked with a few big requesters and my estimated numbers were pretty close to the actual ones, so I feel reasonably confident that I am not off completely.

Analyzing daily volumes

Now, by looking at the current arrivals data, we can see that my tracker estimates approximately \$30K-\$40K of tasks completed per day. Given that I cannot observe redundancy, and that I may miss HITs that are posted and completed between my crawls, I may be underestimating. However, I may also be wrong in counting as "completed" tasks that were simply taken down without being done. To be on the safe side, I will put my under-reporting factor somewhere between 1 and 10. In other words, I estimate the real daily volume to be somewhere between \$30K and \$400K. Yes, there is a huge difference between the two, but we get the order of magnitude, and you can be as pessimistic or as optimistic as you want.

These numbers generate a yearly transaction volume for Mechanical Turk between \$10M and \$150M. Given that Mechanical Turk takes 10% to 20% as fees, this means revenue for Amazon between \$1M (low estimate) and \$30M (high estimate) per year.
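For those who want to plug in their own level of optimism, the back-of-the-envelope arithmetic can be written out as below. The function and parameter names are just for illustration, and the use of a 365-day year is an assumption; the inputs are the figures quoted above.

```python
DAYS_PER_YEAR = 365  # assumption: volume is spread over every day of the year


def estimate_ranges(daily_low=30_000, daily_high=40_000,
                    max_underreporting=10,
                    fee_low_pct=10, fee_high_pct=20):
    """Return (low, high) ranges in dollars for volume and fees."""
    # Low end: tracker estimate taken at face value (factor 1).
    # High end: tracker estimate times the largest under-reporting factor.
    daily = (daily_low, daily_high * max_underreporting)
    yearly = (daily[0] * DAYS_PER_YEAR, daily[1] * DAYS_PER_YEAR)
    fees = (yearly[0] * fee_low_pct // 100, yearly[1] * fee_high_pct // 100)
    return {"daily": daily, "yearly": yearly, "fees": fees}


ranges = estimate_ranges()
print(ranges["daily"])   # (30000, 400000)
print(ranges["yearly"])  # (10950000, 146000000)
print(ranges["fees"])    # (1095000, 29200000)
```

Rounded, these are the \$10M to \$150M yearly volume and the \$1M to \$30M fee range quoted in the text.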

What would be the value of Mechanical Turk as a startup?

I love that question. Not because it is sensible. But because I get to be completely tongue-in-cheek, and make fun of the absolutely ridiculous P/E ratio for the Amazon stock: Currently the trailing P/E for Amazon is a wonderful 2,681 (yep, not a typo). Assuming that the Mechanical Turk division generates some earnings in the \$1M to \$5M range, the valuation of Mechanical Turk is somewhere between \$2 billion and \$10 billion! Not shabby for a 7-year-old startup :-p.

OK, getting more serious: The price-to-sales ratio for Amazon is somewhere in the 1.75 range. Therefore, given an estimated yearly transaction volume for Mechanical Turk between \$10M and \$150M, the estimated valuation for Amazon Mechanical Turk is somewhere between \$15M (pathetic) and \$250M (respectable).
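Both valuations are simply a multiple applied to a range, which the short sketch below spells out using the ratios quoted above. The helper name is made up for this example. Straight multiplication gives somewhat different endpoints than the rounded figures in the text, which is fine for an order-of-magnitude exercise.

```python
def valuation_range(multiple, base_low, base_high):
    """Apply a valuation multiple to a (low, high) range in dollars."""
    return multiple * base_low, multiple * base_high


# Tongue-in-cheek: trailing P/E of 2,681 on $1M to $5M of earnings.
pe_based = valuation_range(2681, 1_000_000, 5_000_000)

# More serious: price-to-sales of 1.75 on $10M to $150M of yearly volume.
ps_based = valuation_range(1.75, 10_000_000, 150_000_000)

print(pe_based)  # (2681000000, 13405000000)
print(ps_based)  # (17500000.0, 262500000.0)
```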

What is the growth?

While I am less certain about the numbers that have to do with the absolute transaction volume, I am much more confident about the growth numbers. Since my methodology remained the same over time, the growth of the sample should match reasonably well the growth of the overall market.

If you go again to the Arrivals tab on Mechanical Turk Tracker, and change the date range to go back to 2009, you will be able to see how the arrivals and completions have changed over time.


Forget about the absolute numbers. What is very clear is that the last few years were very good for Mechanical Turk. While the numbers were pretty low early on, there was a 3x to 6x YoY growth in terms of transaction volume. This was really healthy.

One thing that puzzles me is what happened around March 2012. My tracker seems to detect a sudden stop in the growth. I am not quite sure what is going on there. Is there something wrong with my crawler? Did something change on the Mechanical Turk site that caused a lower rate of completed jobs? I noticed, for example, that Amazon now puts the "Masters" qualification as a default option for all the HITs posted through the web interface. This can definitely decrease the rate of completing jobs, but I am sure that it will also increase the overall level of satisfaction of the requesters with the answers submitted by the Turkers. Anyhoo, I do not have enough information, so I do not want to overanalyze that part.

Conclusion

Mechanical Turk is an interesting experiment for Amazon. It is not clear how important the project is to the rest of the company and how much Jeff Bezos supports the effort after all these years. But Bezos is well-known for planning for the long term, and my (imperfect) statistics tend to confirm (tentatively) that the market is on a good path.

Let's see how things play out...