Wednesday, July 30, 2008

Mechanical Turk Allows Bulk Submissions Using Templates

Mechanical Turk is a very useful service that I have used repeatedly in the past for a variety of tasks. While it was a great tool, one of its shortcomings was the lack of web-based support for creating batches of tasks. Through the web interface it was only possible to create a single task at a time, potentially asking many Turkers to complete it.

If someone wanted to generate a large number of similar tasks (e.g., submit 1000 query-document pairs for relevance judgments, as opposed to a single query-document pair), the only option was to use the command-line tools or the MTurk API. Admittedly, the tools were easy to use, but the need to resort to "programming" was still a problem for many people without a programming background.

So, today Amazon released the ability to create "templates" that automate the generation of such tasks. The basic idea: you create a template and specify in it a variable placeholder (e.g., #document). Then you upload a separate file with the data that will populate the placeholder, and you are done. A set of HITs is generated automatically.
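For those curious about the mechanics, here is a minimal sketch of what the template-plus-data-file expansion amounts to. This is my own illustration, not Amazon's implementation: the placeholder syntax, the field names, and the inlined rows (which in practice would come from the uploaded data file) are all assumptions.

```python
from string import Template

# Hypothetical template for a relevance-judgment HIT; the ${query} and
# ${document} placeholders get filled in from the uploaded data file.
# (The real MTurk placeholder syntax may differ; this is just an illustration.)
hit_template = Template(
    "Query: ${query}\n"
    "Document: ${document}\n"
    "Question: Is this document relevant to the query? (yes / no)"
)

# In practice these rows would come from the uploaded data file (e.g., a CSV
# with one column per placeholder); they are inlined here for illustration.
rows = [
    {"query": "prediction markets", "document": "doc-17.html"},
    {"query": "mechanical turk",    "document": "doc-42.html"},
]

# Expand the template once per row -- essentially what the new web
# interface now does for you, without any programming.
hits = [hit_template.substitute(row) for row in rows]
print(f"Generated {len(hits)} HITs")
```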

Amazon has already created some basic HIT templates that illustrate the process, showing example tasks such as "Get Product Name from Image," "Are these products the same," "Evaluate the Effectiveness of Search Results," and so on. The Getting Started Guide illustrates how easy it is to generate the batches of HITs.

Nice! I may finally generate some batch HITs myself, instead of asking my students to do all the work for me :-)

Monday, July 28, 2008

JDMR: Officially live

It seems that the website of the Journal of Database Management Research is now live.

The current title of the website is "The Proceedings of the VLDB Endowment (PVLDB)". Since the first issue of this journal will contain the papers that have been accepted for publication at VLDB 2008, it seems that simply renaming the proceedings of VLDB was not deemed acceptable. Therefore, according to the transition plans:

All regular (research) papers selected for presentation at the VLDB Conference, beginning 2008, will be offered publication in a new Journal, called "Proceedings of the VLDB Endowment" (PVLDB or "Proc. VLDB" for short). Volume 1 of PVLDB will appear in Aug 2008. There will be no separate VLDB Conference Proceedings.

Papers accepted for presentation to the VLDB Conference were reviewed once more, by the same set of reviewers, before being accepted for publication in PVLDB. This was a "light" review round, in this first transition year, to make sure that reviewer suggestions had been adopted in so far as practically possible.
The new hybrid journal will start accepting submissions on August 1st, 2008.

Sunday, July 27, 2008

Using The New York Times Reader

A few weeks back, I installed the "New York Times Reader" on my computer. It is an application from The New York Times that runs quietly in the background, downloading locally all the NY Times articles published over the last week. It also provides its own, non-browser interface for browsing through the articles, with a layout that emulates a printed newspaper more than the web edition. I used the Reader a little when I first downloaded it, but then I forgot about it and kept reading the news over the web.

Today, though, I found myself stuck on a 10-hour flight to Greece, with no Internet connectivity. Well, no problem. I actually enjoy such long flights *because* there is no Internet connectivity and I can really focus on whatever I am doing, without interruptions (voluntary or not).

After going through all my email, I answered the messages that had been sitting in my inbox for a while, and then I started reading blogs using the offline option of Google Reader. Unfortunately, reading blogs offline is not a very enjoyable experience. Some blogs simply point to external articles, others have only partial feeds, and others are not meaningful to read without going over the comments and the discussion. So I quickly ran out of things to do.

Then I noticed that I had the paper version of The New York Times in front of me. I tried to read a little, only to realize that it is a royal pain to read a newspaper with the layout of The New York Times on a plane. The New York Times deserves and needs a coffee table, not a tray that can barely fit a laptop.

At that point, I realized that I had the Reader available on my laptop. Not sure whether it had synced, I opened it. Fortunately, it had been quietly syncing all the material, and I now had one week of New York Times articles at my disposal. The layout was nice, the typeface excellent, and the interface very intuitive. Plus, the ability to go through all the sections (some of them published only once a week) is a big advantage. So I happily read one week's worth of the NY Times (ok, impossible, but it felt like that) on my laptop, completely ignoring the paper version sitting next to me.

Then I noticed the "search" link. I went to the search screen and started typing various queries. Well, was I surprised! Search was immediate, "results-as-you-type" style. Plus, the results were nicely organized as a newspaper, ordered by relevance from left to right. Here is a screenshot of the results for the query "economy":


Next step: see what this "Topic Explorer" is. It generates a result screen like this:
Not very impressive at first, but the more interesting part comes when you click on an article and see a list of all the associated topics:
It is very easy to go through related articles, to see the level of interest for each topic, and so on. I guess a little further visualization could also help, and some extra work to allow faceted navigation would make the interface even more interesting. But it is definitely an enjoyable experience as-is, demonstrating the power of truly online interfaces over interfaces that simply try to emulate paper.

Sunday, July 13, 2008

Options Markets for Prediction Markets?

Over the last few months, George Tziralis and I have been doing research on the efficiency of prediction markets.

At the most basic level, we want to examine how fast the market can incorporate all the available information into the price of a contract. Our experiments so far on InTrade show that the price of a contract tends to be "unpredictable" when using purely historic price data. In other words, the markets on InTrade tend to be "weakly efficient".
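Here is a rough sketch of the kind of test this involves, not our exact methodology: if the market is weakly efficient, past price changes should not predict future ones, so their autocorrelations should be statistically indistinguishable from zero. A synthetic random walk stands in for the actual InTrade price series.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a contract's daily prices: a bounded random walk.
prices = np.clip(50 + np.cumsum(rng.normal(0, 1, 500)), 1, 99)
returns = np.diff(prices)

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Weak-form efficiency: these should all sit roughly within +/- 2/sqrt(n) of zero.
for lag in (1, 2, 5, 10):
    print(f"lag {lag:2d}: autocorrelation = {autocorr(returns, lag):+.3f}")
```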

In fact, some pretty extensive time-series tests showed that the price movements are almost a random walk. Furthermore, the spectrum of the price movements (after applying a Fourier transform) is something close to 1/f noise (or pink noise). Assuming that each price change indeed captures the effect of a real-time event (this is a big assumption), we can conclude that the importance of the events that happen tends to follow a "power law": there are the "big" events that move the market significantly, but there is also a large number of minor events that cumulatively can move the market significantly, even though none of them is of any particular importance on its own.
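A sketch of the spectrum computation, again with synthetic data standing in for the actual InTrade series: the exponent of the fitted power law is what distinguishes pink noise (exponent near 1) from a pure random walk (exponent near 2).

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in price series; in the actual analysis this would be the
# contract's price history from InTrade.
prices = np.clip(50 + np.cumsum(rng.normal(0, 1, 2048)), 1, 99)

# Power spectrum via the FFT (dropping the zero-frequency term).
spectrum = np.abs(np.fft.rfft(prices - prices.mean())) ** 2
freqs = np.fft.rfftfreq(len(prices))[1:]
spectrum = spectrum[1:]

# Fit a line in log-log space: power ~ 1/f^alpha.
# alpha close to 1 corresponds to pink (1/f) noise,
# alpha close to 2 to a pure random walk (brown noise).
alpha = -np.polyfit(np.log(freqs), np.log(spectrum), 1)[0]
print(f"estimated spectral exponent alpha = {alpha:.2f}")
```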

The next question we wanted to examine was whether price changes have the same importance at different times. By analyzing the markets, we observed that prediction markets, just like financial markets, exhibit the phenomenon of volatility clustering. In other words, there are periods in which the market tends to move up and down a lot, and periods in which prices are relatively stable.
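A quick way to see volatility clustering (sketched here on a simulated series, not our actual data) is to compare the autocorrelation of the returns, which is near zero, with the autocorrelation of the squared returns, which stays positive over many lags when calm and turbulent periods cluster together.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate returns with GARCH(1,1)-style volatility clustering (illustrative only).
n, omega, a, b = 1000, 0.05, 0.1, 0.85
returns = np.empty(n)
var = omega / (1 - a - b)  # start at the unconditional variance
for t in range(n):
    returns[t] = rng.normal(0, np.sqrt(var))
    var = omega + a * returns[t] ** 2 + b * var

def autocorr(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Returns themselves look uncorrelated ("weakly efficient") ...
print("returns,  lag  1:", round(autocorr(returns, 1), 3))
# ... but squared returns are persistently correlated: volatility clusters.
for lag in (1, 5, 10, 20):
    print(f"squared, lag {lag:2d}:", round(autocorr(returns ** 2, lag), 3))
```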

What are the implications of these findings? If we want to assign an importance value to a price change, we have to take into consideration the (past and future) volatility of the prices. Going from 60 to 80 in a period of low volatility signals a much more important event than going from 60 to 80 in a period of high volatility.
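To make the 60-to-80 example concrete, the natural normalization is to measure the move in units of the prevailing volatility; the volatility figures below are made up purely for illustration.

```python
# The same 20-point move, judged against different volatility regimes.
# The daily volatilities are made-up numbers, purely for illustration.
move = 80 - 60

for label, daily_vol in [("low-volatility period", 2.0),
                         ("high-volatility period", 10.0)]:
    print(f"{label}: {move / daily_vol:.1f} 'standard' moves")
```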

Using the general family of ARCH models, George and I showed that we can model and predict the volatility of the markets nicely, and then properly estimate the importance of the events that correspond to the price changes.
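As a minimal sketch of fitting such a model (my own illustration, not the code we used), assuming the Python arch package and synthetic returns in place of the actual contract data:

```python
import numpy as np
from arch import arch_model  # pip install arch

rng = np.random.default_rng(3)

# Synthetic daily price changes standing in for an InTrade contract.
returns = rng.normal(0, 1, 500)

# Fit a GARCH(1,1) model, the workhorse of the ARCH family.
model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
result = model.fit(disp="off")

# In-sample conditional volatility, and a 5-day-ahead variance forecast.
print(result.conditional_volatility[-5:])
print(result.forecast(horizon=5).variance.iloc[-1])
```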

Even though such models are good, there are limits on their predictive power.

A much better approach for estimating the future volatility of a market is to allow people to trade volatility directly. In financial markets, this happens by allowing people to buy options on a given stock. The prices of the options give a good idea of the expected future volatility of the stock (and of its expected upward or downward movement). Therefore, having options allows us to estimate how robust and stable a particular contract will be in the future.

In principle, there is nothing that prevents us from having such derivative markets on top of the existing prediction markets. For example, we could have contracts such as "Obama@0.80/01Sep2008", which would allow people to buy, on September 1st 2008, a contract for Obama winning the presidency at a price of 0.80. If the underlying contract is trading above 0.80 at that point, the option turns a profit. (This is exactly equivalent to existing stock options.)
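As a sanity check on the mechanics, here is the payoff of such a hypothetical call option at expiration; the premium and final prices are made up for illustration.

```python
def call_payoff(price_at_expiry, strike, premium):
    """Profit of a call option on a prediction-market contract at expiration."""
    return max(price_at_expiry - strike, 0.0) - premium

# Hypothetical "Obama@0.80/01Sep2008" call bought for a premium of 0.05:
# it pays off only if the underlying contract trades above the 0.80 strike
# on September 1st; the break-even point is 0.85.
for final_price in (0.70, 0.80, 0.85, 0.95):
    profit = call_payoff(final_price, strike=0.80, premium=0.05)
    print(f"contract at {final_price:.2f} -> option profit {profit:+.2f}")
```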

The prices of such options give a good estimate of how volatile the contract is going to be in the future. For example, if Obama is trading today at 0.65 and no one is willing to buy the 0.80 call for September, then traders do not believe that Obama will reach that level by September. On the other hand, if the "Obama@0.80/01Sep2008" call option trades at 0.05, then traders believe that the contract has a good chance of being above 0.80 by September.

Using such values, we can estimate the "upside volatility" of the contract. (The corresponding "put" options show the estimated volatility on the downside.)

Of course, while such ideas are nice, we should not forget that markets work only when there is liquidity. And given the relatively low liquidity of the existing, primary prediction markets, there is little hope that such derivative markets for "options on prediction markets" will attract anything close to the necessary liquidity.