## Thursday, November 20, 2008

### The Faces of Mechanical Turk

Andy Baio is yet another enthusiast of Mechanical Turk:
When you experiment with Amazon's Mechanical Turk, it feels like magic. You toss 500 questions into the ether, and the answers instantly start rolling in from anonymous workers around the world. It was great for getting work done, but who are these people? I've seen the demographics, but that was too abstract for me.

Last week, I started a new Turk experiment to answer two questions: what do these people look like, and how much does it cost for someone to reveal their face?

So Andy paid Turkers 50 cents to upload a photo of themselves with a handwritten note saying why they Turk. So here is what Turkers look like and why they Turk!

Needless to say, this is already hanging outside of my office door. This picture is going into a slide in my Mechanical Turk talk, so that I can give a good answer to the question "Who are these people and why do they do it?"

(via Brendan O'Connor)

## Thursday, November 13, 2008

### Social Annotation of the NYT Corpus?

While I am waiting for the arrival of the New York Times Annotated Corpus, I have been thinking about the different tasks that we could use the corpus for. For some tasks, we might have to run additional extraction systems, to identify entities that are not currently marked. So, for example, we could use the OpenCalais system to extract patent issuances, company legal issues, and so on.

And then, I realized that most probably, tens of other groups will end up doing the same, over and over again. So, why not run such tasks once, and store them for others to use? In other words, we could have a "wiki-style" contribution site, where different people could submit their annotations, letting other people use them. This would save a significant amount of computational and human resources. (Freebase is a good example of such an effort.)

Extending the idea even more, we could have reputational metrics around these annotations, where other people provide feedback on the accuracy, comprehensiveness, and general quality of the submitted annotations.

Is there any practical problem with the implementation of this idea? I understand that someone needs access to the corpus to start with, but I am trying to think of higher-level obstacles (e.g., copyright, or conflicts with the interests of publishers).

## Wednesday, November 5, 2008

### Use of Excel-generated HTML Considered Harmful

This was one of the strangest bugs that I have had to resolve.

While I was writing the blog post about computing electoral correlations across states using prediction markets, I wanted to include a table with some results, to illustrate how different states are correlated.

So, I prepared the table in Excel, and then copied and pasted it into Blogger.

Then a strange thing happened: My Feedburner feed stopped working. Nobody received any updates, and suddenly the number of subscribers fell to zero.

Trying to figure out what was wrong, I got a message that my feed was bigger than 512Kb. Admittedly, my table was kind of big, with more than 300 entries. So, I decided to trim it down to 30-50 rows.

After that fix, my feed started working again.

I was still puzzled, though, about why the problem had not appeared earlier, given that I have written some pretty long posts (e.g., "Why People Participate on Mechanical Turk?") and had never exceeded the 512Kb limit.

Well, the problem was not over. Even though my feed was working, the post about computing electoral correlations across states using prediction markets did not appear in Google Reader, or in other readers. However, the reader on my cell phone was displaying the post. Very, very strange.

I followed all the troubleshooting steps on Feedburner; nothing.

So, I decided to take a closer look at the HTML source. I was in for a surprise! The table that I had copied and pasted from Excel carried seriously fat, ugly, and problematic HTML code.

As an example, instead of a table cell written as "<td>NTH.DAKOTA</td>", it had the following code:
<td class="xl63" style="border-style: none none solid; border-color: -moz-use-text-color -moz-use-text-color rgb(149, 179, 215); border-width: medium medium 0.5pt; background: rgb(219, 229, 241) none repeat scroll 0% 0%; font-size: 10pt; color: black; font-weight: 400; text-decoration: none; font-family: Calibri; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">NTH.DAKOTA</td>
This not only resulted in seriously padded HTML; it also generated validation problems, causing Google Reader to reject the post and not display it at all.

Solution? Nuking by regular expression. I replaced all instances of "<td [^>]+>" with "<td>", trimming the table from 116Kb (!) down to 7Kb. After that, Google picked up the post within seconds....
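For what it's worth, the nuking step is a one-liner in most scripting languages. Here is a sketch in Python, using the same regular expression (the sample cell below is abbreviated from the Excel output above):

```python
import re

# an (abbreviated) Excel-generated cell, full of inline styling
excel_html = '<td class="xl63" style="font-size: 10pt; font-family: Calibri;">NTH.DAKOTA</td>'

# replace every attribute-laden <td ...> opening tag with a bare <td>
cleaned = re.sub(r'<td [^>]+>', '<td>', excel_html)
print(cleaned)  # <td>NTH.DAKOTA</td>
```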

Lesson? Never, ever use an Excel-generated table in Blogger. Or if you need to do that, make sure to remove all the fat...

## Monday, November 3, 2008

### Academic Genealogy

In the summer of 2004, after completing my thesis, I found myself with plenty of time on my hands. So, I decided that it would be fun to research my academic genealogy. I knew the advisor of my advisor, Hector Garcia-Molina, and it was rather easy to find his advisor, Gio Wiederhold. Gio had also listed John Amsden Starkweather as his own advisor.

Going beyond that proved kind of difficult. I had to order the thesis of John Starkweather and read the dedication there: his advisor was Carl Porter Duncan. In a similar pattern, and after spending considerable time at the library, I managed to trace my genealogy back to the 1800's and to Hermann von Helmholtz. After that, I hit the corresponding entry at Chemical Genealogy and relied on the tree there.

Today, through a chain of events, I happened to run into Neurotree.org, which also contains my genealogy and goes back to 1000 AD. By expanding the tree as much as possible, I managed to get a pretty impressive printout, taking four 11x17 pages :-)

Until now, my tree went back "only" to the 1500's and to Pierre Richer de Belleval, who was teaching in Avignon, France. Now, I can proudly say that my tree goes back to 1000 AD, and its oldest roots are Greek Byzantines, including names such as Ioannis Mauropous, Michail Psellos, and Grigorios Palamas.

Accuracy of the information? I have no idea. But I have something to talk about when I go back to Greece for the winter break.

## Sunday, November 2, 2008

### Computing State Correlations in Elections Using (Only) Prediction Markets

(The post below is largely based on the work of Nikolay. I am responsible for all the mistakes.)

One thing that always puzzled me was how to compute correlations across electoral results in the state level. In 2006, there was some discussion about the accuracy of the prediction markets in predicting the outcome of the senate races. From the Computational Complexity blog:
For example, the markets gave a probability of winning 60% for each of Virginia and Missouri and the democrats needed both to take the senate. If these races were independent events, the probability that the democrats take both is 36% or a 64% chance of GOP senate control assuming no other surprises.
However, everyone will agree that races across states are not independent events. For example, if Obama wins Georgia (currently trading at 0.36), the probability of winning Ohio will be higher than 0.70, the current price of the Ohio.DEM contract. (As we will see later in the post, the price of Ohio, given that Democrats win Georgia, is close to 0.81.)

So, what we would like to estimate is, for two states A and B, what is the probability $Pr\{A|B\}$ that a candidate wins state $A$, given that the candidate won state $B$.

One way to model these dependencies is to run conditional prediction markets, but this leads to an explosion of possible contracts. Participation and liquidity are not great even in the current markets for state-level elections, so there is little hope that combinatorial markets will attract significant interest.

Another way to compute these probabilities is to use and expand the model described in my previous post about modeling volatility in prediction markets. Let's see how to do that.

Expressing Conditional Contracts using Ability Processes

Following this model, for each state we have an "ability difference process" $S(t)$ that tracks the difference in abilities of the two candidates. If at expiration time $T$, $S(T)$ is positive, the candidate wins the state; otherwise the candidate loses. So, we can write:

$$Pr\{A|B\} = Pr\{ S_A(T) \geq 0 \mid S_B(T) \geq 0, F(t) \}$$

where $F(t)$ is the information available at time $t$. Using Bayes rule:

$Pr\{A|B\} = \frac{Pr\{ S_A(T)\geq 0, S_B(T)\geq 0 | F(t) \}}{Pr\{ S_B(T)\geq 0 | F(t) \}}$

In the equation above, $Pr\{ S_B(T)\geq 0 | F(t) \}$ is simply the price of the contract for state $B$ at time $t$, i.e., $\pi_B(t)$.

Pricing Joint Contracts using Ito Diffusions

The challenging term is the price of the joint contract $Pr\{ S_A(T)\geq 0, S_B(T)\geq 0 | F(t) \}$.

To price this contract, we generalize the Brownian motion model, and we assume that the joint movement of $S_A(t)$ and $S_B(t)$ is a 2d Brownian motion. Of course, we do not want the 2d motion to be independent! Since $S_A(t)$ and $S_B(t)$ represent the abilities, they are correlated! So, we assume that the two Brownian motions have some (unknown) correlation $\rho$. Intuitively, if they are perfectly correlated, when $S_A(t)$ goes up, then $S_B(t)$ goes up by the same amount. If they are not correlated, the movement of $S_A$ does not give any information about the movement of $S_B$, and in this case $Pr\{A|B\} = Pr\{A\}$.

Without going into much detail, in this model the price of the joint contract is:

$\pi_{AB}(t) = Pr\{ S_A(T)>0, S_B(T)>0 | F(t) \} = N_\rho ( N^{-1}(\pi_A(t)), N^{-1}(\pi_B(t)) )$

where $N_\rho$ is the CDF of the standard bivariate normal distribution with correlation $\rho$ and $N^{-1}$ is the inverse CDF of the standard normal. Intuitively, this is nothing more than the generalization of the result that we presented in the previous post.
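For the curious, here is how such a price could be computed in practice. The sketch below is a stdlib-only Python implementation that evaluates $N_\rho$ by one-dimensional numerical integration; the prices and the correlation value in the example are made up for illustration, not actual InTrade quotes:

```python
import math
from statistics import NormalDist

nd = NormalDist()  # standard normal: pdf, cdf, inv_cdf

def bivariate_normal_cdf(a, b, rho, steps=4000):
    """N_rho(a, b) = Pr(X <= a, Y <= b) for a standard bivariate normal
    with correlation rho, via the identity
    N_rho(a, b) = integral_{-inf}^{b} phi(y) * Phi((a - rho*y)/sqrt(1 - rho^2)) dy."""
    f = lambda y: nd.pdf(y) * nd.cdf((a - rho * y) / math.sqrt(1.0 - rho * rho))
    lo, h = -8.0, (b + 8.0) / steps  # the integrand is effectively 0 below -8
    return h * (0.5 * (f(lo) + f(b)) + sum(f(lo + i * h) for i in range(1, steps)))

def conditional_price(pi_A, pi_B, rho):
    """Pr(A|B) = pi_AB / pi_B, with pi_AB = N_rho(N^{-1}(pi_A), N^{-1}(pi_B))."""
    a, b = nd.inv_cdf(pi_A), nd.inv_cdf(pi_B)
    return bivariate_normal_cdf(a, b, rho) / pi_B

# with rho = 0 the two states are independent, so Pr(A|B) is just pi_A
print(round(conditional_price(0.36, 0.70, 0.0), 3))  # 0.36
```

With a positive correlation, the conditional price moves above the unconditional one, which is exactly the Georgia/Ohio effect described earlier.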

However, the big question remains: How do we compute the value $\rho$?

Now the neat part: We can infer $\rho$ by observing the past time series of the two state-level contracts. Why is that?

First of all, we know that the price changes of the contracts are given by:

$d\pi(t) = V(\pi(t), t) dW$, which gives $dW = \frac{d\pi(t)}{ V( \pi(t), t)}$.

We can observe $d\pi(t)$ over time. We also know that $V( \pi(t), t) = \frac{1}{\sqrt{T-t}} \cdot \varphi( N^{-1}( \pi(t) ) )$ is the instantaneous volatility of a contract trading at price $\pi(t)$ at time $t$.

So essentially we take the price differences over time and we normalize them by the expected volatility. This process generates the "normalized" changes in abilities, over time and across states.

Therefore, we can now use standard correlation measures of time series to infer the hidden correlation of the ability processes. (And then compute the conditional probability.) If the two ability processes were powered by independent Brownian motions $W_A$ and $W_B$, then $dW_A$ and $dW_B$ would not exhibit any correlation. If the two processes are correlated, then we can measure their cross-correlation by observing their past behavior.

Now, by definition of cross-correlation we get:

$\rho \approx \sum_{i=0}^{t} \frac{ (\pi_A(i+1) - \pi_A(i)) \cdot (\pi_B(i+1) - \pi_B(i)) }{ V(\pi_A(i), i) \cdot V(\pi_B(i), i) }$
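To make the recipe concrete, here is a Python sketch: normalize the observed price increments by the model volatility, and take the sample correlation of the normalized series. (The uniform time grid and the price series below are illustrative assumptions, not real InTrade data.)

```python
import math
from statistics import NormalDist

nd = NormalDist()

def model_vol(pi, t, T=1.0):
    """V(pi, t) = phi(N^{-1}(pi)) / sqrt(T - t), the model's instantaneous volatility."""
    return nd.pdf(nd.inv_cdf(pi)) / math.sqrt(T - t)

def implied_correlation(prices_A, prices_B, T=1.0):
    """Sample correlation of the volatility-normalized increments of two contracts,
    observed on a uniform time grid t_i = i/n * T (an assumption for illustration)."""
    n = len(prices_A) - 1
    dWa = [(prices_A[i+1] - prices_A[i]) / model_vol(prices_A[i], i / n * T, T) for i in range(n)]
    dWb = [(prices_B[i+1] - prices_B[i]) / model_vol(prices_B[i], i / n * T, T) for i in range(n)]
    ma, mb = sum(dWa) / n, sum(dWb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(dWa, dWb))
    return cov / math.sqrt(sum((x - ma)**2 for x in dWa) * sum((y - mb)**2 for y in dWb))

# two contracts that always move together have implied correlation 1
series = [0.50, 0.54, 0.51, 0.57, 0.60, 0.58, 0.63]
print(round(implied_correlation(series, series), 3))  # 1.0
```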

OK, if you have stayed with me this long, here are some of the strong correlations, as observed and computed from the InTrade data. How to read the table? If Democrats win state B, what is the probability Pr(A|B) that they will also win state A? To make comparisons easy, we also list the current prices of the contracts A and B. The "lift" shows how much the conditional probability increases compared to the base probability. I skipped the cases where a state has a very high probability, i.e., above 0.9 (as they are rather uninformative), or a very low probability, i.e., less than 0.2 (as they are highly unlikely to happen). I also list only state pairs with a lift larger than 1.10. You can also get the list as an Excel spreadsheet.

Enjoy!

## Friday, October 31, 2008

### The New York Times Annotated Corpus

Last week, I was invited to give a talk at a conference at the New York Public Library, about the preservation of news. I talked about our research in the Economining project, where we are trying to find the "economic value" of textual content on the Internet.

As part of the presentation, I discussed some problems that I had in the past with obtaining well-organized news corpora that are both comprehensive and also easily accessible using standard tools. Factiva has an excellent database of articles, exported in a richly annotated XML format but unfortunately Factiva prohibits data mining of the content of its archives.

The librarians at the conference were very helpful in offering suggestions and acknowledging that providing content for data mining purposes should be one of the goals of any preservation effort.

So, yesterday I received an email from Dorothy Carner informing me about the availability of The New York Times Corpus, a corpus of 1.8 million articles from The New York Times, dating from 1987 until 2007. The details are available from http://corpus.nytimes.com but let me repeat some of the interesting facts here (the emphasis below is mine):

The New York Times Annotated Corpus is a collection of over 1.8 million articles annotated with rich metadata published by The New York Times between January 1, 1987 and June 19, 2007.

With over 650,000 individually written summaries and 1.5 million manually tagged articles, The New York Times Annotated Corpus has the potential to be a valuable resource for a number of natural language processing research areas, including document summarization, document categorization and automatic content extraction.

The corpus is provided as a collection of XML documents in the News Industry Text Format (NITF). Developed by a consortium of the world’s major news agencies, NITF is an internationally recognized standard for representing the content and structure of news documents. To learn more about NITF please visit the NITF website.

Highlights of The New York Times Annotated Corpus include:

• Over 1.8 million articles written and published between January 1, 1987 and June 19, 2007.
• Over 650,000 article summaries written by the staff of The New York Times Index Department.
• Over 1.5 million articles manually tagged by The New York Times Index Department with a normalized indexing vocabulary of people, organizations, locations and topic descriptors.
• Over 275,000 algorithmically-tagged articles that have been hand verified by the online production staff at NYTimes.com.
• Java tools for parsing corpus documents from xml into a memory resident object.

Yes, 1.8 million articles, in richly annotated XML, with summaries, with hierarchically categorized articles, and with verified annotations of people, locations, and organizations! Expect the corpus to be a de facto standard for many text-centric research efforts! Hopefully more organizations are going to follow the example of New York Times and we are going to see such publicly available corpora from other high-quality sources. (I know that Associated Press has an archive of almost 1Tb of text, in computerized form, and hopefully we will see something similar from them as well.)

How can you get the corpus? It is available from LDC, for 300 USD for non-members; members should get this for free.

I am looking forward to receiving the corpus and starting to play!

## Monday, October 20, 2008

### Modeling Volatility in Prediction Markets, Part II

In the previous post, I described how we can estimate the volatility of prediction markets using additional prediction market contracts, aka options on prediction markets. I finished by noting that the techniques used to price stock options are not directly applicable in the prediction market context.

Now, I will review a different modeling approach that builds on the spirit of Black-Scholes but is properly adapted for the prediction market context. This model has been developed by Nikolay, and is described in the paper "Modeling Volatility in Prediction Markets".

Modeling Prediction Markets as Competitions

Let's consider the simple case of a contract with a binary outcome. For example, who will win the presidential election? McCain or Obama?

The basic modeling idea is to assume that each competing party has an ability $S_i(t)$ that evolves over time, moving as a Brownian motion. (A simplified example of such an ability would be the number of voters for a party, the number of points in a sports game, and so on.) At the expiration of the contract at time $T$, the party $i$ with the higher ability $S_i(T)$ wins.

Actually, to have a more general case, we can use a generalized form of the Brownian motion, an Ito diffusion, which allows the abilities to have a drift $\mu_i$ over time (i.e., an average rate of growth) and different volatilities $\sigma_i$. The quantity that we need to monitor is the difference of the two ability processes $S(t)=S_1(t)-S_2(t)$. If at the expiration of the contract at time $T$ we have $S(T)>0$, then party 1 wins. If $S(T)<0$, then party 2 wins. Interestingly, the difference $S(t)$ is also an Ito diffusion, with $\mu=\mu_1-\mu_2$ and $\sigma=\sqrt{\sigma_1^2+\sigma_2^2-2\rho \sigma_1 \sigma_2}$, where $\rho$ is the correlation of the two ability processes. Under this scenario, the price of the contract $\pi(t)$ at time $t$ is:

$\pi(t) = Pr\{ S(T)>0 | S(t) \}$

which can be written as:

$\pi(t) = N\Big(\frac{S(t) + \mu \cdot (T-t)}{\sigma \cdot \sqrt{T-t} } \Big)$

where $N(x) =\frac{1}{2} \Big[ 1 + erf\Big( \frac{x}{\sqrt{2}} \Big) \Big]$ is the CDF of the normal distribution with mean 0 and standard deviation 1, and $erf(x)$ is the error function. Notice that as time $t$ gets closer to the expiration, the denominator gets close to 0, which pushes the ratio toward $\infty$ or $-\infty$, and the price $\pi(t)$ gets close to 0 or 1. However, if $S(t)$ is close to 0 (i.e., the two parties are almost equivalent), then we observe increasing instability as we get close to expiration, as small changes in the difference $S(t)$ can have a significant effect on the outcome.
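For concreteness, here is the pricing formula as a few lines of Python; the standard normal CDF $N(\cdot)$ is available in the standard library, and the parameter values in the example are made up:

```python
import math
from statistics import NormalDist

N = NormalDist().cdf  # CDF of the standard normal

def contract_price(S, t, mu, sigma, T=1.0):
    """pi(t) = N( (S(t) + mu * (T - t)) / (sigma * sqrt(T - t)) )"""
    return N((S + mu * (T - t)) / (sigma * math.sqrt(T - t)))

# a dead heat (S = 0, no drift) is a coin flip at any time before expiration
print(contract_price(0.0, 0.5, mu=0.0, sigma=0.67))  # 0.5
```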

For example, consider two parties: party 1 with an ability that has positive drift $\mu_1=0.2$ and volatility $\sigma_1=0.3$, and party 2 with negative drift $\mu_2=-0.2$ and higher volatility $\sigma_2=0.6$. In this case, assuming no correlation, the difference is a diffusion with drift $\mu=0.4$ and volatility $\sigma=0.67$. Here is one scenario of the evolution, and below you can see the price of the contract, as time evolves.
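As a quick sanity check, the numbers in this example follow directly from the formulas above:

```python
import math

mu1, sigma1 = 0.2, 0.3    # party 1: positive drift, lower volatility
mu2, sigma2 = -0.2, 0.6   # party 2: negative drift, higher volatility
rho = 0.0                 # assuming no correlation between the two abilities

mu = mu1 - mu2
sigma = math.sqrt(sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2)
print(mu, round(sigma, 2))  # 0.4 0.67
```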

As you may observe in the example, the red line (party 1) stays above the blue line (party 2) for most of the time, which keeps the green line (the difference) above 0. As the contract gets close to expiration, its price gets closer and closer to 1 (i.e., party 1 will win). Close to the end, the blue line catches up, which causes the prediction market contract to swing from almost 1 down to 0.5, but it then swings back up as party 1 finally finishes at expiration above party 2.

So far, we generated a nice simulation, but our results depend on knowing the parameters of the underlying "ability processes". Since we never get to observe these values, what is the use of this exercise?

Well, the interesting thing is that, using the price function, we can now proceed to derive its volatility. Without going into the details, we can prove that the volatility of the prediction market contract is:

$V(t) = \frac{1}{\sqrt{T-t}} \cdot \varphi( N^{-1}( \pi(t) ) )$

where $N^{-1}(x)$ is the inverse CDF of the standard normal distribution and $\varphi(x)=\frac{exp( (-x^2)/2)}{\sqrt{2\pi}}$ is the density of the standard normal distribution.

In other words, the volatility depends only on the current price of the contract and the time to expiration! Nothing else matters! Drifts do not matter: they are already priced into the current contract price, since we know where the drift will lead at expiration. The magnitude of the volatilities is also priced into the current contract price: higher volatilities push the contract price closer to 0.5, as it is easier for $S(t)$ to move above and below 0 when it has high volatility. Furthermore, the direction of the volatilities of the underlying abilities is irrelevant, as they can move the difference in either direction with equal probability. (The only assumption is that the volatilities of the underlying ability processes do not change over time.)
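These properties are easy to verify numerically; here is a small Python sketch of the volatility formula (illustration only, not production code):

```python
import math
from statistics import NormalDist

nd = NormalDist()

def vol(pi, t, T=1.0):
    """V(t) = phi(N^{-1}(pi(t))) / sqrt(T - t)"""
    return nd.pdf(nd.inv_cdf(pi)) / math.sqrt(T - t)

# highest volatility at pi = 0.5, vanishing toward 0 and 1 ...
for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(pi, round(vol(pi, 0.0), 3))

# ... and, for a fixed price, growing as expiration approaches
print(round(vol(0.5, 0.75) / vol(0.5, 0.0), 1))  # 2.0
```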

Volatility Surface

So, what does this model imply for the volatility of prediction markets? First of all, the model says that volatility increases as we move closer to the expiration, as long as the price of the contract is not 0 or 1. For example, assuming that now is $t=0$ and expiration is at $T=1$, the volatility is expected to increase as follows:

And how does volatility change with different contract prices? As you can see, volatility is highest when the contract trades at around 0.5, and gets close to 0 when the price is 0 or 1.

And just to combine the two plots and present a nice 3d plot, with the present being at $t=0$ and expiration at $T=1$:

The experimental section in the paper "Modeling Volatility in Prediction Markets" (a shorter version of which appeared at ACM EC'09) indicates that the actual volatility observed in the InTrade prediction markets fits the model well.

Now, given this model, we can judge what is a "noise movement" and what is actually a "significant move" in prediction markets. Furthermore, we can provide an "error margin" for each day, indicating the confidence bounds for the market price.

I will post more applications of this model in the next few days. We will see how to price the X contracts on InTrade, and a way to compute correlations of the outcomes of state elections, given simply the past movements of their corresponding prediction markets.

### Modeling Volatility in Prediction Markets, Part I

A few weeks back, I was thinking about the concept of uncertainty in prediction markets. The price of a contract in a prediction market today gives us the probability that an event will happen. For example, the contract 2008.PRES.OBAMA is trading at 84.0, indicating that there is an 84% chance that Obama will win the presidential election.

Unfortunately, we have no idea about the stability and robustness of this estimate. How likely is it that the contract will fall to 80% tomorrow? How likely is it to jump to 90%? By treating the contract price as a "deterministic" number, we do not capture such information. We need to treat the price as a random variable with its own probability distribution, out of which we observe just the mean by looking at the prediction market.

However, to fully understand the stability of the price we need further information, beyond just the mean of the probability, revealed by the current contract price.

A first step is to look at the volatility of the price. One approach is to look at the past trading behavior, but this analysis will give us the past volatility, not the expected future volatility of the contract.

Predicting Future Volatility using Options

So, how can we estimate the future volatility of a prediction market contract?

There is a market approach to solve this problem. Namely, we can run prediction markets on the results of the prediction markets!

Recently, Intrade has introduced such contracts, the so-called X contracts (listed under "Politics->Options: US Election" from the sidebar). For example, the contract "X.22OCT.OBAMA.>80.0" pays 1 USD if the contract "2008.PRES.OBAMA" will be higher than 80.0 on Wed 22 Oct 2008. Traditionally, the threshold defined in the options contract is called strike price (e.g., the strike price for X.22OCT.OBAMA.>80.0 is 80.0).

A set of such contracts can reveal the distribution of the probability of the event for the underlying contract 2008.PRES.OBAMA. In other words, we can see not only what is the mean probability that Obama will be elected president but we can also see the expected downside risk or upside potential of the 2008.PRES.OBAMA contract. For example, the X.22OCT.OBAMA.>80.0 has a price of 90.0, indicating a 90% chance that the 2008.PRES.OBAMA contract will be above 80.0 on Oct 22nd.

Now, given enough contracts, with strike prices at various levels, we can estimate the probability distribution of the likely prices of the contract. For example, we can have contracts with strike prices 10, 20, ..., 90 that will give us the probability that the contract will trade above 10, 20, ..., and 90 points at some specific point in time, which corresponds to the expiration date of the options contract. So, for each date, we need 9 contracts if we want a 10-bin histogram that describes the distribution.
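The bookkeeping that turns a ladder of such contracts into a histogram is a simple differencing exercise. Here is a sketch in Python; all the contract prices below are hypothetical, not actual X contract quotes:

```python
# hypothetical X contract prices: Pr(underlying trades above each strike)
above = {0.1: 0.99, 0.2: 0.97, 0.3: 0.95, 0.4: 0.90,
         0.5: 0.83, 0.6: 0.72, 0.7: 0.55, 0.8: 0.30, 0.9: 0.08}

strikes = sorted(above)
bins = [1.0 - above[strikes[0]]]                     # Pr(price <= 0.1)
bins += [above[strikes[i]] - above[strikes[i + 1]]   # Pr(strike_i < price <= strike_{i+1})
         for i in range(len(strikes) - 1)]
bins.append(above[strikes[-1]])                      # Pr(price > 0.9)

print([round(b, 2) for b in bins])
print(round(sum(bins), 6))  # the ten bins form a probability distribution: 1.0
```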

Note that if we want to estimate the probability distribution dynamics, we will need to set up 9 contracts for each date that we want to measure. Of course, this implies that we have plenty of liquidity in the markets if we want to rely purely on the market for such estimates.

Pricing Options and the Black-Scholes Formula

A natural question is: Can we price such "options on options" contracts?

This will at least give us some guidance on the likely prices of such contracts, if not for anything else, but to just start the market at the appropriate level. (For example, if we have a market scoring mechanism.)

There is significant research in Finance on pricing options for stocks. The Black-Scholes formula is one of the most well-known examples for deriving prices for options on stocks. The basic idea behind Black-Scholes is that the underlying stock price follows a Brownian motion, moving randomly up and down. Then by extracting the probability that this random stock move will reach various levels, it is possible to derive the option prices. (Terrence Tao has a very easy to read 3-page note explaining the Black-Scholes formula and a longer blog posting.)

Why not apply this model directly to price options on prediction markets? There are a few fundamental problems, but the most important one is the bounded price of the underlying prediction market contract. The price of a prediction market contract cannot go below 0 or above 1, so the Brownian motion assumption is invalid. In fact, if we try to apply the Black-Scholes model to a prediction market, we get absurd results.

In the next post, I will review an adaptation of the Black-Scholes model that works well for prediction markets, and leads to some very interesting results!

## Saturday, October 4, 2008

### Reviewing the Reviewers

I received today the latest issue of TOIS, and the title of the editorial by Gary Marchionini caught my eye: "Reviewer Merits and Review Control, in an Age of Electronic Manuscript Management Systems". The article makes the case for using the electronic management systems to allow for grading of the reviewer efforts and allow for memory of the reviewing process, including both the reviews and the reviewer ratings.

In principle, I agree with the idea. Having the complete reviewing history for each reviewer, and for each journal and conference, can bring several improvements in the process:

1. Estimating and Fixing Biases

One way to see the publication process is as noisy labeling of an example, where the true labels are "accept" or "reject". The reviewers can be modeled as noisy processes, each with its own sensitivity and specificity. The perfect reviewer has sensitivity=1, i.e., marks as "accept" all the "true accepts", and has specificity=1, i.e., marks as "reject" all the "true rejects".

Given enough noisy ratings, it is possible to use statistical techniques to infer what is the "true label" for each paper, and infer at the same time the sensitivity and specificity of each reviewer. Bob Carpenter has presented a hierarchical Bayesian model that can be used for this purpose, but simpler maximum likelihood models, like the one of Dawid and Skene, also work very well. In my own (synthetic) experiments the MLE method worked almost perfectly for recovering the quality characteristics of the reviewers and to recover the true labels of the papers (of course, without the uncertainty estimates that the Bayesian methods provide.)
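To illustrate the idea, here is a minimal EM sketch in the spirit of Dawid and Skene for binary accept/reject labels. Everything below is simulated: the reviewer sensitivities and specificities are made-up values, not estimates from real reviewing data:

```python
import random

random.seed(0)

# --- simulate papers with hidden true labels and noisy reviewer votes ---
n_papers, n_reviewers = 200, 5
true_sens = [0.85, 0.90, 0.80, 0.75, 0.95]  # Pr(vote accept | true accept)
true_spec = [0.90, 0.80, 0.85, 0.90, 0.70]  # Pr(vote reject | true reject)
truth = [random.random() < 0.5 for _ in range(n_papers)]
votes = [[random.random() < (true_sens[r] if truth[p] else 1 - true_spec[r])
          for r in range(n_reviewers)] for p in range(n_papers)]

# --- EM: alternate between estimating labels and reviewer quality ---
q = [sum(v) / n_reviewers for v in votes]  # initial posteriors: majority vote
for _ in range(50):
    prior = sum(q) / n_papers
    pos = sum(q)
    sens = [sum(q[p] for p in range(n_papers) if votes[p][r]) / pos
            for r in range(n_reviewers)]
    spec = [sum(1 - q[p] for p in range(n_papers) if not votes[p][r]) / (n_papers - pos)
            for r in range(n_reviewers)]
    for p in range(n_papers):
        like1, like0 = prior, 1 - prior
        for r in range(n_reviewers):
            like1 *= sens[r] if votes[p][r] else 1 - sens[r]
            like0 *= 1 - spec[r] if votes[p][r] else spec[r]
        q[p] = like1 / (like1 + like0)

accuracy = sum((q[p] > 0.5) == truth[p] for p in range(n_papers)) / n_papers
print("label accuracy:", accuracy)
print("estimated sensitivities:", [round(s, 2) for s in sens])
```

On this toy data, the recovered labels and the per-reviewer sensitivity/specificity estimates land close to the simulated truth, which is the behavior described above.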

One issue with such a model? The assumption that we have an underlying "true" label. For people with different backgrounds and research interests, what is a "true accept" and what is a "true reject" is not easy to define, even with perfect reviewing.

2. Reviewer Ratings

Reviewer reviewing by the editors

The statistical approaches described above reduce the quality of a reviewer to two metrics. However, these ratings only capture agreement of the recommendations with the "true" outcome (publish or not). They say nothing about other aspects of the review: comprehensiveness, depth, timeliness, and helpfulness are all important aspects that need to be captured using different methods.

Marchionini mentions that current manuscript management systems allow the editors to rate reviewers in terms of timeliness and in terms of quality. By following the references, I ran into the article Reviewer Merits, published in Information Processing and Management, where the Editors-in-Chief of many IR journals stated:
Electronic manuscript systems easily provide time data for reviewers and some offer rating scales and note fields for editors to evaluate review quality. Many of us (editors) are beginning to use these capabilities and, over time, we will be able to have systematic and persistent reviewer quality data. Graduate students, faculty, chairs, and deans should be aware that these data are held.
Now, while I agree with reviewer accountability, I think that this statement is not worded properly. I find the use of the phrase "should be aware" as semi-threatening. ("We, the editors, are rating you... remember that!")

If reviewer quality history is being kept, then the reviewers should be aware and have access to it. Being reminded that "your history is out there somewhere" is not the way to go. If reviewer quality is going to be a credible evaluation metric, the reviewers need to know how well they did. (Especially junior reviewers, and especially when the review does not meet the quality standards.)

Furthermore, if the editors are the ones rating the reviewers, then who controls the quality of these ratings? How do we know that the evaluation is fair and accurate? Notice that if we have a single editorial quality rating per review, then the statistical approaches described above do not work.

Reviewer reviewing by the authors

In the past, I have argued that authors should rate reviewers. My main point in that post was to propose a system that will encourage reviewers to participate by rewarding the highly performing reviewers. (There is a similar letter to Science, named "Rewarding Reviewers.") Since authors will have to provide multiple feedback points, it is much easier to correct the biases in the reviewer ratings of the authors.

3. Reviewer History and Motivation

If we keep a history of reviewers, we should not forget potential side-effects. One clear issue that I see is motivation. If "reviews of reviewers" become a public record, then it is not clear how easy it will be to recruit reviewers.

Right now, many accept invitations to review, knowing that they will be able to do a decent job. If the expectations increase, it will be natural for people to decline invitations, focusing only on a few reviews for which they can do a great job. Arguably, the reviewing record is never going to be as important for evaluation as other metrics, such as research productivity or teaching, so it is unlikely to have more time devoted to it.

So, there will always be the tradeoff: more reviews or better reviews?

One solution that I have proposed in the past: impose a budget! Every researcher should remove from the reviewing system the workload that his or her submissions generate. Five papers submitted (not accepted) within a year, at three reviews per paper? The researcher needs to review 3x5 = 15 papers to remove the workload that these five papers generated. (See also the article "In Search of Peer Reviewers," which proposes the same ideas.)
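The budget arithmetic is trivial, but making it explicit shows how the system would settle accounts. A minimal sketch (the function name and the three-reviews-per-paper default are my assumptions, taken from the 3x5 example above):

```python
def review_budget(papers_submitted, reviews_per_paper=3):
    """Number of reviews a researcher owes to offset the reviewing
    workload generated by his or her own submissions."""
    return papers_submitted * reviews_per_paper

# five submissions in a year generate a debt of fifteen reviews
print(review_budget(5))  # 15
```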

4. Training Reviewers

So, suppose that we have a system in place to keep reviewer history, we have solved the issue of motivation, and one facet of researcher reputation is now the reviewer quality score. How do we learn to review properly? A system that estimates the sensitivity and specificity of a reviewer can provide some information on how strict or lenient that reviewer is, compared to others.

However, we need something more than that. What makes a review constructive? What makes a review fair? In principle, we could rely on academic advising to pass such qualities to newer generations of researchers. In practice, when someone starts reviewing a significant volume of papers, there is no advisor or mentor to oversee the process.

Therefore, we need some guidelines. An excellent set of guidelines is given in the article "A Peer Review How-To". Let me highlight some nuggets:

Reviewers make two common mistakes. The first mistake is to reflexively demand that more be done. Do not require experiments beyond the scope of the paper, unless the scope is too narrow.
[...]
Do not reject a manuscript simply because its ideas are not original, if it offers the first strong evidence for an old but important idea.

Do not reject a paper with a brilliant new idea simply because the evidence was not as comprehensive as could be imagined.

Do not reject a paper simply because it is not of the highest significance, if it is beautifully executed and offers fresh ideas with strong evidence.

Seek a balance among criteria in making a recommendation.

Finally, step back from your own scientific prejudices.

And now excuse me, because I have to review a couple of papers...

## Thursday, October 2, 2008

### VP Debate and Prediction Market Volatility

I was watching the VP debate on CNN, which was reporting the reactions of "undecided Ohio voters" to what the VP candidates were saying. Although interesting, it was not satisfying. I wanted a better way to see the real-time reactions. Blogs were relatively slow to post, and mainstream media were simply describing the minutiae of the debate. What is the solution? Easy: prediction markets!

I remembered that Intrade has a contract, VP.DEBATE.OBAMA: "Barack Obama's Intrade value will increase more than John McCain's following the VP debate."

So, during the debate, I was following the fluctuations of the contract's price to measure the reactions. Here is how the contract moved from 8:30pm EST until 10:30pm EST. (The debate started at 9pm EST and lasted until 10:30pm EST.)

At the beginning, the contract was below 50.0%, probably reflecting the fact that Palin was giving reasonable and coherent responses, perhaps disappointing those who were expecting material for a Saturday Night Live performance.

However, in the second 45 minutes of the debate, as the discussion moved to foreign policy issues, the contract started moving up, as Biden started giving more direct answers, while Palin started avoiding questions and replying with stereotypical, canned answers.

What I found interesting was the significant increase in variance as the debate came close to the end. Prices fluctuated widely during the closing statements of the two VP candidates.

This increased volatility as the contract comes to a close is actually a pattern that we have observed consistently in many contracts over time: when the contract price is not close to 0.0 or 1.0, the price fluctuates widely as we get close to expiration. While I could explain this intuitively, I did not have a solid theoretical understanding of why it happens.

So, what to do in this case? You simply ask a PhD student to explain it to you! I asked Nikolay Archak, and within a few weeks, Nikolay had the answer.

The basic result:
• Volatility increases as contract price gets closer to 0.5,
• Volatility decreases as contract price gets closer to 0.0 or to 1.0,
• Volatility increases as we get close to expiration, and approaches infinity if the price does not converge to 0.0 or 1.0.
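These three properties can be illustrated with a small Monte Carlo simulation. A standard way to model a binary contract is to set its price to the conditional probability that an underlying Brownian motion finishes above zero; the per-step price volatility then blows up near expiration whenever the price is still away from 0.0 and 1.0. This is an illustrative toy model of my own, not Nikolay's actual derivation:

```python
import math
import random

def normal_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def price_volatility(n_paths=2000, n_steps=100, seed=7):
    """Simulate binary-contract prices P_t = Pr(B_1 > 0 | B_t) for a
    Brownian motion B, and return the standard deviation of the
    price change at each time step."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    changes = [[] for _ in range(n_steps)]
    for _ in range(n_paths):
        x, prev_p = 0.0, 0.5
        for k in range(n_steps):
            x += rng.gauss(0.0, math.sqrt(dt))
            tau = (n_steps - (k + 1)) * dt  # time remaining
            if tau > 0:
                p = normal_cdf(x / math.sqrt(tau))
            else:
                p = 1.0 if x > 0 else 0.0   # contract expires and settles
            changes[k].append(p - prev_p)
            prev_p = p
    vols = []
    for step in changes:
        mean = sum(step) / len(step)
        vols.append(math.sqrt(sum((c - mean) ** 2 for c in step) / len(step)))
    return vols

vols = price_volatility()
# price-change volatility near expiration exceeds volatility near the start
print(vols[5], vols[-1])
```

The same simulation also shows the first two bullets: conditioning on paths whose price sits near 0.5 yields larger price changes than conditioning on paths near 0.0 or 1.0.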

## Tuesday, September 30, 2008

### Sarah Palin and Markov Models

How good are n-gram Markov models for language modeling?

Apparently pretty good for modeling the responses of Sarah Palin during her last couple of interviews! Check them out:

http://interviewpalin.com/

http://palinspeak.com/
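For readers curious about the underlying technique, a word-level bigram Markov model fits in a few lines of Python. This is a generic sketch of the idea, not the code behind either site:

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    model = defaultdict(list)
    for w1, w2 in zip(words, words[1:]):
        model[w1].append(w2)
    return model

def generate(model, start, length=12, seed=0):
    """Random-walk through the bigram table, starting from `start`."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        followers = model.get(out[-1])
        if not followers:
            break  # dead end: the last word was never followed by anything
        out.append(rng.choice(followers))
    return " ".join(out)
```

Train it on interview transcripts and it produces answers that are locally plausible but globally incoherent, which is exactly the joke.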

## Friday, September 12, 2008

### How Much Does Turking Pay?

After reporting the results about "why Turkers Turk," I received a set of questions about further things that people would like to know about the Turkers. One of the most common questions was about the compensation of Turkers: "How much do they make by Turking?"

Well, there is no question about Mechanical Turk that Mechanical Turk itself cannot answer, so here we go. I posted this very question on MTurk, asking people about their average compensation per week. Without further ado, here are the results:

## Tuesday, April 22, 2008

### How Much Does a Paper Submission Cost?

I have been reading the post by Lance Fortnow about the cost of a class, and the amount that students pay collectively for an hour of teaching. This made me think of a similar calculation for the cost of submitting a paper to a conference. We are accustomed to submitting papers and then asking for high-quality reviews, often disregarding the associated costs. "What cost?", you will ask, given that everything in academic reviewing is done on a gratis, voluntary basis. Fundamentally, our peer reviewing system is based on an implicit tit-for-tat agreement: "I will contribute a number of reviews as a reviewer, so that others can then review my own papers."

In most cases, though, some employer is paying the reviewer (a university, a research lab...), and reviewing consumes productive time. A typical computer scientist with a PhD will have a salary above $100K per year, which roughly corresponds to a $50/hr to $100/hr rate. A typical review (at least for me) takes at least 3 hours to complete, in the best case, corresponding to a cost of $150 to $300 per review. Additionally, every paper submission gets 3-4 reviewers, which results in a cost of $500 to $1,000 per submission. Therefore, a conference like SIGMOD, WWW, KDD, and so on, with 500-1,000 submissions per year, consumes from $250,000 to $1,000,000 in resources, just for conducting the reviewing. I simply find that amount impressive.

This leads to the next question: Have you ever thought about your balance? How many papers do you review and how many papers do you submit per year? If someone had to pay $1,000 per submission, I doubt that we would see many half-baked submissions. Or, if credit were given for each conducted review, then we would have more reviewing resources available. I do not advocate a system based on monetary rewards, but before complaining about the quality of the reviews that you get, think: what is your balance?
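The back-of-the-envelope arithmetic above can be sketched in a few lines. The rates and counts are the rough figures from this post, not measured data, and the function name is my own:

```python
def submission_cost(hourly_rate, hours_per_review, reviews_per_paper):
    """Estimated reviewing cost, in dollars, of one conference submission."""
    return hourly_rate * hours_per_review * reviews_per_paper

# low and high ends using the rough figures from the post
low = submission_cost(50, 3, 3)     # 450 dollars per submission
high = submission_cost(100, 3, 4)   # 1200 dollars per submission

# a conference with 500-1000 submissions per year
print(500 * low, 1000 * high)       # 225000 1200000
```

The exact endpoints depend on how you round, but any reasonable choice of rates lands a large conference comfortably in the hundreds of thousands of dollars per year.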

In most cases, though, some employer is paying the reviewer (a university, a research lab...) and reviewing consumes some productive time. A typical computer scientist with PhD will have a salary above $100K per year, which roughly corresponds to a$50/hr-$100/hr salary. A typical review (at least for me) takes at least 3 hours to complete, in the best case, corresponding to a cost of$150 to $300 per review. Additionally, every paper submission gets 3-4 reviewers, which results in a cost per submission of$500 to $1000 per paper. Therefore, a conference like SIGMOD, WWW, KDD, and so on, with 500-1000 submissions per year, consumes from$250,000 to $1,000,000 in resources, just for conducting the reviewing. I simply find that amount impressive. This leads to the next question: Have you ever thought about your balance? How many papers do you review and how many papers do you submit per year? If someone had to pay$1000, I doubt that we would see many half-baked submissions. Or, if credit was given for each conducted review, then we would have more reviewing resources available. I do not advocate a system based on monetary awards, but before complaining about the quality of the reviews that you get, think: What is your balance?