A Computer Scientist in a Business School

Sunday, November 2, 2008

Computing State Correlations in Elections Using (Only) Prediction Markets

(The post below is largely based on the work of Nikolay. I am responsible for all the mistakes.)

One thing that always puzzled me was how to compute correlations across electoral results at the state level. In 2006, there was some discussion about the accuracy of the prediction markets in predicting the outcome of the Senate races. From the Computational Complexity blog:

For example, the markets gave a probability of winning 60% for each of Virginia and Missouri and the democrats needed both to take the senate. If these races were independent events, the probability that the democrats take both is 36% or a 64% chance of GOP senate control assuming no other surprises.

However, everyone will agree that races across states are not independent events. For example, if Obama wins Georgia (currently trading at 0.36), the probability of winning Ohio will be higher than 0.70, the current price of the Ohio.DEM contract. (As we will see later in the post, the price of Ohio, given that Democrats win Georgia, is close to 0.81.)

So, what we would like to estimate is, for two states A and B, what is the probability $Pr\{A|B\}$ that a candidate wins state $A$, given that the candidate won state $B$.

One way to model these dependencies is to run conditional prediction markets, but this leads to an explosion of possible contracts. Participation and liquidity are not great even in the current markets for state-level elections, so there is little hope that combinatorial markets will attract significant interest.

Another way to compute these probabilities is to use and expand the model described in my previous post about modeling volatility in prediction markets. Let's see how to do that.

Expressing Conditional Contracts using Ability Processes

Following this model, for each state we have an "ability difference process" $S(t)$ that tracks the difference in abilities of the two candidates. If at expiration time $T$, $S(T)$ is positive, the candidate wins the state; otherwise the candidate loses. So, we can write:

$$Pr\{A|B\} = Pr\{ S_A(T) \geq 0 | S_B(T) \geq 0, F(t) \}$$

where $F(t)$ is the information available at time $t$. Using Bayes rule:

$Pr\{A|B\} = \frac{Pr\{ S_A(T)\geq 0, S_B(T)\geq 0 | F(t) \}}{Pr\{ S_B(T)\geq 0 | F(t) \}}$

In the equation above, $Pr\{ S_B(T)\geq 0 | F(t) \}$ is simply the price of the contract for state $B$ at time $t$, i.e., $\pi_B(t)$.

Pricing Joint Contracts using Ito Diffusions

The challenging term is the price of the joint contract $Pr\{ S_A(T)\geq 0, S_B(T)\geq 0 | F(t) \}$.

To price this contract, we generalize the Brownian motion model, and we assume that the joint movement of $S_A(t)$ and $S_B(t)$ is a 2d Brownian motion. Of course, we do not want the two components of the 2d motion to be independent! Since $S_A(t)$ and $S_B(t)$ represent the abilities, they are correlated! So, we assume that the two Brownian motions have some (unknown) correlation $\rho$. Intuitively, if they are perfectly correlated, when $S_A(t)$ goes up, then $S_B(t)$ goes up by the same amount. If they are not correlated, the movement of $S_A$ does not give any information about the movement of $S_B$, and in this case $Pr\{A|B\} = Pr\{A\}$.

Without going into much detail, in this model the price of the joint contract is:

$\pi_{AB}(t) = Pr\{ S_A(T)\geq 0, S_B(T)\geq 0 | F(t) \} = N_\rho ( N^{-1}(\pi_A(t)), N^{-1}(\pi_B(t)) )$

where $N_\rho$ is the CDF of the standard bivariate normal distribution with correlation $\rho$ and $N^{-1}$ is the inverse CDF of the standard normal. Intuitively, this is nothing more than the generalization of the result that we presented in the previous post.
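To make the joint-contract formula concrete, here is a minimal sketch in Python using only the standard library. The function names (`bivariate_normal_cdf`, `joint_price`, `conditional_price`) are my own, and the bivariate normal CDF is approximated by numerically integrating a standard identity with Simpson's rule; this is an illustration under those assumptions, not the code used for the results in this post. The values of $\rho$ in the loop are placeholders, not estimates.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def bivariate_normal_cdf(a: float, b: float, rho: float, steps: int = 2000) -> float:
    """Approximate N_rho(a, b) = Pr{X <= a, Y <= b} for standard normals X, Y.

    Uses the identity
        N_rho(a, b) = integral_{-inf}^{a} phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) dx
    evaluated with Simpson's rule on [-8, a]; the mass below -8 is negligible.
    `steps` must be even.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    lower = -8.0
    if a <= lower:
        return 0.0
    scale = sqrt(1.0 - rho * rho)
    h = (a - lower) / steps

    def integrand(x: float) -> float:
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - rho * x) / scale)

    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        # Simpson weights: 4 on odd interior points, 2 on even ones
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0


def joint_price(price_a: float, price_b: float, rho: float) -> float:
    """pi_AB = N_rho(N^{-1}(pi_A), N^{-1}(pi_B)), as in the formula above."""
    return bivariate_normal_cdf(
        STD_NORMAL.inv_cdf(price_a), STD_NORMAL.inv_cdf(price_b), rho
    )


def conditional_price(price_a: float, price_b: float, rho: float) -> float:
    """Pr{A|B} = pi_AB / pi_B."""
    return joint_price(price_a, price_b, rho) / price_b


# Ohio (0.70) given Georgia (0.36), for a few placeholder correlations
for rho in (0.0, 0.3, 0.6):
    print(rho, round(conditional_price(0.70, 0.36, rho), 3))
```

With $\rho = 0$ the joint price reduces to the product $\pi_A \cdot \pi_B$, which is exactly the independence calculation from the Senate example at the top of the post.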

However, the big question remains: How do we compute the value $\rho$?

Now the neat part: We can infer $\rho$ by observing the past time series of the two state-level contracts. Why is that?

First of all, we know that the price changes of the contracts are given by:

$d\pi(t) = V(\pi(t), t) dW$, which gives $dW = \frac{d\pi(t)}{ V( \pi(t), t)} $.

We can observe $d\pi(t)$ over time. We also know that $V( \pi(t), t) = \frac{1}{\sqrt{T-t}} \cdot \varphi( N^{-1}( \pi(t) ) )$ is the instantaneous volatility of a contract trading at price $\pi(t)$ at time $t$, where $\varphi$ is the density of the standard normal.

So essentially we take the price differences over time and we normalize them by the expected volatility. This process generates the "normalized" changes in abilities, over time and across states.

Therefore, we can now use standard correlation measures of time series to infer the hidden correlation of the ability processes. (And then compute the conditional probability.) If the two ability processes were powered by independent Brownian motions $W_A$ and $W_B$, then $dW_A$ and $dW_B$ would not exhibit any correlation. If the two processes are correlated, then we can measure their cross-correlation by observing their past behavior.

Now, by definition of cross-correlation we get:

$\rho \approx \sum_{i=0}^{t} \frac{ (\pi_A(i+1) - \pi_A(i)) \cdot (\pi_B(i+1) - \pi_B(i)) }{ V(\pi_A(i), i) \cdot V(\pi_B(i), i) }$
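The estimation step can be sketched as follows, again with the standard library only. Two things here are my assumptions rather than part of the post: (1) the sum of products is divided by the magnitudes of the two normalized series, so that the estimate stays in $[-1, 1]$ regardless of the time step, and (2) the test data are synthetic, generated under the assumption $\pi(t) = N(S(t)/\sqrt{T-t})$, which is consistent with the volatility formula above. All function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def volatility(price: float, t: float, expiry: float) -> float:
    """V(pi, t) = phi(N^{-1}(pi)) / sqrt(T - t), as given in the text."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(price)) / sqrt(expiry - t)


def normalized_increments(prices, times, expiry):
    """Price changes divided by the model volatility at the start of each step."""
    return [
        (prices[i + 1] - prices[i]) / volatility(prices[i], times[i], expiry)
        for i in range(len(prices) - 1)
    ]


def estimate_rho(prices_a, prices_b, times, expiry):
    """Cross-correlation of the normalized increments of two contracts.

    Assumption: the sum of products is scaled by the two series' own
    magnitudes (an uncentered correlation), so no time unit is needed.
    """
    dw_a = normalized_increments(prices_a, times, expiry)
    dw_b = normalized_increments(prices_b, times, expiry)
    cross = sum(x * y for x, y in zip(dw_a, dw_b))
    norm = sqrt(sum(x * x for x in dw_a) * sum(y * y for y in dw_b))
    return cross / norm


def simulate_prices(rho, expiry=1.0, dt=0.0005, steps=1000, seed=0):
    """Synthetic data only: two correlated ability processes and their prices.

    Assumes pi(t) = Phi(S(t) / sqrt(T - t)) with unit-variance Brownian motions.
    The simulation stops at t = steps * dt, well before expiry.
    """
    rng = random.Random(seed)
    s_a = s_b = 0.0
    times, prices_a, prices_b = [], [], []
    for i in range(steps + 1):
        t = i * dt
        remaining = sqrt(expiry - t)
        times.append(t)
        prices_a.append(STD_NORMAL.cdf(s_a / remaining))
        prices_b.append(STD_NORMAL.cdf(s_b / remaining))
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        s_a += sqrt(dt) * z1
        s_b += sqrt(dt) * (rho * z1 + sqrt(1 - rho * rho) * z2)
    return times, prices_a, prices_b


# Recover a known correlation from simulated contract prices
times, prices_a, prices_b = simulate_prices(rho=0.6)
print(round(estimate_rho(prices_a, prices_b, times, expiry=1.0), 2))
```

On real InTrade data, `prices_a`, `prices_b`, and `times` would be the observed price series and their timestamps, and the estimate would be much noisier than on this idealized simulation.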

Conditional Probabilities using InTrade

OK, if you stayed with me this long, here are some of the strong correlations as observed and computed based on the InTrade data. How to read the table? If Democrats win state B, what is the probability Pr(A|B) that they will also win state A? To make comparisons easy, we also list the current price of the contracts A and B. The "lift" shows how much the conditional probability increases compared to the base probability. I skipped the cases where a state has very high probability, i.e., above 0.9 (as they are uninformative), or very low probability, i.e., less than 0.2 (as they are highly unlikely to happen). I also list only state pairs with lift larger than 1.10. You can also get the list as an Excel spreadsheet.

Enjoy!

Thursday, October 2, 2008

VP Debate and Prediction Market Volatility

I was watching the VP debate on CNN, and CNN was reporting the reactions of "undecided Ohio voters" to what the VP candidates were saying. Although interesting, it was not satisfying. I wanted a better way to see the real-time reactions. Blogs were relatively slow to post, and mainstream media were simply describing the minutiae of the debate. What is the solution? Easy. Prediction markets!

I remembered that Intrade has a contract VP.DEBATE.OBAMA, "Barack Obama's Intrade value will increase more than John McCain's following the VP debate"

So, during the debate, I was following the fluctuations of the contract's price to measure the reactions. Here is how the contract moved from 8.30pm EST until 10.30pm EST. (The debate started at 9pm EST, and lasted until 10.30pm EST.)

At the beginning, the contract was below 50.0%, probably reflecting the fact that Palin was giving reasonable and coherent responses, disappointing perhaps those who were expecting material for a Saturday Night Live performance.

However, in the second 45 minutes of the debate, as the discussion moved into foreign policy issues, the contract started moving up, as Biden started giving more direct answers, and Palin started avoiding questions and replying with stereotypical, canned answers.

What I found interesting was the significant increase in variance as the debate came close to the end. Prices fluctuated widely during the closing statements of the two VP candidates.

This increased volatility as the contract comes to a close is actually something that we have observed consistently in many contracts over time: when the contract is not close to 0.0 or 1.0, the price fluctuates widely as we get close to expiration. While I could explain this intuitively, I did not have a solid theoretical understanding of why.

So, what to do in this case? You simply ask a PhD student to explain it to you! I asked Nikolay Archak, and within a few weeks, Nikolay had the answer.

The basic result:

Volatility increases as contract price gets closer to 0.5,
Volatility decreases as contract price gets closer to 0.0 or to 1.0,
Volatility increases as we get close to the expiration, and approaches infinity if price is not 0.0 or 1.0.
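The three properties can be checked numerically with a small sketch. It assumes the instantaneous volatility formula quoted in the November 2 post on this page, $V(\pi, t) = \varphi(N^{-1}(\pi)) / \sqrt{T-t}$; the function name and the time-to-expiry values (in arbitrary units) are placeholders of mine.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def instantaneous_volatility(price: float, time_to_expiry: float) -> float:
    """V = phi(N^{-1}(price)) / sqrt(T - t), with T - t passed in directly."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(price)) / sqrt(time_to_expiry)


# Rows: contract price. Columns: shrinking time to expiry (arbitrary units).
for price in (0.5, 0.7, 0.9, 0.99):
    row = [instantaneous_volatility(price, tau) for tau in (30.0, 7.0, 1.0, 0.01)]
    print(price, [round(v, 3) for v in row])
```

Reading down a column, volatility falls as the price moves away from 0.5; reading across a row, it grows without bound as the time to expiry shrinks.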

More information about the basic ideas of the model and about the technical details in a later post.

Tuesday, September 30, 2008

Sarah Palin and Markov Models

How good are n-gram Markov models for language modeling?

Apparently pretty good for modeling the responses of Sarah Palin during her last couple of interviews! Check them out:

http://interviewpalin.com/
http://palinspeak.com/

Sunday, December 9, 2007

Political Prediction Markets: Some Thoughts

Apparently, my last postings on the predictability of the political prediction markets generated some interest. Niall O'Connor decided to check how accurate our predictions are, and after a few days he checked again to see how well we have done.

Our prediction that the price for Hillary Clinton will go down proved to be correct: the price declined from 70, on Dec 2nd (the time of the original post), to 63 on Dec 9th, a 10% decline. Similarly, for Mitt Romney we predicted a decline and the price declined from 24, on Dec 4th, to 18.5 on Dec 9th, a 23% decline.

For Giuliani, we said "The analysis is more difficult in this scenario, but for the next few days we see stabilizing signals with a trend to go upwards" and we were proven wrong: the price declined from 43 on Dec 2nd, to 39.5 on Dec 9th, an 8% decline. I realized what was wrong in my reasoning. What was stabilizing was the sentiment index, not the price. And a stabilized sentiment around 50% tends to be a pretty bad adviser on how the market will move.

The fact that it is possible to predict the prediction markets automatically raises the question: "why?". What makes the markets predictable? The first answer is liquidity. The markets are not liquid, there are not enough participants, and there is a lot of momentum trading. However, I would like to list another explanation (already posted as a comment on Midas Oracle):

My understanding is that Betfair odds moved from 1.44 to 1.50 (according to the screenshot in the original posting). While indeed this corresponds to a drop from 69% to 66% (an almost 4% drop in share price) this is not as drastic as a drop from 69% to 50% within such a short period of time. Plus, the Betfair drop from 69% to 66% is comparable with the drop in Intrade (from 67% to 64%).
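For readers unfamiliar with decimal odds, the conversion behind these percentages is just a reciprocal. A minimal sketch, assuming the quoted odds are decimal odds and ignoring commission and the bid/ask spread (the function name is mine):

```python
def implied_probability(decimal_odds: float) -> float:
    """Implied probability of a decimal-odds quote, ignoring fees and spread."""
    return 1.0 / decimal_odds


before = implied_probability(1.44)  # roughly 0.69
after = implied_probability(1.50)   # roughly 0.67
relative_drop = (before - after) / before  # relative change in the implied price

print(round(before, 3), round(after, 3), round(relative_drop, 3))
```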

Also, I am not sure about the liquidity hypothesis for explaining the inefficiency. An alternative explanation is the following:

Political markets are not stock markets. They reflect the aggregate opinion of the traders about public's intention for the candidate. Notice that we have two levels of beliefs: one for what traders believe about the public's intentions, and a second for what the public actually intends to vote for.

Not every member of the voting public reads every piece of information. When the same news is being repeated over and over in the mainstream news outlets, then more voters are influenced. Hence, the longer the news about a candidate stays around, the longer the public gets influenced by the same, stale news and changes intentions. This is correspondingly reflected in the prediction markets, potentially in an efficient manner.

This may indicate that it is not that the markets are not efficient, but that the voting public is not "efficient" (i.e., voters do not incorporate all the available information in their voting decisions.)

We can test this hypothesis by testing the efficiency/predictability of political prediction markets vs. the efficiency/predictability of non-political markets.

We will work further with George Tziralis on the topic, and we will keep you posted.

Public commitment is always a good motivation to work harder :-)

Then, Bo Cowgill commented that using text mining together with prediction markets is indeed a good idea. Bo's comment made me think about parallels between "prediction market trading" and "stock market trading". As Bo pointed out, in existing stock markets, there is a significant amount of algorithmic trading. This algorithmic trading makes the stock market significantly more efficient than, say, in the early 1980s, when programmatic trading was in its infancy. In fact, I have heard many stories from old-timers, saying that in the early days it was extremely easy to find inefficiencies in the markets and get healthy profits. As algorithmic trading proliferated, it became increasingly harder to spot inefficiencies in the market.

Something similar can happen today with prediction markets. If we have a prediction market platform that allows automatic/algorithmic trading, then we can improve tremendously the efficiency of today's prediction markets. Furthermore, such a tool (if done with play money) can be used as a great educational tool, similar to the now inactive Penn-Lehman Automated Trading (PLAT) Project. Allowing also for some data integration from the existing prediction markets (BetFair, Intrade, etc.) we could have a pretty realistic tool that can be used for many educational purposes that, at the same time, can generate useful and efficient prediction markets.

Now, I need to find someone willing to fund the idea. Ah, there are a couple of NSF calls for proposals still open :-)

Tuesday, December 4, 2007

By Popular Demand: Mitt Romney

The last post generated some general interest and I got requests to post analysis for more presidential candidates. As one more data point, here is the market for Mitt Romney to be the Republican Presidential Nominee in 2008:

I checked again our sentiment indicator (in maroon), which seems to capture the upward spikes well. (If you look carefully, our indicator spikes upwards before the market.)

This market, similarly to the market of Hillary Clinton, seems to move in cycles. This cyclical behavior can be nicely visualized by plotting the 30-day moving average of our sentiment index (in black). It seems that a downward cycle has started for Romney and should continue for the next couple of weeks, until the 30-day moving average gets close to 0.3 or so. Time will tell :-)

Sunday, December 2, 2007

Prediction Markets are NOT Efficient

I have wondered in the past whether prediction markets are efficient. Back then, I was asking:

how long does it take for a prediction market to incorporate all the available information about an event? Liquidity seems to be an issue for the existing prediction markets, preventing them from reaching equilibrium quickly.

In fact, today's prediction markets are far from being efficient. Ari Gilder and Kevin Lerman, as part of an undergraduate project at the University of Pennsylvania supervised by Fernando Pereira, have shown that by using linguistic analysis of news articles it is possible to predict the future price movements of the Iowa Electronic Markets. Therefore, the Iowa markets did not incorporate all the available information. Furthermore, the results indicated that it is possible to predict the price movement by simply using past pricing data. Therefore, the markets were not even weakly efficient. (Kevin is now a first year PhD student at Columbia University.)

One question was whether liquidity played a role in that result. The Iowa markets are thinly traded, with an upper limit on how much someone can bet. This imposes some artificial constraints, making it difficult for information to flow freely into the market. Therefore, it is important to examine other markets with higher liquidity.

Over the last months we have been discussing this issue with George Tziralis, trying to examine how to evaluate the "Efficient Prediction Market" hypothesis. After long discussions, we came up with some techniques for extracting signals from the news about the prediction markets and seeing whether we can use these signals for predicting the future performance of markets in InTrade. Our sentiment indicator seems to work pretty well, even in liquid markets. Here is a preliminary result for the market on whether Hillary Clinton will be the Democratic Presidential Nominee in 2008:

Our sentiment index (in maroon) is close to 1 when we predict that the market will move higher, and it is close to 0 when we predict that the market will move down. Typically, it works pretty well for predicting long periods of price increases and declines. To put our money where our mouth is, the signal from the last few days shows that Hillary's market price will edge lower in the next few days/weeks.

The market prices for whether Giuliani will be the Republican Presidential Nominee in 2008, together with our sentiment index, are displayed below.

The analysis is more difficult in this scenario, but for the next few days we see stabilizing signals with a trend to go upwards.

We will need to analyze quite a few more markets before generating the paper, but so far the results seem interesting.

Let's see what the future brings :-)