Sunday, November 2, 2008

Computing State Correlations in Elections Using (Only) Prediction Markets

(The post below is largely based on the work of Nikolay. I am responsible for all the mistakes.)

One thing that always puzzled me was how to compute correlations across electoral results at the state level. In 2006, there was some discussion about the accuracy of the prediction markets in predicting the outcome of the Senate races. From the Computational Complexity blog:
For example, the markets gave a probability of winning 60% for each of Virginia and Missouri and the democrats needed both to take the senate. If these races were independent events, the probability that the democrats take both is 36% or a 64% chance of GOP senate control assuming no other surprises.
However, everyone will agree that races across states are not independent events. For example, if Obama wins Georgia (currently trading at 0.36), the probability of winning Ohio will be higher than 0.70, the current price of the Ohio.DEM contract. (As we will see later in the post, the price of Ohio, given that Democrats win Georgia, is close to 0.81.)

So, for two states $A$ and $B$, we would like to estimate the probability $Pr\{A|B\}$ that a candidate wins state $A$, given that the candidate won state $B$.

One way to model these dependencies is to run conditional prediction markets, but this leads to an explosion of possible contracts. Participation and liquidity are not great even in the current markets for state-level elections, so there is little hope that combinatorial markets will attract significant interest.

Another way to compute these probabilities is to use and expand the model described in my previous post about modeling volatility in prediction markets. Let's see how to do that.

Expressing Conditional Contracts using Ability Processes

Following this model, for each state we have an "ability difference process" $S(t)$ that tracks the difference in abilities of the two candidates. If at expiration time $T$, $S(T)$ is positive, the candidate wins the state; otherwise the candidate loses. So, we can write:

$$Pr\{A|B\} = Pr\{ S_A(T) \geq 0 | S_B(T) \geq 0, F(t) \}$$

where $F(t)$ is the information available at time $t$. By the definition of conditional probability:

$Pr\{A|B\} = \frac{Pr\{ S_A(T)\geq 0, S_B(T)\geq 0 | F(t) \}}{Pr\{ S_B(T)\geq 0 | F(t) \}}$

In the equation above, $Pr\{ S_B(T)\geq 0 | F(t) \}$ is simply the price of the contract for state $B$ at time $t$, i.e., $\pi_B(t)$.

Pricing Joint Contracts using Ito Diffusions

The challenging term is the price of the joint contract $Pr\{ S_A(T)\geq 0, S_B(T)\geq 0 | F(t) \}$.

To price this contract, we generalize the Brownian motion model, and we assume that the joint movement of $S_A(t)$ and $S_B(t)$ is a two-dimensional Brownian motion. Of course, we do not want the two components to be independent! Since $S_A(t)$ and $S_B(t)$ represent abilities, they are correlated. So, we assume that the two Brownian motions have some (unknown) correlation $\rho$. Intuitively, if they are perfectly correlated, when $S_A(t)$ goes up, then $S_B(t)$ goes up by the same amount. If they are not correlated, the movement of $S_A$ does not give any information about the movement of $S_B$, and in this case $Pr\{A|B\} = Pr\{A\}$.

Without going into much detail, in this model the price of the joint contract is:

$\pi_{AB}(t) = Pr\{ S_A(T)>0, S_B(T)>0 | F(t) \} = N_\rho ( N^{-1}(\pi_A(t)), N^{-1}(\pi_B(t)) )$

where $N_\rho$ is the CDF of the standard bivariate normal distribution with correlation $\rho$ and $N^{-1}$ is the inverse CDF of the standard normal. Intuitively, this is nothing more than the generalization of the result that we presented in the previous post.
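
To make the formula concrete, here is a minimal Python sketch that evaluates $N_\rho$ and the resulting conditional price using only the standard library. The function names (`bivariate_normal_cdf`, `joint_price`, `conditional_price`) are hypothetical, and the numerical method is an assumption of this sketch rather than part of the model: it uses Plackett's identity, $N_\rho(a,b) = N(a)N(b) + \int_0^\rho \phi_2(a,b;r)\,dr$, integrated with Simpson's rule, and it requires $|\rho| < 1$ and prices strictly between 0 and 1.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal: .cdf is N, .inv_cdf is N^{-1}


def bivariate_normal_cdf(a, b, rho, steps=2000):
    """N_rho(a, b) for |rho| < 1, computed with Plackett's identity.

    N_rho(a, b) = N(a) * N(b) + integral from 0 to rho of phi_2(a, b; r) dr,
    where phi_2 is the standard bivariate normal density with correlation r.
    """
    def density(r):
        one_minus_r2 = 1.0 - r * r
        exponent = -(a * a - 2.0 * r * a * b + b * b) / (2.0 * one_minus_r2)
        return exp(exponent) / (2.0 * pi * sqrt(one_minus_r2))

    # Composite Simpson's rule on [0, rho]; `steps` must be even.
    # A negative rho gives a negative step h, so the integral keeps its sign.
    h = rho / steps
    total = density(0.0) + density(rho)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return _STD.cdf(a) * _STD.cdf(b) + total * h / 3.0


def joint_price(pi_a, pi_b, rho):
    """Price of the joint contract: N_rho(N^{-1}(pi_A), N^{-1}(pi_B))."""
    return bivariate_normal_cdf(_STD.inv_cdf(pi_a), _STD.inv_cdf(pi_b), rho)


def conditional_price(pi_a, pi_b, rho):
    """Pr{A|B}: the joint price divided by the price of contract B."""
    return joint_price(pi_a, pi_b, rho) / pi_b
```

With $\rho = 0$, `conditional_price(0.70, 0.36, 0.0)` returns the base price 0.70 up to floating-point error, matching the independent case $Pr\{A|B\} = Pr\{A\}$; a positive $\rho$ pushes the conditional price above the base price.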

However, the big question remains: How do we compute the value $\rho$?

Now the neat part: We can infer $\rho$ by observing the past time series of the two state-level contracts. Why is that?

First of all, we know that the price changes of the contracts are given by:

$d\pi(t) = V(\pi(t), t) dW$, which gives $dW = \frac{d\pi(t)}{ V( \pi(t), t)} $.

We can observe $d\pi(t)$ over time. We also know that $V( \pi(t), t) = \frac{1}{\sqrt{T-t}} \cdot \varphi( N^{-1}( \pi(t) ) )$ is the instantaneous volatility of a contract trading at price $\pi(t)$ at time $t$.

So essentially we take the price differences over time and we normalize them by the expected volatility. This process generates the "normalized" changes in abilities, over time and across states.
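
As a rough illustration of this normalization step, the sketch below computes $V(\pi(t), t)$ and the normalized increments for one price series. The names `volatility` and `normalized_increments` are hypothetical. The sketch assumes prices are observed at unit time steps $i = 0, 1, 2, \ldots$ (say, daily closes), that the expiration time `expiry` is given in the same units and lies after the last observation, and that all prices are strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal: .pdf is phi, .inv_cdf is N^{-1}


def volatility(price, t, expiry):
    """Instantaneous volatility V(pi, t) = phi(N^{-1}(pi)) / sqrt(T - t)."""
    return _STD.pdf(_STD.inv_cdf(price)) / sqrt(expiry - t)


def normalized_increments(prices, expiry):
    """dW_i = (pi(i+1) - pi(i)) / V(pi(i), i), one value per observed step."""
    return [
        (prices[i + 1] - prices[i]) / volatility(prices[i], i, expiry)
        for i in range(len(prices) - 1)
    ]
```

For example, `normalized_increments([0.50, 0.55, 0.52], expiry=10)` returns two values, one for each price change, each scaled by the volatility expected at that price and time.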

Therefore, we can now use standard correlation measures of time series to infer the hidden correlation of the ability processes. (And then compute the conditional probability.) If the two ability processes were powered by independent Brownian motions $W_A$ and $W_B$, then $dW_A$ and $dW_B$ would not exhibit any correlation. If the two processes are correlated, then we can measure their cross-correlation by observing their past behavior.

Now, by definition of cross-correlation we get:

$\rho \approx \sum_{i=0}^{t} \frac{ (\pi_A(i+1) - \pi_A(i)) \cdot (\pi_B(i+1) - \pi_B(i)) }{ V(\pi_A(i), i) \cdot V(\pi_B(i), i) }$
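
A minimal sketch of this estimate, in plain Python, could look as follows. The function name `estimate_rho` is hypothetical. Note one assumption that goes beyond the formula as written: the sketch divides the sum of products by the square root of the product of the summed squared increments (the usual uncentered sample-correlation normalization), so that the result stays within $[-1, 1]$. It also assumes unit time steps, two series of equal length, an expiration time `expiry` after the last observation, and prices strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()


def _normalized_increments(prices, expiry):
    """Price changes divided by V(pi(i), i) = phi(N^{-1}(pi(i))) / sqrt(T - i)."""
    result = []
    for i in range(len(prices) - 1):
        vol = _STD.pdf(_STD.inv_cdf(prices[i])) / sqrt(expiry - i)
        result.append((prices[i + 1] - prices[i]) / vol)
    return result


def estimate_rho(prices_a, prices_b, expiry):
    """Estimate rho from two price series observed at the same times."""
    if len(prices_a) != len(prices_b):
        raise ValueError("both series must have the same length")
    dw_a = _normalized_increments(prices_a, expiry)
    dw_b = _normalized_increments(prices_b, expiry)
    cross = sum(x * y for x, y in zip(dw_a, dw_b))
    # Normalization assumed by this sketch (not shown in the formula above).
    scale = sqrt(sum(x * x for x in dw_a) * sum(y * y for y in dw_b))
    if scale == 0.0:
        raise ValueError("at least one series has no price movement")
    return cross / scale
```

As a sanity check, a series paired with itself gives an estimate of 1, and a series paired with its mirror image $1 - \pi(t)$ gives an estimate of $-1$.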

Conditional Probabilities using InTrade

OK, if you stayed with me this long, here are some of the strong correlations as observed and computed based on the InTrade data. How to read the table? If Democrats win state B, what is the probability Pr(A|B) that they will also win state A? To make comparisons easy, we also list the current price of the contracts A and B. The "lift" shows how much the conditional probability increases compared to the base probability. I skipped the cases where a state has very high probability, i.e., above 0.9 (as they are uninformative), or very low probability, i.e., less than 0.2 (as they are highly unlikely to happen). I also list only state pairs with lift larger than 1.10. You can also get the list as an Excel spreadsheet.
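
The selection rules can be sketched in a few lines of Python. The names `lift` and `keep_pair` are hypothetical, and two points are assumptions of this sketch: that the lift is the ratio Pr(A|B) / Pr(A), and that the 0.2 to 0.9 price filter applies to both states of a pair.

```python
def lift(conditional, base):
    """Ratio of the conditional probability Pr(A|B) to the base price of A."""
    return conditional / base


def keep_pair(price_a, price_b, conditional,
              min_price=0.2, max_price=0.9, min_lift=1.10):
    """Apply the filters described above to one (A, B) pair."""
    in_range = all(min_price <= p <= max_price for p in (price_a, price_b))
    return in_range and lift(conditional, price_a) > min_lift
```

Using the numbers quoted earlier in the post (Ohio at 0.70, Georgia at 0.36, and Ohio given Georgia at 0.81), the lift is 0.81 / 0.70, or about 1.16, so this pair passes the filter.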