Extreme value theory 101, or Newsweek researching minimum wage on Mechanical Turk

Last week, Newsweek published an article titled "The Real Minimum Wage." The authors report that "in a weeks-long experiment, we posted simple, hourlong jobs (listening to audio recordings and counting instances of a specific keyword) and continually lowered our offer until we found the absolute bottom price that multiple people would accept, and then complete the task."

The results "showed" that Americans are the ones willing to accept the lowest pay for the task, lower even than workers in India, Romania, the Philippines, etc. In fact, they found that there are Americans willing to work for 25 cents per hour, while they could not find anyone willing to work for less than \$1/hr in any other country. The conclusion of the article? Americans are more desperate than anyone else in the world.

What is the key problem with this study? There are many more US-based workers on Mechanical Turk than workers of any other nationality. So, if you have a handful of workers from other countries and hundreds of workers from the US, you are almost certain to find more extreme values for the US. Why? Put simply, you are searching much harder within the US for small values than within the other countries. (There are other issues as well: workers who take on this task are not necessarily representative of the overall population; the same workers are exposed to multiple, decreasing offers, which raises anchoring concerns; some workers may falsely report being from the US, and it is unclear whether the authors checked IP geolocation; etc. While all these are valid concerns, they are secondary to the very basic statistical problem.)

Finding a Minimum Value: A Probabilistic Approach

At an abstract, statistical level, when we test workers from multiple countries to determine their minimum wage, we are sampling from multiple "minimum wage distributions" and trying to find the smallest value within each one of them.

Each probability distribution corresponds to the minimum wages that workers from a given country are willing to accept. Let's call the CDFs of these distributions $F_i(x)$, with, say, $F_1(x)$ being the distribution of minimum wages for the US, $F_2(x)$ for India, $F_3(x)$ for the UK, and so on.

As a simplifying example, assume that $F(x)$ is a uniform distribution with a minimum value of \$0 and a maximum value of \$10, for an average acceptable minimum wage of \$5. This means that:

10% of the population will accept a minimum wage below \$1, (i.e., $F(\$1)=0.1$)
20% of the population will accept a minimum wage below \$2, (i.e., $F(\$2)=0.2$)
...
90% of the population will accept a minimum wage below \$9, (i.e., $F(\$9)=0.9$)
100% of the population will accept a minimum wage below \$10, (i.e., $F(\$10)=1.0$)
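As a quick sanity check of the numbers above, here is a minimal Python sketch of the uniform CDF on [\$0, \$10]. The function name `uniform_cdf` and its default parameters are illustrative choices, not part of the original analysis.

```python
def uniform_cdf(x, low=0.0, high=10.0):
    """CDF of a uniform distribution on [low, high]: the fraction of workers
    whose minimum acceptable wage is below x (illustrative helper)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

# Reproduce the list above, plus the $0.25 threshold used later in the post.
for wage in [1, 2, 9, 10, 0.25]:
    print(f"F(${wage}) = {uniform_cdf(wage):.3f}")
```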

Now, let's assume that we sample $n$ workers from one of the country-specific distributions. After running the experiment, we get back measurements $x_1, \ldots, x_n$, each one being the minimum wage that one of the participating workers from that country is willing to accept.

What is the probability that at least one of these wages is below, say, $z=\$0.25$? Here is the calculation:

$\begin{eqnarray}
Pr(\mathit{min~wage} < z) &=& 1 - Pr(\mathit{all~wages} \geq z)\\
& =& 1 - Pr(x_1 \geq z, \ldots, x_n \geq z)
\end{eqnarray}$

Assuming independence across the sampled values, we have:

$\begin{eqnarray}
Pr(\mathit{min~wage} < z) &=& 1 - \prod_{i=1}^n Pr(x_i \geq z) \\
& =& 1 - \left(1 - F(z) \right)^n
\end{eqnarray}$
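The derived formula is easy to check numerically. The sketch below (all function names are hypothetical) evaluates $1 - (1 - F(z))^n$ directly and compares it with a Monte Carlo estimate that repeatedly draws $n$ independent wages from the uniform [\$0, \$10] example and checks whether the smallest one falls below $z$.

```python
import random

def prob_min_below(z, n, cdf):
    """Closed form: P(min of n independent draws < z) = 1 - (1 - F(z))^n."""
    return 1.0 - (1.0 - cdf(z)) ** n

def simulate_prob_min_below(z, n, sampler, trials=10_000, seed=0):
    """Monte Carlo estimate: fraction of simulated studies with n workers
    in which at least one worker accepts a wage below z."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if min(sampler(rng) for _ in range(n)) < z:
            hits += 1
    return hits / trials

def uniform_cdf(x):
    """Uniform [$0, $10] CDF from the example in the text."""
    return min(max(x / 10.0, 0.0), 1.0)

def uniform_sampler(rng):
    """Draw one worker's minimum acceptable wage from Uniform($0, $10)."""
    return rng.uniform(0.0, 10.0)

for n in [10, 50, 100]:
    exact = prob_min_below(0.25, n, uniform_cdf)
    approx = simulate_prob_min_below(0.25, n, uniform_sampler)
    print(f"n={n:3d}  formula={exact:.3f}  simulation={approx:.3f}")
```

The simulation makes the independence assumption explicit: each worker's wage is an independent draw from the same distribution.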

So, if we sample $n$ workers, set the threshold at $z=\$0.25$, and assume a uniform distribution for $F$, then $F(\$0.25)=0.025$, and the probability that we will find at least one worker willing to work for less than 25 cents is:

$Pr(\mathit{min~wage} < z) = 1 - 0.975^n$

Plotting this as a function of $n$ shows the pattern clearly: the more workers we sample, the more likely we are to find a value at or below 25 cents/hour.
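To put numbers on this curve, the short sketch below tabulates $1 - 0.975^n$ for a few sample sizes and solves for the smallest $n$ that reaches a target probability $p$, using $n \geq \ln(1-p) / \ln(1 - F(z))$, which follows from rearranging the formula above. The helper name `workers_needed` is ours; the \$0.25 threshold and the uniform distribution are those of the example.

```python
import math

F_Z = 0.025  # F($0.25) under the uniform [$0, $10] example

def workers_needed(p, f_z=F_Z):
    """Smallest n such that 1 - (1 - f_z)^n >= p."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f_z))

for n in [10, 25, 50, 100, 200]:
    print(f"n={n:3d}: P(at least one worker below $0.25) = {1 - (1 - F_Z) ** n:.3f}")

print("workers needed for a 50% chance:", workers_needed(0.5))  # 28
print("workers needed for a 90% chance:", workers_needed(0.9))  # 91
```

In other words, under this toy distribution, a few dozen workers already make a sub-25-cent finding more likely than not.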

So, how does this explain Newsweek's findings?

We know that not all countries are equally represented on Mechanical Turk. Most workers are from the US (50% or so), followed by India (35% or so), and then by Canada (2%), the UK (2%), the Philippines (2%), and a variety of other countries with similarly small percentages. This means that in the study, we expect more Americans to participate, followed by Indians, and then a variety of other countries. So, even if the distribution of minimum wages were identical across all countries, we would expect to find the lowest wages in the country with the largest number of participants.
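A small simulation makes the point concrete. The sketch below assumes, purely for illustration, a hypothetical study with 100 US, 70 Indian, and 4 each of Canadian, UK, and Philippine participants (roughly following the shares above), and gives every country the same uniform [\$0, \$10] wage distribution. Any differences in the observed minimum therefore come only from sample size.

```python
import random

# Hypothetical participant counts, roughly proportional to the worker shares in the text.
participants = {"US": 100, "India": 70, "Canada": 4, "UK": 4, "Philippines": 4}
Z = 0.25        # wage threshold in dollars/hour
F_Z = Z / 10.0  # identical uniform [$0, $10] distribution for every country

# Closed form: probability that each country yields at least one worker below $0.25.
prob_below = {c: 1 - (1 - F_Z) ** n for c, n in participants.items()}

# One simulated study: draw each worker's minimum wage, keep the per-country minimum.
rng = random.Random(42)
observed_min = {
    c: min(rng.uniform(0.0, 10.0) for _ in range(n)) for c, n in participants.items()
}

# Rank countries by observed minimum, as a chart of "lowest accepted wage" would.
for country in sorted(participants, key=observed_min.get):
    print(f"{country:12s} n={participants[country]:3d}  "
          f"P(min < $0.25)={prob_below[country]:.2f}  "
          f"observed min=${observed_min[country]:.2f}")
```

With these counts, the formula gives roughly 0.92 for the US, 0.83 for India, and 0.10 for each of the small countries, even though every country has exactly the same distribution.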

Since most workers on Mechanical Turk are from the US, followed by India, Canada, the UK, etc., the illustration by Newsweek simply gives us the country of origin of the workers, in reverse order of popularity!

At this point, someone may ask: what happens if the distribution is not uniform but, say, lognormal (a much more plausible distribution for minimum acceptable wages)? For this specific question, as the analysis above shows, it does not change the argument: the only thing we need to know is the value of $F(z)$ at the $z$ value of interest, and for any distribution with $F(z) > 0$, the probability $1-(1-F(z))^n$ still grows toward 1 as $n$ increases.
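For instance, the sketch below swaps in a lognormal distribution with hypothetical parameters (median \$5/hour and log-scale standard deviation 1, chosen only for illustration). It computes $F(\$0.25)$ via the standard normal CDF and plugs it into the same formula $1 - (1 - F(z))^n$; nothing else changes.

```python
import math

def lognormal_cdf(x, median=5.0, sigma=1.0):
    """CDF of a lognormal distribution parameterized by its median and
    log-scale sigma (hypothetical parameters, for illustration only)."""
    if x <= 0:
        return 0.0
    zscore = (math.log(x) - math.log(median)) / sigma
    return 0.5 * (1.0 + math.erf(zscore / math.sqrt(2.0)))

f_z = lognormal_cdf(0.25)  # much smaller than the uniform example's 0.025
for n in [100, 1_000, 5_000]:
    print(f"n={n:5d}: P(at least one worker below $0.25) = {1 - (1 - f_z) ** n:.3f}")
```

A thinner left tail shifts the curve to larger $n$, but the conclusion is the same: the country with the most participants is the one most likely to produce the extreme low value.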

Going in depth: Extreme Value Theory

A more general question is: what is the maximum (or minimum) value that we expect to find when we sample from an arbitrary distribution? This is the topic of extreme value theory, a field of statistics that tries to predict the probability of extreme events (e.g., what is the biggest possible drop in the stock market? What is the biggest rainfall in this region?). Given the events in the financial markets in 2008, this theory has received significant attention in the last few years.
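For the uniform example, the expected minimum has a simple closed form: if wages are uniform on $[0, b]$, the smallest of $n$ independent draws has expectation $b/(n+1)$. The sketch below (function names are illustrative) compares that formula with a simulation and shows how quickly the expected minimum falls as $n$ grows.

```python
import random

def expected_min_uniform(n, b=10.0):
    """E[min of n iid Uniform(0, b) draws] = b / (n + 1)."""
    return b / (n + 1)

def simulated_mean_min(n, b=10.0, trials=5_000, seed=1):
    """Average, over simulated studies, of the smallest of n uniform draws."""
    rng = random.Random(seed)
    total = sum(min(rng.uniform(0.0, b) for _ in range(n)) for _ in range(trials))
    return total / trials

for n in [4, 70, 100]:
    print(f"n={n:3d}: expected minimum ${expected_min_uniform(n):.2f}, "
          f"simulated ${simulated_mean_min(n):.2f}")
```

With 4 workers, the expected minimum is \$2.00; with 100 workers, it drops to about \$0.10.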

What is nice about this theory is that the fundamentals can be summarized very succinctly. The Fisher–Tippett–Gnedenko theorem states that, if we sample $n$ values from a distribution and the maximum, after suitable normalization, converges to a non-degenerate distribution as $n$ grows, then that limiting distribution belongs to one of three families:

If the distribution from which we are sampling has a tail that decays exponentially or faster (e.g., normal, exponential, Gamma, etc.), then the maximum value is described by the Gumbel distribution (aka "type I extreme value distribution")
If the distribution from which we are sampling has a "long tail" that decays polynomially (e.g., power laws, Cauchy, Student-t, etc.), then the maximum value is described by the Fréchet distribution (aka "type II extreme value distribution")
If the distribution from which we are sampling has a finite upper bound, i.e., a "short tail" (e.g., uniform, Beta, etc.), then the maximum follows the reversed Weibull distribution (aka "type III extreme value distribution")
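The first and third cases are easy to see in a simulation. The sketch below is an illustration, not a proof, and uses two textbook normalizations: for the exponential distribution with rate 1, the maximum of $n$ draws minus $\ln n$ approaches the Gumbel CDF $\exp(-e^{-x})$; for the uniform distribution on $[0, 1]$, $n$ times the gap between 1 and the maximum approaches an exponential distribution, which is the reversed Weibull limit for the maximum near its upper bound.

```python
import math
import random

rng = random.Random(7)
n, trials = 200, 5_000

# Case 1 (exponential tail): max of n Exp(1) draws, shifted by ln(n).
gumbel_samples = [max(rng.expovariate(1.0) for _ in range(n)) - math.log(n)
                  for _ in range(trials)]

# Case 3 (bounded support): n * (1 - max) for n Uniform(0, 1) draws.
weibull_samples = [n * (1.0 - max(rng.random() for _ in range(n)))
                   for _ in range(trials)]

def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(s <= x for s in samples) / len(samples)

for x in [-1.0, 0.0, 1.0, 2.0]:
    print(f"x={x:+.1f}  empirical={empirical_cdf(gumbel_samples, x):.3f}  "
          f"Gumbel={math.exp(-math.exp(-x)):.3f}")

for y in [0.5, 1.0, 2.0]:
    print(f"y={y:.1f}  empirical={empirical_cdf(weibull_samples, y):.3f}  "
          f"limit={1 - math.exp(-y):.3f}")
```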

These three types are all special cases of the generalized extreme value (GEV) distribution.
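For reference, the generalized extreme value distribution is usually written with a location $\mu$, a scale $\sigma > 0$, and a shape parameter $\xi$ that selects among the three types:

$\begin{eqnarray}
G(x) &=& \exp\left\{ -\left[ 1 + \xi \, \frac{x - \mu}{\sigma} \right]^{-1/\xi} \right\}, \qquad \xi \neq 0,\; 1 + \xi \, \frac{x - \mu}{\sigma} > 0\\
G(x) &=& \exp\left\{ -\exp\left( -\frac{x - \mu}{\sigma} \right) \right\}, \qquad \xi = 0
\end{eqnarray}$

Setting $\xi = 0$ gives the Gumbel (type I), $\xi > 0$ the Fréchet (type II), and $\xi < 0$ the reversed Weibull (type III) distribution. Results for minima, as in the minimum wage case, follow by applying the same theory to $-x$, since $\min_i x_i = -\max_i(-x_i)$.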

This theory has significant applications not only in modeling risk (stock market, weather, earthquakes, etc.), but also in modeling human decision-making: we often model humans as utility maximizers who make decisions that maximize their own well-being. This maximum-seeking behavior often results in the distributions described above. I will give a more detailed description in a later blog post.

A Computer Scientist in a Business School

Sunday, June 26, 2011
