## Saturday, December 5, 2009

### Prisoner's Dilemma and Mechanical Turk

I have been reading lately about the differences between mathematical models of behavior and real human behavior. So, I decided to try the classic game theory model of the Prisoner's Dilemma on Mechanical Turk. (See also Brendan's nice explanations and diagrams if you have never been exposed to game theory before.)

From Wikipedia:

In its classical form, the prisoner's dilemma ("PD") is presented as follows:

Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and, having separated both prisoners, visit each of them to offer the same deal. If one testifies (defects from the other) for the prosecution against the other and the other remains silent (cooperates with the other), the betrayer goes free and the silent accomplice receives the full 10-year sentence. If both remain silent, both prisoners are sentenced to only six months in jail for a minor charge. If each betrays the other, each receives a five-year sentence. Each prisoner must choose to betray the other or to remain silent. Each one is assured that the other would not know about the betrayal before the end of the investigation. How should the prisoners act?

If we assume that each player cares only about minimizing his or her own time in jail, then the prisoner's dilemma forms a non-zero-sum game in which two players may each cooperate with or defect from (betray) the other player. In this game, as in all game theory, the only concern of each individual player (prisoner) is maximizing his or her own payoff, without any concern for the other player's payoff. The unique equilibrium for this game is a Pareto-suboptimal solution, that is, rational choice leads the two players to both play defect, even though each player's individual reward would be greater if they both played cooperatively.

My first attempt was to post to Mechanical Turk this dilemma in a setting of the following game:

You are playing a game together with a stranger. Each of you has two choices to play: "trust" or "cheat".
• If both of you play "trust", you win \$30,000 each.
• If both of you play "cheat", you get \$10,000 each.
• If one player plays "trust" and the other plays "cheat", then the player who played "cheat" gets \$50,000 and the player who played "trust" gets \$0.
You cannot communicate during the game, and CANNOT see the final action of the other player. Both actions will be revealed simultaneously.

What would you play? "Cheat" or "Trust"?

Basic game theory predicts that the participants will choose "cheat", resulting in a suboptimal equilibrium. However, participants on Mechanical Turk did not behave like that. Instead, 48 out of the 100 participants decided to play "trust", which is above the 33% observed in the lab experiments of Shafir and Tversky (1992).
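The game-theoretic prediction can be checked mechanically from the payoff matrix above. A minimal sketch, using the dollar amounts of the imaginary game: whatever the opponent does, "cheat" is the best response, so (cheat, cheat) is the unique Nash equilibrium.

```python
# Payoff matrix of the game above: (row player's payoff, column player's payoff).
payoff = {
    ("trust", "trust"): (30_000, 30_000),
    ("cheat", "cheat"): (10_000, 10_000),
    ("cheat", "trust"): (50_000, 0),
    ("trust", "cheat"): (0, 50_000),
}

def best_response(opponent_action):
    """Return the action that maximizes the row player's payoff
    against a fixed opponent action."""
    return max(["trust", "cheat"],
               key=lambda a: payoff[(a, opponent_action)][0])

# "cheat" is a best response to either action, so (cheat, cheat) is the
# unique Nash equilibrium -- even though (trust, trust) pays more to both.
print(best_response("trust"))   # cheat
print(best_response("cheat"))   # cheat
```

This is exactly the "dominant strategy" logic: defecting strictly improves the payoff no matter what the other player does, which is why the theory predicts universal cheating.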

Next, I wanted to make the experiment more realistic. Would anything change if instead of playing an imaginary game, I promised actual monetary benefits to the participants? So, I modified the game, and asked the participants to play against each other. Here is the revised task description.

You are playing a game against another Turker. Your action here will be matched with an action of another Mechanical Turk worker.

Each of you has two choices to play: "trust" or "cheat".
• If both of you play "trust", you both get a bonus of \$0.30.
• If both of you play "cheat", you both get a bonus of \$0.10.
• If one Turker plays "trust" and the other plays "cheat", then the Turker that played "cheat" gets a bonus of \$0.50 and the Turker that played "trust" gets nothing. What is your action? "Cheat" or "Trust"? I asked 120 participants to play the game, paying just 1 cent for the participation. Interestingly enough, I had a perfect split in the results. 60 Turkers decided to cheat, and 60 Turkers decided to cheat. The final result was 20 pairs of trust-trust, 20 pairs of cheat-cheat, and 20 pairs of cheat-trust. In other words, the theory prediction that people will be locked in a non-optimal equilibrium was not correct, neither in the "imaginary" game, nor in the case where the workers had to gain some actually monetary benefit. Finally, I decided to change the payoff matrix, and replicate the structure of the TV game show "Friend or Foe". There, participants get \$50K each if they cooperate, \$0 if they do not, and if one chooses trust and the other cheat, the "cheat" gets \$100K and the "trust" gets \$0. You are playing a game together with a stranger. Each of you have two choices to play: "trust" or "cheat". • If both of you play "trust", you both win \$50,000.
• If both of you play "cheat", you both get \$0. • If one player plays "trust" and the other plays "cheat", then the player that played "cheat" gets \$100,000 and the player that played "trust" gets \$0. You cannot communicate during the game, and CANNOT see the final action of the other player. Both actions will be revealed simultaneously. What would you play? "Cheat" or "Trust"? Interestingly enough, in this setting ALL 100 players ended up playing "trust", which was quite different from the previous game and from the behavior of the players in the TV show, where, in almost 25% of the played games, both players chose "cheat" ending up with \$0, and in 25% of the games the players collaborated and played "trust" getting \$50K each. So, in my final attempt, I asked Turkers to play this "Friend of Foe" game, having monetary incentives. Here is the task that I posted on Mechanical Turk. You are playing a game against another Turker. Your action here will be matched with an action of another Mechanical Turk worker. Each of you have two choices to play: "trust" or "cheat". • If both of you play "trust", you both get a bonus of \$0.50.
• If both of you play "cheat", you both get \$0. • If one Turker plays "trust" and the other plays "cheat", then the Turker that played "cheat" gets a bonus of \$1.00 and the Turker that played "trust" gets nothing.
What is your action? "Cheat" or "Trust"?

In this game, 33% of the users decided to cheat, resulting in 6/50 games where both players got nothing, 23/50 games where both players got a 50-cent bonus, and 21/50 games where one player got \$1 and the other got nothing.

I found the difference in behavior between the imaginary game and the actual one pretty interesting. Also, the deviation from the predictions of the game-theoretic model is striking. Although I am not the first to observe this deviation, it got me wondering: Why do we use elaborate game theory models of user behavior, when not even the simplest such models correspond to reality? How can someone take seriously the concept of an equilibrium when a game introduced in the first chapter of every game theory textbook simply does not correspond to reality? Do we really understand the limitations of our tools, or do mathematical and analytic elegance end up being more important than reality?

## Monday, November 30, 2009

### Anchoring and Mechanical Turk

Over the last few days, I have been reading the book Predictably Irrational by Dan Ariely, which (predictably) describes many biases that we exhibit when making decisions. These biases are not just effects of random chance but are rather expected and predictable. Such biases and the "irrationality" of human agents are one of the focuses of behavioral economics; these biases have also been extensively studied in the field of cognitive psychology, which examines the ways that human agents process information.

One of the classic biases is the bias of "anchoring". In his book, Dan Ariely shows how he got students to bid higher or lower for a particular bottle of wine: he asked students to write down the last digit of their social security number before placing the bid. As the anchoring theory postulated, students who wrote down a lower number ended up bidding lower than students with a higher last digit in their SSN. Why? Definitely not because the last digit revealed anything about their character. It was because the students got "anchored" to the value of the last digit they wrote down. I am certain that the experiment could be repeated using the middle two digits as the anchor, and the results would be similar.

Interestingly enough, at the same time that I was reading the book, I was contacted by Gabriele Paolacci, a PhD student in Italy. In his blog, Experimental Turk, Gabriele has been replicating some of these "classic" cognitive psychology experiments that illustrate these biases. As you might have guessed already, Gabriele has been using Mechanical Turk for these experiments. Gabriele tested the theory of anchoring using Amazon Mechanical Turk, replicating a study from a classic paper. In his own words:

We submitted the "african countries problem" from Tversky and Kahneman (1974) to 152 workers (61.2% women, mean age = 35.4). Participants were paid \$0.05 for a HIT that comprised other unrelated brief tasks. Approximately half of the participants were asked the following question:

• Do you think there are more or less than 65 African countries in the United Nations?
The other half was asked the following question:
• Do you think there are more or less than 12 African countries in the United Nations?
Both groups were then asked to estimate the number of African countries in the United Nations.

As expected, participants exposed to the large anchor (65) provided higher estimates than participants exposed to the small anchor (12), F(1,150) = 55.99, p<.001. Therefore, we were able to replicate a classic anchoring effect - our participants’ judgments are biased toward the implicitly suggested reference points. It should be noted that the means in our data (42.6 and 18.5 respectively) are very similar to those recently published by Stanovich and West (2008; 42.6 and 14.9 respectively).
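For readers unfamiliar with the reported statistic: a one-way ANOVA F value of that magnitude follows directly from two groups whose means differ by a couple of standard deviations. Here is a sketch on simulated data (not Gabriele's actual responses) with group means roughly matching the reported 42.6 and 18.5:

```python
import random
import statistics

random.seed(0)

# Simulated responses, NOT the real data: high-anchor group centered near 42,
# low-anchor group centered near 18, 76 participants each.
high = [random.gauss(42, 10) for _ in range(76)]
low = [random.gauss(18, 10) for _ in range(76)]

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group mean square
    divided by within-group mean square."""
    all_x = [x for g in groups for x in g]
    grand = statistics.fmean(all_x)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_x) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# With means this far apart, F is large -- hence the tiny p value reported.
print(round(one_way_f([high, low]), 1))
```

The exact F depends on the within-group spread (assumed here to be about 10), but any plausible spread yields a value far beyond the p = .001 threshold for F(1,150).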

References

Stanovich, K. E., West, R. F. (2008). On the relative independence of thinking biases and cognitive ability. Journal of Personality and Social Psychology, 94, 672-695.

Tversky, A., Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.

Gabriele has more experiments posted on his blog, and I am looking forward to more.

So, here is a question: we should definitely take such biases into consideration when collecting data from humans and when conducting user studies. But in a more general setting, can we use these biases productively, to get users to complete tasks that are useful?

## Wednesday, November 18, 2009

### Using the NYC Data Mine for an Intro Database Assignment

On October 6th, I was attending the New York Tech Meetup, and there I learned about the NYC Data Mine repository, which contains "many sets of public data produced by City agencies [...] available in a variety of machine-readable formats".

I went over the data sets available there, and indeed they were big, comprehensive, and (mostly) well-structured. So, I decided to use these data sets for the introductory database assignment in my "Information Technology in Business and Society" class. It is a core, required class at Stern, and the students are mainly non-majors. Still, I wanted to see what they would do with the data.

So, I created an assignment asking them to take two or more data sets, import them into a database, and run some basic join queries to connect the data sets. Then, they had to bring the data into Excel and perform some PivotChart-based analysis. I left the topic intentionally open, just to see what types of questions they would ask.
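The kind of join the assignment asks for can be sketched in a few lines with SQLite. The table and column names here are hypothetical stand-ins for actual NYC Data Mine sets, just to show the shape of the exercise:

```python
import sqlite3

# Two toy tables standing in for NYC Data Mine sets, linked on borough.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE graffiti (borough TEXT, incidents INTEGER);
    CREATE TABLE demographics (borough TEXT, median_income INTEGER);
    INSERT INTO graffiti VALUES ('Manhattan', 120), ('Queens', 80);
    INSERT INTO demographics VALUES ('Manhattan', 64000), ('Queens', 55000);
""")

# A basic join connecting the two data sets on the shared borough key.
for row in conn.execute("""
        SELECT g.borough, g.incidents, d.median_income
        FROM graffiti AS g
        JOIN demographics AS d ON g.borough = d.borough
        ORDER BY g.borough"""):
    print(row)
# ('Manhattan', 120, 64000)
# ('Queens', 80, 55000)
```

The joined result can then be exported to Excel for the PivotChart part of the assignment.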

Here are the results, together with my one-sentence summary of the analysis/results.
Given that this was the first time that I was giving this assignment, and that this was the first time that students were learning about databases, I was pretty happy with the results. Most of them understood well the datasets and wrote meaningful queries against the data.

However, I would like to encourage the analysis of a more diverse set of data: students seemed particularly attracted to the graffiti data set, and (expectedly) most used the data set with the socio-economic numbers for each borough.

The rather disappointing fact was that many teams took the "easy way out" and joined data based on borough (Manhattan, Queens, Brooklyn, Bronx, Staten Island), while it would have been much more interesting to see joins based on zip codes, community boards, districts, etc. I guess this becomes a requirement for next year.

Finally, I should encourage people to work with really big datasets (e.g., property valuation statistics), instead of the relatively small ones. But perhaps this is something reserved for the data mining class...

## Friday, November 6, 2009

### Utility of Money and the St. Petersburg Paradox

Consider the following game:

We will flip a fair coin, until a tail appears for the first time.
• If the tail appears in the first throw, you win $2^1=2$ dollars.
• If the tail appears in the second throw, you win $2^2=4$ dollars.
• If the tail appears in the third throw, you win $2^3=8$ dollars.
• ...
• If the tail appears in the $n$-th throw, you win $2^n$ dollars.
What is the amount of money that someone should risk to enter this game? (This question works best when given to a person that claims to never play a lottery, roulette, or any gambling game, because the expected return is lower than the bet.)

Computing the expected return of this game, we have:

$E=\frac{1}{2}\cdot 2+\frac{1}{4}\cdot 4 + \frac{1}{8}\cdot 8 + \cdots = 1+1+1+ \cdots =\infty$

In other words, the expected return is infinite, and a rational player should be willing to gamble an arbitrarily large amount of money to enter this game.

Can you find anyone willing to bet \$1,000 to play this game? Or \$10,000? Or even \$100? No? I did not think so.

This paradox is called the St. Petersburg Paradox, posed in 1713 by Nicholas Bernoulli and resolved in 1738 by Daniel Bernoulli. Since then, a number of potential explanations have appeared. The most common approach is to use expected utility theory. In this case, we introduce a utility function $U(x)$, which describes the "satisfaction" that someone would get from having $x$ amount of money.

Utility of money: The basic idea is that people do not bet based on the absolute amounts of the return but rather based on the utility of the reward. The value of an additional \$100 when I have \$100 in the bank is much higher than when I have \$1,000,000 in the bank. This means that the "utility of money" function is a concave function of the available funds.
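A minimal sketch of Daniel Bernoulli's resolution, assuming logarithmic utility $U(x)=\ln x$ (one common choice of concave utility): the expected utility of the game is finite, and the corresponding sure amount, the certainty equivalent, is just a few dollars.

```python
import math

# With U(x) = ln(x), the expected utility of the game is
#   E[U] = sum_{n>=1} (1/2^n) * ln(2^n) = ln(2) * sum_{n>=1} n/2^n = 2 ln(2),
# since sum n/2^n = 2. Truncating the series at n=200 loses a negligible tail.
expected_log_utility = sum((0.5 ** n) * (n * math.log(2)) for n in range(1, 200))

# The certainty equivalent: the sure dollar amount with the same utility.
certainty_equivalent = math.exp(expected_log_utility)

print(round(expected_log_utility, 4))  # 1.3863, i.e., 2 ln 2
print(round(certainty_equivalent, 2))  # 4.0 dollars
```

So a log-utility agent values the infinite-expectation game at only about \$4, which matches everyday intuition about how much people would pay to play.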

Just for demonstration, below you can see such a concave utility-of-money function that we have computed as part of a research project:

This concavity also partially explains the "risk aversion" that most people exhibit: they prefer certainty over uncertainty. This means that they will reject even a reasonable bet with a positive expected return. Why? Because the utility gained by winning is smaller than the utility lost by losing the bet. The higher the concavity, the higher the risk aversion.
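The "loss hurts more than the win helps" argument is just Jensen's inequality for a concave function. A minimal numeric check, again assuming log utility and made-up dollar amounts:

```python
import math

# A fair bet: win or lose $100 with equal probability, starting wealth $1,000.
wealth = 1_000
u = math.log  # a concave utility-of-money function

u_decline = u(wealth)  # utility of standing pat
u_accept = 0.5 * u(wealth + 100) + 0.5 * u(wealth - 100)  # expected utility of betting

# With concave utility, the expected utility of the fair bet is strictly
# lower than the utility of the current wealth, so the bet is declined.
print(u_accept < u_decline)  # True
```

The same comparison with a linear utility would come out exactly equal, which is why risk aversion only appears once concavity is introduced.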

If you want to read more about utility of money and its applications to portfolio management, insurance, and analysis of other cases, take a look at this book chapter.

So, next time that someone claims never to engage in any bet with a negative expected return, give the setting of the Bernoulli paradox with the positive expected return and observe the reactions...

## Sunday, October 25, 2009

### What is the (Real) Cost of Open Access?

After the transformation of Communications of ACM, I find myself increasingly interested in the articles that are published in CACM. As expected, one of the common ways to demonstrate my interest is by sharing the URL for the paper, on Twitter, on Facebook, on the blog, or by sharing the link with friends and colleagues. Unfortunately, CACM has a closed-access policy, effectively preventing anyone without an ACM membership or a university account from actually reading the papers. The same holds for papers published in conferences and journals, but there I can typically find the paper on the home page of the author. For CACM, this is often not the case.

Needless to say, I hate closed access policies. While I can understand the shortsightedness of for-profit publishers, I fail to see why ACM has not adopted at least a "semi" Open Access model, making, say, the current issue of Communications of ACM available to the public. Or by giving public access to papers published 10 or 20 years back in the different journals and conferences.

The stated goal of the association is to promote the field. By restricting access, ACM simply does not work towards this goal!

The main argument that I hear is that publishing has some costs. But I am really trying to understand what these costs are. What is their magnitude? And who is being paid? Almost like the health-care debate, we are told that something is expensive, but we have no idea who ends up getting the money.

Let's examine the potential cost factors:

Printing: I understand that printing on paper has costs. But covering the cost of printing seems easy: amortize it across the print subscribers. (Or even abolish print versions.)

Servers for distribution: What is the cost of electronically distributing papers? The cost of running a server should not be a concern. In the worst case, NSF could provide funds for that. I find it hard to believe that NSF would turn down a request for funding a server that provides open access to scientific journals!

Submission handling: The cost of the submission website? I doubt that it is above \$5K per year, per journal. Ask for a nominal submission fee (say, \$50 per paper) to cover this. The cost of the copy editors? We can do much better without them, thank you. (Seriously, why do we still have copyeditors?)

Admin cost: The only cost that I can think of is the cost of the admin staff. But how much is it? I honestly have no idea! Is it so high that the ACM member subscriptions cannot cover the cost? I am trying to find the budget of ACM but I cannot find anything public.

Are there other hidden costs?

If anyone has pointers or extra information, please let me know. I am really trying to understand the real costs of high-quality electronic publishing.

## Sunday, October 11, 2009

### When Noise is Your Friend: Smoothed Analysis

Have you ever encountered the phrase "the algorithm has exponential running time in the worst-case scenario, but in practice we observed it to be pretty efficient"? It is the phrase that divides theoreticians and practitioners. Many theoretical computer scientists focus on the analysis of worst-case complexity, often generating results that contradict practice.

For example, the simplex algorithm for linear programming is well known to be pretty efficient in practice. In theory, though, the worst-case complexity of simplex is exponential, classifying it as a "non-efficient" algorithm. However, simplex has exponential running time only for very special inputs. Most practitioners would even argue that you will never encounter such strange cases in practice; only an adversary could design such inputs.

Similarly, the Traveling Salesman Problem is a hallmark example of an NP-complete problem, i.e., one unlikely to have an efficient algorithm anytime soon. However, there are many TSP implementations that can provide almost optimal solutions for pretty big inputs.

K-means is another such algorithm. It has a horrible worst-case scenario, but ask the millions of people who use it for clustering: it is one of the most efficient clustering algorithms in practice, despite its worst-case exponential complexity.

So, how can we reconcile theory and practice?

A very nice approach towards this reconciliation is smoothed analysis. I first learned about this way of analyzing algorithms by attending the (fascinating) job talk of Jon Kelner. Jon showed that if you perturb the input a little before feeding it to the simplex algorithm, then it is almost impossible for the perturbed input to generate an exponential running time. In other words, by adding a little noise to the data, we are guaranteed to avoid the "tricky" parts of the input space.

What is the beauty of this approach? It explains why many "inefficient" algorithms work well in practice: most real data contain noise, and this noise can actually be beneficial! The other big lesson is that an algorithm sometimes ends up with horrible worst-case performance due only to a small number of almost adversarial inputs. Adding noise may take care of these strange cases.
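Kelner's result concerns the simplex method, which is hard to demo in a few lines, so here is the same phenomenon on a toy stand-in: quicksort with a first-element pivot, whose adversarial input is a sorted array (quadratic comparisons). A light "partial permutation" perturbation, one noise model used in smoothed analysis of sorting, destroys the adversarial structure:

```python
import random

random.seed(1)

def quicksort_comparisons(a):
    """Iterative quicksort with a first-element pivot; returns the number
    of comparisons, a proxy for running time."""
    comparisons = 0
    stack = [list(a)]
    while stack:
        seg = stack.pop()
        if len(seg) <= 1:
            continue
        pivot, rest = seg[0], seg[1:]
        comparisons += len(rest)
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons

def perturb(a, p=0.1):
    """Partial-permutation noise: each position is swapped with a uniformly
    random position with probability p."""
    a = list(a)
    for i in range(len(a)):
        if random.random() < p:
            j = random.randrange(len(a))
            a[i], a[j] = a[j], a[i]
    return a

n = 2000
adversarial = list(range(n))  # sorted input: the worst case, ~n^2/2 comparisons

print(quicksort_comparisons(adversarial))           # quadratic
print(quicksort_comparisons(perturb(adversarial)))  # far fewer after a little noise
```

The perturbed run costs a small fraction of the adversarial one: the noise sprinkles random pivots through the sorted run, restoring near-balanced partitions. This is an illustration of the idea, not the simplex analysis itself.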

The last issue of Communications of the ACM has a great review article by Spielman and Teng on smoothed analysis. It explains the differences between worst-case, average-case, and smoothed analysis, and points to a wide variety of problems that have been analyzed using this technique. Highly recommended!

## Monday, September 28, 2009

### Rationality, P=NP, Prediction Markets, and a Paradox

I got the idea for this post after reading Dick Lipton's post on betting on the P=NP problem. The discussion in the comments was extensive, mainly touching on risk aversion, the inability of humans to estimate small probabilities properly, and so on. (Wolfers and Snowberg argue that it is due to the inability of humans to understand very small probabilities.) The discussion continued in the Overcoming Bias blog, and there one of the comments, being tongue-in-cheek, caught my eye:

probabilities must agree with logic on certainly-true and certainly-false statements, which means that the probability of logical truths has to be 1, and logical falsehoods have to be 0.

So, if P=NP is a decidable problem, it is either true or false. A fully rational agent participating in the market should therefore know whether P=NP; it is not a matter of probabilities! All the information needed to make the decision is available. So, if the market has one or more rational players, the market should converge immediately to a price of 0 or 1, depending on the status of the problem. Right?

So, which of the following is true?

• There are no rational agents. So, all the analysis of prediction markets that assume rationality of traders is incomplete.
• There are rational agents. The market does not converge to 0 or 1 because the P=?NP problem is undecidable.
• There are rational agents but the return from the risk-free rate until reaching the time to settlement exceeds the return from the market. So, the market gives information on how long it will take for the problem to be officially solved.
• If your laptop cannot find the solution, neither can the market.
OK, back to more serious work.

## Monday, September 14, 2009

### Citation Tracker: Monitoring Citations to your Publications

One of the common pastimes of academics is checking services such as Google Scholar to see the number of papers that cite our work. Quite often the statistics from Google Scholar, or from other services such as Web of Science, are used to create a citation report that is used for promotion and tenure purposes.

While Google Scholar is extremely valuable for finding papers that cite a particular piece of work, it has some shortcomings, especially when creating a citation report for promotion. First, Google Scholar does not differentiate between peer-reviewed publications (journal, conference, or workshop papers) and other publications (such as tech reports or term papers); so, when preparing a citation report, I have to go over the list of papers, keeping the "legitimate" citations and removing the citations that are not admissible. Second, Google Scholar is sometimes noisy: it lists the same paper twice, or splits the citations of one paper into two different entries; at other times it does not include papers that can be found through a web search.

Another feature that I would really like to see is the ability to find the "new" citations for a given paper, creating the appropriate alerts. A simple RSS feed would work wonders, but it is not there.

Of course, Google Scholar also does not monitor the web to find other types of documents that may mention a particular paper. PhD seminars, or even blog posts, are things that I would like to keep track of when monitoring who cites my own work. Especially for such volatile pages, I typically want to keep a copy so that I can retrieve them a few years later, when compiling my promotion packet.

For this reason, over the summer I created a tool that augments Google Scholar: it monitors Google Scholar (and other services like Libra, CiteSeerX, and SSRN), and also monitors the Web (Google, Bing, Ask) for mentions of a paper.

You can access a pre-alpha version at http://www.citation-tracker.com

Some of the features:
• Import publications from Google Scholar, DBLP, BibTeX, and manually.
• Review the citations for each paper, and decide which ones to keep, which to discard, and which ones to examine later.
• Monitor citation services (Google Scholar, Libra, CiteSeerX, SSRN) and see notifications when new citations to your papers appear.
• Generate automatically a citation report, listing the papers that cite your work.
I have been using the service over the last few weeks and it seems reasonably stable. I import my papers using Google Scholar, "accept" the existing citations, and then wait to see the new citations that pop up every now and then. I find it pretty useful for finding new papers that cite my work.

Over the last few days, I even started importing papers by other researchers that I consider relevant to my work, so that I can see what new papers cite them.

Feel free to login and play with the system. Needless to say, it is an early release so I expect to see bugs here and there. If you see any bug, or if you would like to see a new feature, please add a note using the "feedback" tab that is visible on the side of the screen.

Enjoy!

## Sunday, September 6, 2009

### The different attitudes of computer scientists and economists

I was reading Noam Nisan's blog post about the different attitudes of computer scientists and economists. Noam hypothesizes that economists emphasize research on “what is” while computer scientists emphasize “what can be”, and offers the view of an algorithmic game theorist.

I have my own interpretation on this topic, mainly from the data mining point of view.

Economists are interested in suggesting policies (i.e., suggesting to people "what to do"). Therefore, it is important to build models that assign causality. Computer scientists are rarely interested in the issue of causality: they control the system (the computer), and algorithms can be directed to perform one way or another. In contrast, economists cannot really control the system that they study; they do not even know exactly how the system behaves.

When a computer scientist proposes an algorithm, the main focus is to examine its performance under different settings of incoming data; how the (computer) system behaves is under control. When an economist suggests a policy, it is highly unclear how the underlying (rational?) agents will behave. Therefore, it is important to figure out what exactly "causes" the agents' behavior, and which policies can change it.

One area that gets closer to economics in this respect is data mining and machine learning: get the data, and learn how the underlying system behaves. For example, get data about credit card transactions and learn which of them are fraudulent. However, there is a significant difference in focus: computer scientists are mainly focused on predictive modeling. As long as the system can "predict" the outcome on unseen data, things are fine. A black box with perfect predictive performance is great; explanatory models are rarely the focus. At best, someone may want to understand the internals of the predictive model, but even when the model can be understood (e.g., by reading the rules or the decision tree), these rules are rarely causal in nature.

Let me give you an example. Suppose that you are trying to predict the price per square foot for houses, and as one independent variable (feature) you add the average house size in the area. What will the predictive model find? That places with smaller houses also have a higher price per square foot. Unexpected? Not really. Houses in urban areas are typically smaller and more expensive than their suburban and rural counterparts.
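The confounding is easy to reproduce with a tiny simulation (made-up numbers): an unobserved "urban" factor drives both smaller houses and higher prices, so a regression of price on size alone picks up a negative slope even though size causes nothing here.

```python
import random
import statistics

random.seed(0)

# Simulated housing data: the hidden "urban" factor makes houses
# both smaller AND more expensive per square foot.
data = []
for _ in range(2_000):
    urban = random.random() < 0.5
    size = random.gauss(1_200 if urban else 2_400, 200)      # square feet
    price_psf = random.gauss(800 if urban else 300, 50)      # $/sq ft
    data.append((size, price_psf))

def ols_slope(pairs):
    """OLS slope of y on x for a single feature."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)

# The regression "finds" that bigger houses mean cheaper square feet...
print(ols_slope(data) < 0)  # True
# ...but building smaller houses in a rural area would not raise its prices:
# the urban factor is doing all the work.
```

The negative slope is a perfectly good predictive feature and a terrible causal claim, which is exactly the distinction the paragraph above draws.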

For a predictive model, this information is absolutely sufficient; the average house size is a valuable feature for predictive purposes. Think, however, what would happen if someone devised policy based on this feature. A house builder would try to build smaller houses in rural areas, hoping that the resulting prices would be higher. Or a politician in Manhattan would encourage construction of bigger apartments, since "the experiments have shown" that if average house size is increased, prices will drop. Absurd? Yes.

Even funnier things can come up if someone uses country-wide data to predict demand for apartments from apartment prices. The result will show that increasing prices actually increases demand, even though we would expect the opposite. (This is just the effect of prices being higher in places where demand is higher.)

Predictive modeling can survive (or even thrive) by exploiting such strange correlations. A causal model that captures correlations and presents them as causes can wreak havoc.

So, an economist will try to build a model that will generate causal relationships. In the case above, a model based on supply and demand is more likely to result in a model that captures the true "causes" of increased apartment prices. A house builder can see these effects and make a more informed decision on how to build. Similarly, for a politician that is trying to encourage building more affordable housing.

Often, causal models are called "structural" in economics [I am not sure where the term comes from; I have seen a few different interpretations]. They typically start by modeling the micro-behavior of agents, and then proceed to explain the behavior of a large system comprising the interactions of such agents. A benefit of such models is that their assumptions are easier to check, test, and challenge. In contrast to "statistical" models, structural models tend to generate relationships that are easier to consider "causal".

An advantage of causal models over predictive models is that causal models are valid even if the underlying data distribution changes. Causal models are supposed to be robust, as long as the behavior of the agents remains the same. A predictive model works under the assumption that the "unseen" data follow the same distribution as the "training" data. Change the distribution of the unseen data, and any performance guarantee for the predictive models disappears.

Update 1: This is not an attempt to downgrade the importance of predictive models. Most of the results presented by Google after a query are generated using predictive modeling algorithms. You get recommendations from Amazon and Netflix as the outcome of predictive algorithms. Your inbox remains spam-free due to the existence of the spam filter, again a system built using predictive modeling techniques. It is too hard, if not impossible, to build "causal" models for these applications.

Update 2: An interesting example of a company deriving policy based on their predictive model is American Express. They realized that the feature "customer buys in a 99c store" is correlated with higher delinquency rates. So, AmEx decided to decrease the credit limit for such customers. Of course, the result will be that potentially affected customers will stop visiting such stores, decreasing the value of this policy for AmEx. Furthermore, this action may cause even more economic stress to these customers that are now "forced" to buy from more expensive stores, and this may result in a much higher default rate for AmEx. This "unexpected" outcome is the effect of devising policy based on non-causal variables.

If AmEx had a variable "customer in economic distress", which arguably has a causal effect on default rates, then it could act on it without customers being able to game the system. However, since AmEx relied on the variable "customer buys in a 99c store", which is merely an outcome of "customer in economic distress", customers can simply change their shopping behavior while their economic distress remains the same.
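
The mechanism is easy to see in a toy simulation (all names and probabilities below are invented for illustration). When distress causes both the 99c-store visits and the defaults, the store variable looks predictive; once customers adapt and stop shopping there, the signal evaporates while the default risk stays put:

```python
import random

def simulate(customers_adapt, n=20_000, seed=7):
    """Toy model: economic distress causes both 99c-store shopping and default.

    If customers adapt (i.e., game the policy), distressed customers shop at
    the 99c store no more often than anyone else, and the proxy loses its
    signal. Returns (default rate among shoppers, default rate among others).
    """
    rng = random.Random(seed)
    shoppers = others = 0
    shopper_defaults = other_defaults = 0
    for _ in range(n):
        distress = rng.random() < 0.20
        p_shop = 0.10 if (customers_adapt or not distress) else 0.80
        shops_99c = rng.random() < p_shop
        defaults = rng.random() < (0.30 if distress else 0.02)
        if shops_99c:
            shoppers += 1
            shopper_defaults += defaults
        else:
            others += 1
            other_defaults += defaults
    return shopper_defaults / shoppers, other_defaults / others

print(simulate(customers_adapt=False))  # proxy looks informative
print(simulate(customers_adapt=True))   # signal gone once behavior changes
```

In the first run the 99c-store variable cleanly separates high-risk from low-risk customers; in the second run the two default rates are indistinguishable, even though the distressed customers (and their defaults) are still there.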

## Thursday, August 27, 2009

### Workshop on Information in Networks (WIN)

For those of you interested in the study of networked data, I would like to bring your attention to the "Workshop on Information in Networks (WIN)", a workshop organized by my colleagues Sinan Aral, Foster Provost, and Arun Sundararajan. It will take place on September 25-26, 2009. From the description:

The purpose of WIN is to bring together leading researchers studying ‘information in networks’ – its distribution, its diffusion, its value, and its influence on social and economic outcomes – in order to lay the foundation for ongoing relationships and to build a lasting multidisciplinary research community.

I should emphasize that the phrase "bring together leading researchers" is not the standard template used in many calls for papers. The lineup of speakers is truly outstanding! I would be very hard pressed to find any conference with such a lineup of invited speakers:
• Albert-Laszlo Barabasi, University of Notre Dame, Northeastern University
• Ronald Burt, University of Chicago
• Damon Centola, MIT
• Pedro Domingos , University of Washington
• Christos Faloutsos, Carnegie Mellon
• James Fowler, University of California, San Diego
• Sanjeev Goyal, University of Cambridge
• Bernardo Huberman, HP Labs
• Matthew Jackson, Stanford University
• Michael Kearns, University of Pennsylvania
• Jon Kleinberg, Cornell University
• Rachel Kranton, Duke University
• David Lazer, Harvard University
• Jure Leskovec, Stanford
• Michael Macy, Cornell University
• Alex (Sandy) Pentland, MIT
• Duncan Watts, Yahoo! Research
It is really as good as it gets. If you are interested in networked data and can be in New York on September 25-26, then this is an event that you must attend!

## Tuesday, August 11, 2009

### Get a Consent Form (for IRB) on MTurk using Qualification Tests

I was browsing through the various qualification tests on Mechanical Turk, checking what requesters ask and how they structure the tests. The one test that caught my eye was designed by Daniel Velleman and David Beaver from the Linguistics department of The University of Texas at Austin.

Here is the test:

"Which sentence do you prefer?" eligibility form

This qualification will allow you to participate in our English language research HIT, "Which sentence do you prefer?"

 Yes

 No

Do you (or did you) have at least one parent or caregiver
whose first language was English?

 Yes

 No

You are invited to participate in a survey, entitled "Which sentence do you prefer?" The study is being conducted by Daniel Velleman and David Beaver in the Linguistics department of The University of Texas at Austin.

Calhoun 501
1 University Station B5100
Austin, TX 78712-0198
(512) 471-1701

The purpose of this study is to examine English speakers' preferences about the order in which written information is presented. Your participation in the survey will contribute to a better understanding of the English language. We estimate that it will take about a minute of your time to complete each question. You are free to contact the investigator at the above address and phone number to discuss the survey.

Risks to participants are considered minimal. There will be no costs for participating. You will be paid for each HIT you complete, but will not otherwise benefit from participating. Your Amazon account information will be kept while we collect data for tracking purposes only. A limited number of research team members will have access to the data during data collection. This information will be stripped from the final dataset.

Your participation in this survey is voluntary. You may decline to answer any question and you have the right to withdraw from participation at any time without penalty. If you wish to withdraw from the study or have any questions, contact the investigator listed above.

If you have any questions, please email Daniel Velleman at ut.linguistics.mturk@gmail.com. You may also request a hard copy of the survey from the contact information above.

This study has been reviewed and approved by The University of Texas at Austin Institutional Review Board. If you have questions about your rights as a study participant, or are dissatisfied at any time with any aspect of this study, you may contact - anonymously, if you wish - the Institutional Review Board by phone at (512) 471-8871 or email at orsc@uts.cc.utexas.edu.

IRB Approval Number: 2009-03-0123

I understand and want to participate in this study.

It is indeed a very clever idea to leverage a qualification test to get workers to fill in a consent form, satisfying at the same time the requirements of the Institutional Review Board.

Perhaps the trick will be useful to other researchers who want to run human-subjects studies on Mechanical Turk. (I still believe that IRB approval is not required for this particular study, but that is not the point of this post.)

## Wednesday, August 5, 2009

### Top Requesters on Mechanical Turk

Today I had a chat with Dahn Tamir about all things MTurk. He was particularly interested in the archive of all requesters that I have collected over the last 7 months. So, I queried the database, computed some basic statistics and sent him the results.

Then I thought: why not export the live results as well? A few PHP lines later, the leaderboard with the top Mechanical Turk requesters was born; it is now available at http://mturk-tracker.com/top_requesters/

For each requester, you can see the total number of projects they have posted on Mechanical Turk since January 2009, the total number of HITs, and the total value of the posted HITs. If you are interested in whether a requester is still active, you can also see when they last posted a HIT.

By clicking on a requester's name, you can see an archive of the last 100 tasks they have posted; by clicking on the requester id, you get to Amazon, where you can see the tasks that are available right now.
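
Under the hood, the leaderboard is just a GROUP BY over the crawler's archive. Here is a sketch using an in-memory SQLite database; the schema, names, and rows are invented for illustration and do not match the actual tracker database:

```python
import sqlite3

# Hypothetical schema for the crawler's archive; all names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE hitgroups (
    requester_id TEXT, requester_name TEXT,
    hits INTEGER, reward REAL, posted_date TEXT)""")
conn.executemany(
    "INSERT INTO hitgroups VALUES (?, ?, ?, ?, ?)",
    [("A1", "CastingWords", 100, 0.50, "2009-07-01"),
     ("A1", "CastingWords", 200, 0.25, "2009-08-01"),
     ("A2", "Dolores Labs",  50, 0.05, "2009-06-15")])

# One row per requester: #projects, #HITs, total value, last activity.
leaderboard = conn.execute("""
    SELECT requester_name,
           COUNT(*)           AS projects,
           SUM(hits)          AS total_hits,
           SUM(hits * reward) AS total_value,
           MAX(posted_date)   AS last_posted
    FROM hitgroups
    GROUP BY requester_id
    ORDER BY total_value DESC""").fetchall()
for row in leaderboard:
    print(row)
```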

Enjoy!

## Tuesday, August 4, 2009

### When to Post Tasks on Mechanical Turk?

People who have experience with Mechanical Turk know that getting long tasks done there is tricky. While it is relatively easy to get small tasks done quickly, it is much more difficult to estimate how long a big task will take. The "estimated time" given by the Mechanical Turk interface is really crappy and provides pretty much no guidance if you expect your task to last longer than a day.

Naturally, questions like these arise: When is it best to post a task? How can I minimize my waiting time?

To understand better how tasks get completed on Mechanical Turk, I started crawling Mechanical Turk every few minutes, collecting data about the HITs, the requesters, how long each HIT remains available, and so on.

Queue

The first outcome of this effort was the Mechanical Turk Monitor, a visualization tool that shows how many projects are available at any given time, how many HITs, and the available rewards (see the old post).

Arrival process and Serving process

This tool effectively showed the size of the "queue". However, it revealed neither how many tasks arrive on MTurk per day, nor how much work gets done on MTurk every day. So, last week I decided to display this information and show the daily activity of the requesters and the corresponding activity of the Turkers:
Posting Activity

Now we can start scratching the surface of how things get done on Mechanical Turk. A first step is to look at the statistics for what is being posted over time:

The x-axis depicts time, and the y-axis the value of the HITs posted each day. The blue line depicts the total value of the HITs being posted. One immediate observation is the strong weekly periodicity: taking the 7-day average (red line) smooths the curve significantly.

Let's take a look at the distribution of posting activity over the days of the week:

The plot shows the distribution of the activity for every day of the week. By activity, we mean the total value of HITs posted on each day. As we can see, weekends tend to be significantly quieter than weekdays. In fact, even Mondays tend to be relatively quiet, perhaps because requesters spend Monday preparing the HITs that then get posted on Tuesdays :-)

Well, the plot is not very surprising. Lots of activity during the workdays, less activity over the weekends.
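
Both computations behind these plots (the 7-day smoothing and the day-of-week breakdown) are easy to reproduce. A sketch with invented daily totals, assuming one (date, value) pair per day:

```python
from collections import defaultdict
from datetime import date, timedelta

# Invented daily totals: value of HITs posted per day (USD), two weeks.
start = date(2009, 8, 3)  # a Monday
daily = [(start + timedelta(days=i), v) for i, v in enumerate(
    [500, 900, 800, 750, 700, 200, 150,    # week 1: quiet weekend
     550, 950, 850, 800, 720, 210, 160])]  # week 2: same pattern

# A trailing 7-day average smooths away the weekly periodicity.
values = [v for _, v in daily]
weekly_avg = [sum(values[i - 6:i + 1]) / 7 for i in range(6, len(values))]

# Distribution of activity by day of week (0 = Monday, ..., 6 = Sunday).
by_weekday = defaultdict(float)
for d, v in daily:
    by_weekday[d.weekday()] += v

print([round(a, 1) for a in weekly_avg])
print(sorted(by_weekday.items()))
```

With these numbers, the smoothed series barely moves while the raw series swings by hundreds of dollars, and Saturday/Sunday totals come out well below every weekday.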

Workers Activity

The interesting result though comes when we look at the activity of the Turkers:

It seems that Turkers are not in sync with the requesters. In fact, the activity on Saturdays is comparable to the activity during the weekdays. Surprisingly, Mondays tend to see significantly less activity (perhaps due to the small number of tasks being posted over the weekend?).

Conclusion

What is clear is that there is a relative lag between the activity of requesters and workers. Although it is hard to establish causality from these figures, it seems that Fridays and Saturdays are good days to post tasks on Mechanical Turk: relatively low competition for the attention of workers, and a significant level of worker activity on Saturdays.

So, now you know: Post your HITs on Friday and go away...

## Friday, July 31, 2009

### Workshops: Official or Unofficial Proceedings?

In the process of organizing DBRank 2010, we had to answer the following question: Should the proceedings for the workshop be "official" or "unofficial"?

Official workshop proceedings undergo the same process as conference papers: a specific camera-ready format, submission to the proceedings chair by a given date, and official hosting at the publisher's digital library, with all the metadata, digital object identifiers (DOIs), and so on. (For DBRank 2010, that would be IEEE Xplore.) For bureaucratic purposes, these papers are considered "publications."

Unofficial proceedings are, well, unofficial. Typically the workshop chair posts the papers on the workshop website, and potentially brings printed copies for distribution at the workshop. There is no official publisher, no DOI is assigned to the papers, and in principle this is no more of a publication than a paper posted on a personal website.

So, should workshops have official or unofficial proceedings?

There are some arguments against official proceedings:
• Increasingly, there is a significant conflict between workshop and conference publications. With some workshops allowing 8- or even 10-page workshop papers, it becomes hard for the authors of these papers to publish the same work in a conference, as there is typically significant overlap. Most database conferences will consider any past paper that is 3 pages or longer to be a prior publication, and the conference version must have significant new content in order to be considered a "new" paper.
• As conferences become increasingly competitive, many authors submit to workshops papers that could not "make it" into a conference. A workshop is typically easier to get into, and at the end "you get a paper" out of it. Needless to say, this pretty much violates the spirit of workshops, which are supposed to be venues for new, relatively immature research, not archival publications.
On the other hand, there are advantages in having official proceedings:
• It makes the workshop more attractive in the eyes of many authors. Authors get an official timestamp for their work and can point to a paper that has at least been lightly refereed, instead of pointing to a technical report or working paper.
• It makes it easier for someone to locate the papers that were presented in the workshop. The websites for the workshops are not always hosted in "stable" websites and they disappear for various reasons. (For example, the websites for WebDB'99, WebDB 2000, WebDB 2001, and WebDB 2003 are not available any more, because the organizers have moved to different institutions.)
So, what to do? Official or unofficial?

## Thursday, July 30, 2009

### Is Amazon Mechanical Turk a black market?

According to Wikipedia, a black market is: "a market where all commerce is conducted without regard to taxation, law or regulations of trade". How is this related to Mechanical Turk?

Today, I received an email, asking about the tax and employment issues regarding Amazon Mechanical Turk. What are the rules about posting tasks on Mechanical Turk? How should these tasks be handled by accounting and human resources departments?

Unfortunately, Amazon did not design Mechanical Turk in a requester-friendly way. In an effort to relieve their accounting and HR departments from a big overhead, Amazon transferred to the requesters the risk of violating the US tax code and engaging in illegal employment activities.

How can this happen? The key issue is whether there is an employer-employee relationship between the requesters and the workers on Mechanical Turk. The crucial question is:
When you submit funds to your Mechanical Turk account, who are you paying? Amazon.com or the worker?
If it is Amazon, then you are simply letting Amazon deal with all the tax and employment issues associated with the worker: Amazon needs to verify that the worker is eligible for employment, takes care of tax issues, and so on. In this case, hiring someone for a micro-task on Amazon is the same as getting an agency to provide cleaning services to your home: you do not need to care if the person coming to clean your place is eligible for employment, whether the taxes are properly withheld from the paycheck and so on. It is the agency's task to take care of that.

However, Amazon does not follow this route. According to the terms and conditions, paragraph 6.a:
In addition to the disclosures described in our Privacy Notice, we will disclose to Requesters [....] Provider Tax Information. "Provider Tax Information" means tax identification information of Providers, such as a Social Security Number or Employer Identification Number. Requesters use Provider Tax Information to fill out an IRS Form 1099 and send it to Providers. If you are a Requester and want Provider Tax Information from us to complete IRS Form 1099s for Providers you have paid, you must provide us with your company name and employer identification number ("Requester Tax Information"). You hereby consent to disclosure of Provider Tax Information, Requester Tax Information, and other data as described in this Section 6 and our Privacy Notice.
This provision is there because a requester who paid a worker more than $600 over the year is required to send a 1099-MISC tax form to that worker. In other words, this tiny provision means that the employer-employee relationship is not between Amazon and the worker, but between the requester and the worker. This is in contrast to other marketplaces (e.g., Rent-A-Coder), where the requester pays the marketplace provider, and the marketplace provider then individually contracts the workers, taking care of tax issues, employment authorization, and so on.

What are the implications of this policy?
• Requesters may be exposed to the risk of violating employment laws. It is possible that a requester is illegally employing US-based workers who do not have the right to work in the US.
• Requesters may be exposed to the risk of violating the US tax code. The requester needs to keep track of how much they paid each individual worker (out of potentially thousands of workers), and send 1099-MISC tax forms to the workers who completed more than $600 worth of HITs for the requester over the year.
OK, these are the risks. What are the potential counter-arguments, and how can someone avoid these issues?

The employment-eligibility issue: Amazon pays cash only to people who have US bank accounts. This means that the person, if US-based, is legally in the US. I do not know if Amazon checks for employment eligibility (they should). If the person is not US-based, then Amazon pays through gift cards. From what I know, gift cards are not considered compensation: we regularly give gift cards as awards to students without worrying about their eligibility to work, and our accounting department has never worried about this practice. So, the issue of illegal employment seems to be rather contained, but it would be nice if Amazon took care of it explicitly. Yes, it is a big headache for Amazon's HR department to handle thousands of micro-contractors, but this is the price to pay for running this service.

The tax issue: At the very least, Amazon should have an automatic service to take care of this, rather than leaving requesters to scramble to track all the micro-payments and send the paperwork. It is trivial: if a given requester-worker pair generated more than $600 worth of HITs over the year, request tax information and send the 1099-MISC forms on their behalf.
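
A minimal sketch of the bookkeeping involved, with invented worker IDs and payment amounts, shows why this would be easy for Amazon to automate:

```python
from collections import defaultdict

IRS_1099_THRESHOLD = 600.00  # USD, per requester-worker pair, per year

def workers_needing_1099(payments):
    """payments: iterable of (worker_id, amount_usd) pairs for one requester
    over one calendar year. Returns {worker_id: total} for every worker whose
    total pay crossed the 1099-MISC reporting threshold."""
    totals = defaultdict(float)
    for worker_id, amount in payments:
        totals[worker_id] += amount
    return {w: t for w, t in totals.items() if t > IRS_1099_THRESHOLD}

# Thousands of $0.05 HITs add up: worker "W1" crosses the threshold, "W2" does not.
payments = [("W1", 0.05)] * 13_000 + [("W2", 0.05)] * 2_000
print(workers_needing_1099(payments))
```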

A better solution: Request tax and employment-eligibility information from workers BEFORE they can work on the MTurk marketplace. Also, request tax information from all the requesters BEFORE they can post any tasks on MTurk. Then submit the tax forms automatically at the end of the year.

An even better solution: Adopt the Rent-A-Coder model, and consider the MTurk workers as Amazon contractors. Then requesters buy services from Amazon, in the same way they buy computing power on EC2, storage on S3, and so on. In this case, it is very simple to add the MTurk expense under the "software services" line in the accounting report.

## Tuesday, July 14, 2009

### How Do Prices Evolve in Prediction Markets?

Last week, Nikolay presented our paper on "Modeling Volatility in Prediction Markets" at the EC'09 conference. One of the questions that we answer in this paper is: "what is the most likely price of a prediction market contract at some point in the future?"

Let's start with the expected price. If we assume that the markets are efficient, then the current price of the contract is the best possible estimate of the future expected price. However, the current price is NOT the most likely price in the future. In fact, the probability that the contract will have the same price in the future decreases over time. Why? Because, as we get closer to expiration, the price of the contract will get closer to 0 or 1, since the uncertainty about the outcome decreases over time. So, while the expected price will be equal to the current price, most of the future prices will be closer to 0 and 1.
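
A quick Monte Carlo makes this concrete. The setup here is my own illustrative choice (a standard Gaussian information model: the contract pays 1 if a terminal standard normal signal ends below the threshold implied by the current price, and the signal is revealed gradually as a Brownian motion), so treat it as a sketch rather than the paper's model:

```python
import random
from statistics import NormalDist

norm = NormalDist()

def simulate_future_prices(p_now, t, n=100_000, seed=42):
    """Sample prices at fraction t of the contract's life (0 < t < 1).

    The terminal signal is N(0, 1); the contract pays 1 if it ends below
    c = inv_cdf(p_now), and at time t we have observed a N(0, t) piece of it.
    The price at time t is the conditional probability of paying 1.
    """
    rng = random.Random(seed)
    c = norm.inv_cdf(p_now)  # threshold implied by the current price
    return [norm.cdf((c - (t ** 0.5) * rng.gauss(0, 1)) / (1 - t) ** 0.5)
            for _ in range(n)]

prices = simulate_future_prices(p_now=0.5, t=0.9)
mean_price = sum(prices) / len(prices)
extreme = sum(p < 0.1 or p > 0.9 for p in prices) / len(prices)
print(round(mean_price, 2))  # stays near the current price of 0.5
print(round(extreme, 2))     # yet most of the mass sits near 0 or 1
```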

Below you can see some 3d plots of the "future price density" as a function of the future price P and the time to expiration t. We assume that "now" is t=0 and the contract expires at t=1.

If the current price is 0.5, then the future price density, as a function of the future price P and the time to expiration t, is:

As you can see, the possible prices, when we are close to t=0, are clustered around the current price (in this case 0.5). Then, as we move closer to the expiration, the probability density moves closer to 0 and 1. As this contract had price 0.5, the plot is completely symmetric around the axis P=0.5.

If we have a current contract price at 0.4, then the density becomes more skewed towards 0:
And here is an even more skewed plot, with the current contract price at 0.25:

Just in case you want to create your own plots, here is the Maple code:
```
with(stats);
normpdf := (x, mu, sigma) -> statevalf[pdf, normald[mu, sigma]](x);
spdf := x -> normpdf(x, 0, 1);
normicdf := (p, mu, sigma) -> statevalf[icdf, normald[mu, sigma]](p);
sicdf := x -> normicdf(x, 0, 1);
f := (pnow, pfuture, lambda) ->
    spdf(sqrt(1/lambda)*sicdf(pnow) - sqrt(1/lambda-1)*sicdf(pfuture))
    * sqrt(1/lambda-1) / spdf(sicdf(pfuture));
plot3d(eval(f(p, P, t), {p = 0.5}), P = 0..1, t = 0.1..0.75,
    axes = boxed, shading = zhue, orientation = [-120, 50]);
```
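
If you do not have Maple handy, the same density is easy to express with just the Python standard library; this is a direct translation of the Maple function f, with lambda playing the role of the time fraction t:

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal: pdf, cdf, inv_cdf

def future_price_density(p_now, p_future, t):
    """Density of the future price p_future at fraction t of the contract's
    life (0 < t < 1), given the current price p_now. Mirrors the Maple f."""
    a = (1 / t) ** 0.5 * norm.inv_cdf(p_now)
    b = (1 / t - 1) ** 0.5
    y = norm.inv_cdf(p_future)
    return norm.pdf(a - b * y) * b / norm.pdf(y)

# Sanity checks: symmetric around P = 0.5 when the current price is 0.5,
# and the density integrates to ~1 over (0, 1).
print(future_price_density(0.5, 0.4, 0.3), future_price_density(0.5, 0.6, 0.3))
grid = [i / 1000 for i in range(1, 1000)]
print(round(sum(future_price_density(0.5, p, 0.3) for p in grid) / 1000, 2))
```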

So, what can we do with these results? One application is to price the X contracts on Intrade: In the "X" contracts, the question is about the future price movement of a prediction market contract (e.g., "will the contract for Democrats winning the 2012 election be above 0.75 on December 31st, 2010?").

These X contracts are similar to the existing "call" and "put" options on the stock market, where people try to guess where the price of a stock will be in the future. There is a significant difference, though: when a trader prices and trades a call/put option on a share (e.g., using the vanilla Black-Scholes formula), the trader needs to guess the future volatility of the share price. Through this process, the trader gives the public valuable information about the future volatility of the share price. For prediction markets, trading an X contract does not reveal the same information. Our work gives the exact form of the future price distribution, without the need for any volatility estimate. (Volatility can be largely determined by the current price and the time to expiration; see the past blog post and the EC'09 paper for details.) So, pricing an X contract requires just plugging in the current price, the time to expiration, and the strike price (information that is already public) to find the "correct" price for the X contract.
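
Integrating the future-price density gives a closed form for this probability. The formula below is my own derivation from the density in this post (the probit of the future price is normally distributed), so treat it as a sketch rather than the paper's official pricing function:

```python
from statistics import NormalDist

norm = NormalDist()

def x_contract_price(p_now, strike, t):
    """Probability that the underlying contract trades above `strike` at
    fraction t of its life (0 < t < 1), given its current price p_now.
    Obtained by integrating the future-price density of this post."""
    return norm.cdf((1 / t) ** 0.5 * norm.inv_cdf(p_now)
                    - (1 / t - 1) ** 0.5 * norm.inv_cdf(strike))

# A contract at 0.5, asked about the 0.5 line: the answer is 0.5 at any horizon.
print(x_contract_price(0.5, 0.5, 0.4))
# "Will the contract be above 0.75 halfway to expiration, given price 0.60 now?"
print(round(x_contract_price(0.60, 0.75, 0.5), 3))
```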

So, am I saying that the X contracts are completely useless? No. But the information revealed by trading these contracts is significantly less compared to the information revealed by trading options on stocks.

## Saturday, July 4, 2009

### Books, Journals, Conferences, Blogs

I was reading Peter Suber's Open Access Overview, and I ran into the following paragraph:

Scholarly journals do not pay authors for their articles, and have not done so since the first journals were launched in London and Paris in 1665. Journals took off because they were more timely than books. For readers, journals were better than books for learning quickly about the recent work of others, and for authors they were better than books for sharing new work quickly with the wider world and, above all, for establishing priority over other scientists working on the same problem. They gave authors the benefit of a fast, public time-stamp on their work. Because authors were rewarded in these strong, intangible ways, they accepted the fact that journals couldn't afford to pay them. Over time, journal revenue grew but authors continued in the tradition of writing articles for impact, not for money.

It was amusing to see that there was this transition from books to journals, for pretty much the same reason that in computer science we have seen a transition from journals to conferences. I wonder whether the senior scholars of the day commented on that transition in the same way that Mike Trick commented on the similar tension between journal and conference publications:

if a subgroup chooses a method of evaluation antithetical to the mores of the rest of academe, don’t be surprised if the group gets little respect outside their narrow group

So maybe a few years from now, we will see a similar shift, as people start leaving "traditional" peer review behind, opting for new modes of publication, such as self-publishing. Michael Nielsen has an excellent article on the disruption of scientific publishing. Michael points to high-quality blog posts from high-quality researchers:

Look at Terry Tao’s wonderful series of posts explaining one of the biggest breakthroughs in recent mathematical history, the proof of the Poincare conjecture. Or Tim Gowers recent experiment in “massively collaborative mathematics”, using open source principles to successfully attack a significant mathematical problem. Or Richard Lipton’s excellent series of posts exploring his ideas for solving a major problem in computer science, namely, finding a fast algorithm for factoring large numbers.

So, does the future of publication rely on self-publishing? Daniel Lemire may be right when he says:

To me, the single most important recent event in academic publishing has been the publication by Perelman of his solution to the Poincaré conjecture on arXiv. This is truly a historical event.

Will this change fundamentally alter the way academia works? I do not think so. It will simply mean that every scholar will be very careful about the quality of the work that is self-published. When everyone can speak, people will only listen to those who generate content of high quality, effectively ignoring those who publish for the sake of publishing.