Saturday, December 5, 2009

Prisoner's Dilemma and Mechanical Turk

I have been reading lately about the differences between mathematical models of behavior and real human behavior. So, I decided to test the classic game-theoretic model of the Prisoner's Dilemma on Mechanical Turk. (See also Brendan's nice explanations and diagrams if you have never been exposed to game theory before.)

From Wikipedia:

In its classical form, the prisoner's dilemma ("PD") is presented as follows:

Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and, having separated both prisoners, visit each of them to offer the same deal. If one testifies (defects from the other) for the prosecution against the other and the other remains silent (cooperates with the other), the betrayer goes free and the silent accomplice receives the full 10-year sentence. If both remain silent, both prisoners are sentenced to only six months in jail for a minor charge. If each betrays the other, each receives a five-year sentence. Each prisoner must choose to betray the other or to remain silent. Each one is assured that the other would not know about the betrayal before the end of the investigation. How should the prisoners act?

If we assume that each player cares only about minimizing his or her own time in jail, then the prisoner's dilemma forms a non-zero-sum game in which two players may each cooperate with or defect from (betray) the other player. In this game, as in all game theory, the only concern of each individual player (prisoner) is maximizing his or her own payoff, without any concern for the other player's payoff. The unique equilibrium for this game is a Pareto-suboptimal solution, that is, rational choice leads the two players to both play defect, even though each player's individual reward would be greater if they both played cooperatively.
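To make the dominance argument concrete, here is a minimal Python sketch. It is an illustration rather than part of the experiment. It encodes the sentences from the Wikipedia description (lower is better) and checks which action gives strictly less jail time no matter what the other prisoner does. The names `YEARS` and `strictly_dominant_action` are hypothetical.

```python
# Sentences in years from the Wikipedia version of the dilemma (lower is better).
# YEARS[(my_action, other_action)] = my sentence
YEARS = {
    ("silent", "silent"): 0.5,
    ("silent", "betray"): 10.0,
    ("betray", "silent"): 0.0,
    ("betray", "betray"): 5.0,
}
ACTIONS = ("silent", "betray")


def strictly_dominant_action(years, actions):
    """Return the action that gives strictly less jail time than every
    alternative, whatever the other prisoner does; None if there is none."""
    for a in actions:
        if all(
            years[(a, other)] < years[(b, other)]
            for b in actions if b != a
            for other in actions
        ):
            return a
    return None


print("Dominant action:", strictly_dominant_action(YEARS, ACTIONS))
# Dominant action: betray

# Combined jail time: mutual betrayal vs. mutual silence.
print(YEARS[("betray", "betray")] * 2, YEARS[("silent", "silent")] * 2)
# 10.0 1.0
```

Betraying is better against either choice of the other prisoner, yet mutual betrayal costs the pair 10 years in total versus 1 year for mutual silence. That gap is what "Pareto-suboptimal" means here.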

My first attempt was to post this dilemma on Mechanical Turk, framed as the following game:

You are playing a game together with a stranger. Each of you have two choices to play: "trust" or "cheat".
  • If both of you play "trust", you win $30,000 each.
  • If both of you play "cheat", you get $10,000 each.
  • If one player plays "trust" and the other plays "cheat", then the player that played "cheat" gets $50,000 and the player that played "trust" gets $0.
You cannot communicate during the game, and CANNOT see the final action of the other player. Both actions will be revealed simultaneously.

What would you play? "Cheat" or "Trust"?

Basic game theory predicts that participants will choose "cheat", resulting in a suboptimal equilibrium. However, participants on Mechanical Turk did not behave like that: 48 out of the 100 participants decided to play "trust", which is above the 33% observed in the lab experiments of Shafir and Tversky (1992).
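The "basic game theory" prediction can be checked mechanically. The Python sketch below is an illustration only: it encodes the dollar payoffs of the first game and lists every pure-strategy Nash equilibrium, meaning every pair of actions where neither player gains by switching alone. The names `PAYOFF` and `pure_nash_equilibria` are hypothetical.

```python
from itertools import product

# PAYOFF[(row_action, col_action)] = (row_payoff, col_payoff), in dollars,
# for the first (imaginary) game posted on Mechanical Turk.
ACTIONS = ("trust", "cheat")
PAYOFF = {
    ("trust", "trust"): (30_000, 30_000),
    ("trust", "cheat"): (0, 50_000),
    ("cheat", "trust"): (50_000, 0),
    ("cheat", "cheat"): (10_000, 10_000),
}


def pure_nash_equilibria(payoff, actions):
    """List action profiles where neither player gains by deviating alone."""
    equilibria = []
    for r, c in product(actions, repeat=2):
        # Row player cannot do better by changing only their own action.
        row_ok = all(payoff[(r, c)][0] >= payoff[(r2, c)][0] for r2 in actions)
        # Column player cannot do better by changing only their own action.
        col_ok = all(payoff[(r, c)][1] >= payoff[(r, c2)][1] for c2 in actions)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


print(pure_nash_equilibria(PAYOFF, ACTIONS))  # [('cheat', 'cheat')]
```

Mutual cheating is the only equilibrium, even though mutual trust pays each player $30,000 instead of $10,000.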

Next, I wanted to make the experiment more realistic. Would anything change if, instead of playing an imaginary game, the participants could earn actual money? So, I modified the game and asked the participants to play against each other. Here is the revised task description:

You are playing a game against another Turker. Your action here will be matched with an action of another Mechanical Turk worker.

Each of you have two choices to play: "trust" or "cheat".
  • If both of you play "trust", you both get a bonus of $0.30.
  • If both of you play "cheat", you both get a bonus of $0.10.
  • If one Turker plays "trust" and the other plays "cheat", then the Turker that played "cheat" gets a bonus of $0.50 and the Turker that played "trust" gets nothing.
What is your action? "Cheat" or "Trust"?

I asked 120 participants to play the game, paying just 1 cent for participation. Interestingly enough, the results were a perfect split: 60 Turkers decided to trust, and 60 Turkers decided to cheat. The final result was 20 trust-trust pairs, 20 cheat-cheat pairs, and 20 cheat-trust pairs.
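A 60/60 split looks like coin-flipping, but the pair counts tell a slightly different story. The Python sketch below compares the observed pairs with what one would expect if each worker independently played "trust" with the observed probability (an assumption; how the pairs were actually matched is not specified). It reports the Pearson chi-square statistic only, not a p-value. The function names are hypothetical.

```python
def expected_pair_counts(n_pairs, p_trust):
    """Expected trust-trust, cheat-cheat and mixed pairs if every worker
    independently plays "trust" with probability p_trust (an assumption)."""
    tt = n_pairs * p_trust ** 2
    cc = n_pairs * (1 - p_trust) ** 2
    mixed = n_pairs * 2 * p_trust * (1 - p_trust)
    return tt, cc, mixed


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = (20, 20, 20)  # trust-trust, cheat-cheat, cheat-trust pairs
n_pairs = sum(observed)  # 60 pairs from 120 workers
p_trust = (2 * observed[0] + observed[2]) / (2 * n_pairs)  # 60 / 120 = 0.5

expected = expected_pair_counts(n_pairs, p_trust)
print(expected)  # (15.0, 15.0, 30.0)
print(round(chi_square(observed, expected), 2))  # 6.67
```

Independent 50/50 choices would produce about 30 mixed pairs; only 20 were observed, with correspondingly more matched pairs.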

In other words, the theoretical prediction that people will be locked in a suboptimal equilibrium did not hold, either in the "imaginary" game or in the game where the workers could earn actual money.

Finally, I decided to change the payoff matrix to replicate the structure of the TV game show "Friend or Foe". There, participants get $50K each if both cooperate and $0 each if both defect; if one chooses trust and the other cheat, the "cheat" gets $100K and the "trust" gets $0.

You are playing a game together with a stranger.

Each of you have two choices to play: "trust" or "cheat".
  • If both of you play "trust", you both win $50,000.
  • If both of you play "cheat", you both get $0.
  • If one player plays "trust" and the other plays "cheat", then the player that played "cheat" gets $100,000 and the player that played "trust" gets $0.
You cannot communicate during the game, and CANNOT see the final action of the other player. Both actions will be revealed simultaneously.

What would you play? "Cheat" or "Trust"?

Interestingly enough, in this setting ALL 100 players ended up playing "trust". This was quite different from the previous game, and also from the behavior of the players on the TV show, where in almost 25% of the games both players chose "cheat" and ended up with $0, and in 25% of the games both players cooperated by playing "trust" and got $50K each.
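The "Friend or Foe" payoffs change the logic of the game in one important way: against a cheater, both actions pay $0. The Python sketch below (an illustration; `MY_PAYOFF` and `dominance` are hypothetical names) compares the two actions across the opponent's choices. It shows that "cheat" is now only weakly dominant. As a consequence, mutual cheating is no longer the only pure-strategy equilibrium: the mixed outcomes (trust, cheat) and (cheat, trust) are equilibria too.

```python
# Payoffs to one player, in dollars, for the "Friend or Foe" version:
# MY_PAYOFF[(my_action, other_action)]
MY_PAYOFF = {
    ("trust", "trust"): 50_000,
    ("trust", "cheat"): 0,
    ("cheat", "trust"): 100_000,
    ("cheat", "cheat"): 0,
}
ACTIONS = ("trust", "cheat")


def dominance(payoff, a, b, actions):
    """Compare action a with action b across all opponent actions."""
    diffs = [payoff[(a, o)] - payoff[(b, o)] for o in actions]
    if all(d > 0 for d in diffs):
        return "strictly dominates"
    if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
        return "weakly dominates"
    return "does not dominate"


print("cheat", dominance(MY_PAYOFF, "cheat", "trust", ACTIONS), "trust")
# cheat weakly dominates trust
```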

So, in my final attempt, I asked Turkers to play this "Friend or Foe" game with real monetary incentives. Here is the task that I posted on Mechanical Turk:

You are playing a game against another Turker. Your action here will be matched with an action of another Mechanical Turk worker.

Each of you have two choices to play: "trust" or "cheat".
  • If both of you play "trust", you both get a bonus of $0.50.
  • If both of you play "cheat", you both get $0.
  • If one Turker plays "trust" and the other plays "cheat", then the Turker that played "cheat" gets a bonus of $1.00 and the Turker that played "trust" gets nothing.
What is your action? "Cheat" or "Trust"?

In this game, 33% of the participants decided to cheat, resulting in 6 out of 50 games where both players got nothing, 23 out of 50 games where both players got a 50-cent bonus, and 21 out of 50 games where one player got $1 and the other got nothing.
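The reported percentage can be recovered from the pair counts: each cheat-cheat game contributes two cheaters and each mixed game one. The Python sketch below (the helper `summarize` is hypothetical) does that bookkeeping and also totals the bonuses paid under the stated payoffs.

```python
def summarize(tt, cc, mixed, both_trust, both_cheat, cheat_wins):
    """From pair tallies, return (cheat share, total bonus paid in dollars).

    both_trust / both_cheat: bonus each player gets in those outcomes;
    cheat_wins: bonus for the cheater in a mixed pair (the truster gets $0).
    """
    players = 2 * (tt + cc + mixed)
    cheaters = 2 * cc + mixed
    total = 2 * tt * both_trust + 2 * cc * both_cheat + mixed * cheat_wins
    return cheaters / players, total


# Final "Friend or Foe" game with real bonuses: 23 TT, 6 CC, 21 mixed pairs.
share, total = summarize(23, 6, 21, both_trust=0.50, both_cheat=0.0, cheat_wins=1.00)
print(f"{share:.0%} cheated, ${total:.2f} paid in bonuses")
# 33% cheated, $44.00 paid in bonuses
```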

I found the difference in behavior between the imaginary game and the real one quite interesting. The deviation from the predictions of the game-theoretic model is also striking.

Although I am not the first to observe this, the deviation got me wondering: Why do we use elaborate game-theoretic models of user behavior when even the simplest such models do not correspond to reality? How can anyone take the concept of an equilibrium seriously when a game introduced in the introductory chapter of every game theory textbook simply does not correspond to reality? Do we really understand the limitations of our tools, or do mathematical and analytic elegance end up being more important than reality?

12 comments:

  1. This comment has been removed by the author.

  2. From what you describe, it just looks like this particular theory's applicability depends on the amount of money involved. It does not mean the theory is invalid; it just means it has to be refined a bit by indexing the decision probabilities by the stakes.

  3. I'd like to see you play this game with a punitive system rather than a reward system. I suspect people minimize downside differently than they optimize upside.

  4. Correct, psychology experiments indeed point in this direction. Potential loss is treated differently than potential gain.

    I will post the "imaginary game" scenario with losses instead of potential gains. I do not have a good way to test this with "real" monetary loss on Mechanical Turk, though.

  5. It would be very interesting to explore the elasticity of the results under different rewards (would they be consistent?).
    Moreover, following Transaction Utility Theory and framing effects, you could explore the sensitivity to the construct (MTurk, lab experiments, TV game show).

  6. "What would you do in situation X?" questions are notoriously unreliable when matched against what people actually do.

    As notbrainsurgery pointed out, size matters. All you need to account for it theoretically is a utility function that is non-linear in the number of monetary units. (Maximizing expected gains is a much dicier modeling decision, as the issue of risk aversion raised by 6p... shows.)

    Further, when monetary rewards are negligible (as one may argue they are in non-repeated Mechanical Turk runs), people have little incentive to optimize, even if they could work out the math. This is why low-stakes poker is so boring: everyone just stays in on every hand.

    Much of game theory is concerned with normative issues, in the sense of what you should do to optimize a certain criterion, not with descriptive issues of what people actually do. It has been well known at least since Herb Simon's work in the 1950s that people don't have time to optimize in reality. And now behavioral economics is popular again, so even researchers care again about what actual people do.

    I don't believe there are many classical predictions for what people do when faced with the prisoner's dilemma for precisely the reason you outline above, namely that the equilibrium is not Pareto optimal. That's why it's interesting theoretically.

    Repeated and population versions of the game are also interesting, and could be run to some extent on mech Turk. Matching up players would require some coding.

    How about a loss for Mechanical Turk workers, where you ding their task completion score? With that disincentive, I doubt many folks will "cheat" unless you really up the monetary reward.

  7. This is a really interesting and straightforward way to test these theories, and thank you for sharing the results.

    Do you have any plans of continuing this with other (similar) types of games?

  8. @Jed: Yes, I will post more once the semester ends and I can work on some things that were not due yesterday :-)

  9. Hi Panos
    I just wanted to let you know that I have been lurking and have enjoyed your blog. My background is in Social Work and I don't understand much of what you do, but I am really impressed with your citation tracker, and I found your academic tree, which is really clever. I would love to hear more about how you did this research when you get some time.

    Hello from Australia!

    Helen

  10. Pano

    I am sorry to say, all these facts are pretty well known. The discussion of why we need game theory is very old too.

    Very briefly, it is useful as a framework to discuss games, it is useful normatively, and it is useful when players are

    a) rational
    b) profit maximising

    (say big companies)

    It is also useful when the conditions are such that learning or evolutionary processes lead to equilibrium, etc.

    There is a whole field working on these things: bounded rationality and experimental economics. You might want to have a look around. Especially interesting are two models, level-k and quantal response equilibrium, which I discuss, for example, in most of my papers that use experiments and work on bounded rationality (especially the first one and the fifth).

  11. Thanks Sotiri for the pointers to the quantal response equilibrium. Seems pretty interesting.

    I did not have any illusion that what I say here is new. I just replicated some existing work using Mechanical Turk as my testbed. I found the behavior of individual users pretty interesting nevertheless.

  12. That's very interesting! The 2nd experiment's 60-60 breakdown suggests that people were just flipping coins; however, there are too few cheat-trust pairs and too many cheat-cheat and trust-trust pairs for that. (If people were flipping coins, the counts should be around 15, 15, and 30.) Did people have information about each other?

    Akhmed.
