Monday, November 29, 2010

Wisdom of the Crowds: When do we need Independence?

I have been thinking lately about the conditions and assumptions for the wisdom of crowds to work. Surowiecki, in his popular book, gave the following four conditions for the crowd to arrive at the correct decision.
  • Diversity of opinion: Each person should have private information even if it's just an eccentric interpretation of the known facts.
  • Independence: People's opinions aren't determined by the opinions of those around them.
  • Decentralization: People are able to specialize and draw on local knowledge.
  • Aggregation: Some mechanism exists for turning private judgments into a collective decision.
The part that puzzles me most is the independence assumption. Actually, I can support pretty much any thesis. I can argue that independence is necessary. I can argue that we do not really need independence so much. And I can argue that independence is evil. And I will do all these things below.

Independence is necessary

It is not difficult to understand why, in some cases, independence is necessary. If the contributions from the crowd are not independent, then we may easily observe herding behavior. Daniel Tunkelang discusses a nice, instructive example (from the book Networks, Crowds, and Markets, by David Easley and Jon Kleinberg), in which the influence of the crowd can often lead to incorrect decisions, while independence can easily avoid erroneous outcomes.

The paper "Limits for the Precision and Value of Information from Dependent Sources" by Clemen and Winkler shows that in the presence of positive correlation, when we aggregate information from multiple dependent sources, the resulting accuracy does not increase as we would expect.

The figure below shows in the x-axis the number of dependent sources, and in the y-axis the equivalent number of independent sources, for various correlation coefficients ρ.

Even at moderate levels of ρ, we see how strong the limitations are. With ρ=0.4 it is almost impossible to go above the equivalent of two independent sources, no matter how many dependent sources we add. And if we have noisy input, we often need a large number of independent sources to separate signal from noise.

In other words, it is better to have a couple of independent opinions, rather than having thousands of correlated voices.
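The saturation described above can be reproduced with a few lines of code. This is a minimal sketch, not the paper's full model: it assumes every source has the same variance and every pair of sources has the same correlation ρ. Under that assumption the variance of the average of n sources is σ²(1 + (n − 1)ρ)/n, so the equivalent number of independent sources is n / (1 + (n − 1)ρ), which can never exceed 1/ρ. The function name is my own.

```python
def equivalent_independent_sources(n: int, rho: float) -> float:
    """Number of independent sources that are as informative as
    n sources with common pairwise correlation rho.

    Assumption: equal variances and a single shared correlation,
    so that Var(mean) = sigma^2 * (1 + (n - 1) * rho) / n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    return n / (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    for rho in (0.0, 0.1, 0.4, 0.8):
        row = [round(equivalent_independent_sources(n, rho), 2)
               for n in (1, 2, 10, 1000)]
        print(f"rho={rho}: {row}")
```

With ρ=0.4, even a thousand sources stay below the ceiling of 1/ρ = 2.5 equivalent independent sources, while with ρ=0 every additional source counts in full.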

Lack of independence: Perhaps not so bad

There are also examples where lack of independence is not so bad.

For example, according to the paper "Measuring the Crowd Within" by Vul and Pashler, even asking the same person a second time and averaging the two answers can lead to improved outcomes.

Or take the other poster-child application of wisdom of crowds: prediction markets (or markets, in general). In these markets, people trade based on their personal information. However, they can always see (and get influenced by?) the aggregated opinion of the crowd, as this is reflected in the market prices. And empirical evidence illustrates that (prediction) markets work surprisingly well, despite (or because of) the lack of independence. Prior work has demonstrated that even non-public information spreads quickly through the market (and the SEC checks for insider trading if they detect unusual activity before the public release of sensitive information).

Wikipedia is another example: People do see what everyone else has done so far, before adding the extra information.

One paper that I found to be of interest is "Naïve Learning in Social Networks and the Wisdom of Crowds" by Golub and Jackson. The authors address the following question: "for which social network structures will a society of agents who communicate and update naïvely come to aggregate decentralized information completely and correctly?". The results are based on the ideas of convergence for Markov chains. One of the basic results says that the PageRank-like score of a node in the network defines the weight of the node's influence in the final outcome.
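To make the naïve updating idea concrete, here is a small sketch of DeGroot-style averaging, the kind of update rule studied in that paper. Each agent repeatedly replaces its belief with a weighted average of the beliefs of the agents it listens to. When the process converges, the consensus is a weighted average of the initial beliefs, and the weights are the stationary distribution of the trust matrix (its left eigenvector for eigenvalue 1), which is the PageRank-like score mentioned above. The three-agent trust matrix and the initial beliefs are made up for illustration.

```python
def degroot_step(trust, beliefs):
    """One round of naive updating: each agent averages the beliefs
    of the others, weighted by its own row of the trust matrix."""
    return [sum(w * b for w, b in zip(row, beliefs)) for row in trust]


def influence_weights(trust, iterations=1000):
    """Stationary distribution of the row-stochastic trust matrix,
    computed by power iteration on the left (pi = pi * T)."""
    n = len(trust)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        weights = [sum(weights[i] * trust[i][j] for i in range(n))
                   for j in range(n)]
    return weights


# Hypothetical network: agent 0 is listened to by everyone,
# agents 1 and 2 listen only to themselves and to agent 0.
trust = [
    [0.6, 0.2, 0.2],
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
]
initial_beliefs = [1.0, 0.0, 0.0]

beliefs = list(initial_beliefs)
for _ in range(200):
    beliefs = degroot_step(trust, beliefs)

weights = influence_weights(trust)
consensus = sum(w * b for w, b in zip(weights, initial_beliefs))

if __name__ == "__main__":
    print("influence weights:", [round(w, 4) for w in weights])
    print("beliefs after updating:", [round(b, 4) for b in beliefs])
    print("weighted average of initial beliefs:", round(consensus, 4))
```

In this toy network agent 0 ends up with influence 5/9 and the other two with 2/9 each, so the crowd converges to a belief dominated by agent 0 rather than to the simple average of the three initial opinions.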

In all these cases, the participants get information from the crowd; they do not just follow it blindly. So, there is some benefit in interacting.

Independence is bad

Going even further, we have cases where complete independence of participants is bad!

This typically happens when participants know only parts of the overall information. Through communication, it is possible to identify the complete picture, but lack of communication leads to suboptimal outcomes. Consider the example in Proposition 2 from the paper "We can't disagree forever" by Geanakoplos and Polemarchakis:
  • We have a 4-sided die, with mutually exclusive outcomes A, B, C, and D, each one occurring with probability 0.25.
  • In reality, the die rolled A. But nobody knows that. Instead the knowledge of the players is:
    • Player 1 knows that the event "A or B" happened
    • Player 2 knows that the event "A or C" happened
  • Both players can bet on whether "A or D" happened.
So, look at what happens:
  • No independence: If player 1 can communicate directly with player 2, they can figure out that event A happened, and so they assign probability 1.0 to the event "A or D".
  • Independence: If player 1 cannot communicate with player 2, then both players assign a probability of 0.5 to the event "A or D". This is despite the fact that they collectively own enough information to figure out that A happened, and there is a market to trade the event. Since both players already agree on 0.5, the price reveals nothing new to either of them. In other words, the market fails to aggregate the available information.
So, we have a scenario where the inability to spread information actually results in a bad outcome. However, if we allowed the participants to be non-independent, we could have an improved outcome.
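The arithmetic of the example is easy to check. The sketch below assumes a uniform prior over the four outcomes and treats each player's knowledge as a set of outcomes that are still possible; the helper name `posterior` is my own.

```python
from fractions import Fraction


def posterior(event, known):
    """P(event | known) under a uniform prior over the outcomes.
    Both arguments are sets of outcomes; `known` must be non-empty."""
    return Fraction(len(event & known), len(known))


player1 = {"A", "B"}   # player 1 knows "A or B" happened
player2 = {"A", "C"}   # player 2 knows "A or C" happened
bet = {"A", "D"}       # the event the players can bet on

p1 = posterior(bet, player1)                # player 1 alone
p2 = posterior(bet, player2)                # player 2 alone
pooled = posterior(bet, player1 & player2)  # after sharing information

if __name__ == "__main__":
    print("player 1 alone:", p1)
    print("player 2 alone:", p2)
    print("after communicating:", pooled)
```

Each player alone arrives at 1/2, so seeing the other's bet teaches them nothing, while pooling their knowledge leaves only outcome A and raises the probability to 1.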

Influence vs Information Spread

So, we can see actual examples where spread of information (and hence, lack of independence) can be both good and bad. Lack of independence can lead to groupthink, where the individual voices get drowned in a sea of correlated opinions. At the other extreme, lack of communication leads to suboptimal outcomes.

The paper by Plott and Sunder "Rational Expectations and the Aggregation of Diverse Information in Laboratory Security Markets" discusses the issue in the context of security markets and examines how market design affects the information aggregation properties of markets. (Thanks to David Pennock for the pointer.)

The paper by Ostrovsky "Information Aggregation in Dynamic Markets with Strategic Traders" (in EC'09, I think also forthcoming in Econometrica) provides a rigorous theoretical framework for the conditions under which information gets aggregated in a market: essentially we have "separable" securities for which all the available information can be aggregated, and non-separable ones that do not have this property. However, I do not have the necessary background to fully understand and present the ideas in the paper. And I cannot see how to connect this with the literature on information spreading in social networks.

In a more intuitive sense, it seems that we need information to spread and not just influence.

Unfortunately, I cannot grasp the full picture, despite the fact that I tried to look at the problem from different angles (Ironic, eh?).

I still do not fully understand the implications of the above for the design of processes that involve human input. Does it make sense to show people what other people have contributed so far? Will we see effects of anchoring? Or will we see the establishment of a common ground and get people to coordinate better and understand each other's input?

How can we quantify and put in a common framework all the above?