Monday, November 29, 2010

Wisdom of the Crowds: When do we need Independence?

I have been thinking lately about the conditions and assumptions required for the wisdom of crowds to work. Surowiecki, in his popular book, gave the following four conditions for a crowd to arrive at the correct decision.
  • Diversity of opinion: Each person should have private information even if it's just an eccentric interpretation of the known facts.
  • Independence: People's opinions aren't determined by the opinions of those around them.
  • Decentralization: People are able to specialize and draw on local knowledge.
  • Aggregation: Some mechanism exists for turning private judgments into a collective decision.
The part that puzzled me most is the independence assumption. Actually, I can support pretty much any thesis. I can argue that independence is necessary. I can argue that we do not really need independence so much. And I can argue that independence is evil. And I will do all these things below.

Independence is necessary

It is not difficult to understand why, in some cases, independence is necessary. If the contributions from the crowd are not independent, then we may easily observe herding behavior. Daniel Tunkelang discusses a nice, instructive example (from the book Networks, Crowds, and Markets, by David Easley and Jon Kleinberg), in which the influence of the crowd can often lead to incorrect decisions, while independence can easily avoid erroneous outcomes.

The paper "Limits for the Precision and Value of Information from Dependent Sources" by Clemen and Winkler shows that, in the presence of positive correlation, aggregating information from multiple dependent sources does not increase accuracy as much as we would expect.

The figure below shows on the x-axis the number of dependent sources, and on the y-axis the equivalent number of independent sources, for various correlation coefficients ρ.

Even at moderate levels of ρ, we see how strong the limitations are. With ρ=0.4 it is almost impossible to go above the equivalent of two independent sources. And if we have noisy input, we often need a large number of independent sources to separate signal from noise.

In other words, it is better to have a couple of independent opinions than thousands of correlated voices.
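
The curves in the figure follow the standard result for averaging n sources with equal pairwise correlation ρ: the precision of their average matches that of n / (1 + (n-1)ρ) independent sources, which is capped at 1/ρ no matter how large n grows. A minimal sketch (my own illustration of the formula, not code from the paper):

```python
def equivalent_independent_sources(n, rho):
    """Effective number of independent sources that matches the
    precision of n sources with equal pairwise correlation rho."""
    return n / (1 + (n - 1) * rho)

# With rho = 0.4, even 1000 correlated sources are worth fewer
# than 1/0.4 = 2.5 independent ones.
print(equivalent_independent_sources(1000, 0.4))  # ~2.496
print(equivalent_independent_sources(1000, 0.0))  # 1000.0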

Lack of independence: Perhaps not so bad

There are, however, examples showing that lack of independence is not always bad.

For example, according to the paper "Measuring the Crowd Within" by Vul and Pashler, even asking the same person a second time and averaging the two answers can lead to improved outcomes.
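
The intuition can be checked with a toy simulation (my own, with made-up parameters, not the Vul–Pashler setup): two guesses from the same person share a persistent bias, so their errors are positively correlated, yet averaging them still reduces the expected squared error, just not as much as an independent second opinion would.

```python
import random

random.seed(0)
truth = 100.0
n = 200000

err_single = err_avg = 0.0
for _ in range(n):
    bias = random.gauss(0, 5)            # persistent personal bias,
                                         # shared by both guesses
    g1 = truth + bias + random.gauss(0, 5)
    g2 = truth + bias + random.gauss(0, 5)
    err_single += (g1 - truth) ** 2
    err_avg += ((g1 + g2) / 2 - truth) ** 2

mse_single = err_single / n   # expected ~50 (25 bias + 25 noise)
mse_avg = err_avg / n         # expected ~37.5 (25 bias + 12.5 noise)
print(round(mse_single, 1), round(mse_avg, 1))
```

The shared bias survives the averaging; only the per-guess noise is halved.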

Or take the other poster-child application of the wisdom of crowds: prediction markets (or markets, in general). In these markets, people trade based on their personal information. However, they can always see (and be influenced by?) the aggregated opinion of the crowd, as this is reflected in the market prices. And empirical evidence shows that (prediction) markets work surprisingly well, despite (or because of) the lack of independence. Prior work has demonstrated that even non-public information spreads quickly through the market (and the SEC checks for insider trading if they detect unusual activity before the public release of sensitive information).

Wikipedia is another example: people do see what everyone else has done so far, before adding their own information.

One paper that I found particularly interesting is "Naïve Learning in Social Networks and the Wisdom of Crowds" by Golub and Jackson. The authors address the following question: "for which social network structures will a society of agents who communicate and update naïvely come to aggregate decentralized information completely and correctly?". The results are based on the ideas of convergence for Markov chains. One of the basic results says that the PageRank-style score of a node in the network determines the weight of the node's influence on the final outcome.
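
The Golub–Jackson setting can be sketched with the DeGroot model: each agent repeatedly replaces their belief with a weighted average of their neighbors' beliefs, and society converges to a consensus that weights each agent's initial information by the stationary distribution of the trust matrix (the eigenvector-centrality, PageRank-like score mentioned above). A toy illustration with a made-up 3-agent trust matrix:

```python
# Row-stochastic trust matrix: T[i][j] = weight agent i puts on agent j.
T = [
    [0.6, 0.2, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.1, 0.8],
]
beliefs = [1.0, 0.0, 0.0]  # only agent 0 starts with the signal

# Naive updating: b <- T b, repeated until consensus.
for _ in range(200):
    beliefs = [sum(T[i][j] * beliefs[j] for j in range(3))
               for i in range(3)]

# All agents agree on 3/11 = agent 0's weight in the stationary
# distribution of T (the left eigenvector with w T = w).
print([round(b, 4) for b in beliefs])  # → [0.2727, 0.2727, 0.2727]
```

Note how the well-trusted agent 2 dilutes agent 0's signal: the consensus is 3/11, not 1/3.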

In all these cases, the participants get information from the crowd; they do not just follow it blindly. So, there is some benefit in interacting.

Independence is bad

Going even further, we have cases where complete independence of participants is bad!

This typically happens when participants know only parts of the overall information. Through communication, it is possible to identify the complete picture, but lack of communication leads to suboptimal outcomes. Consider the example in Proposition 2 from the paper "We can't disagree forever" by Geanakoplos and Polemarchakis:
  • We have a four-sided die, with mutually exclusive outcomes A, B, C, and D, each occurring with probability 0.25.
  • In reality, the die came up A. But nobody knows that. Instead, the knowledge of the players is:
    • Player 1 knows that the event "A or B" happened
    • Player 2 knows that the event "A or C" happened
  • Both players can bet on whether "A or D" happened.
So, look at what happens:
  • No independence: If player 1 can communicate directly with player 2, they can figure out that event A happened, and they are certain that "A or D" occurred with probability 1.0
  • Independence: If player 1 cannot communicate, then both players assign a probability of 0.5 to the event "A or D". This is despite the fact that they collectively own enough information to figure out that A happened, and there is a market to trade the event. In other words, the market fails to aggregate the available information.
So, we have a scenario where the inability to spread information actually results in a bad outcome. However, if we allowed the participants to be non-independent, we could have an improved outcome.
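
The probabilities in the example can be verified by simple enumeration (my own sketch, not code from the paper): each player conditions "A or D" on their private information, and pooling the two pieces of information pins down A.

```python
from fractions import Fraction

OUTCOMES = "ABCD"  # uniform prior, probability 1/4 each

def prob(event, given):
    """P(event | given) under the uniform prior, by enumeration."""
    possible = [o for o in OUTCOMES if o in given]
    favorable = [o for o in possible if o in event]
    return Fraction(len(favorable), len(possible))

# Without communication, each player uses only private information:
print(prob("AD", given="AB"))  # Player 1: 1/2
print(prob("AD", given="AC"))  # Player 2: 1/2

# Pooling the information identifies A, so "A or D" becomes certain:
print(prob("AD", given="A"))   # 1
```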

Influence vs Information Spread

So, we can see actual examples where the spread of information (and hence, lack of independence) can be both good and bad. Lack of independence can lead to groupthink, and the individual voices get drowned in a sea of correlated opinions. At the other extreme, lack of communication leads to suboptimal outcomes.

The paper by Plott and Sunder "Rational Expectations and the Aggregation of Diverse Information in Laboratory Security Markets" discusses the issue in the context of security markets and examines how market design affects the information aggregation properties of markets. (Thanks to David Pennock for the pointer.)

The paper by Ostrovsky "Information Aggregation in Dynamic Markets with Strategic Traders" (in EC'09, I think also forthcoming in Econometrica) provides a rigorous theoretical framework for the conditions under which information gets aggregated in a market: essentially, there are "separable" securities, for which all the available information can be aggregated, and non-separable ones, which do not have this property. However, I do not have the necessary background to fully understand and present the ideas in the paper. And I cannot see how to connect this with the literature on information spreading in social networks.

In a more intuitive sense, it seems that we need information to spread and not just influence.

Unfortunately, I cannot grasp the full picture, despite the fact that I tried to look at the problem from different angles (ironic, eh?).

I still do not fully understand the implications of the above for the design of processes that involve human input. Does it make sense to show people what others have contributed so far? Will we see anchoring effects? Or will we see the establishment of common ground, getting people to coordinate better and understand each other's input?

How can we quantify and put in a common framework all the above?

Wednesday, November 24, 2010

Mechanical Turk, "Interesting Tasks," and Cognitive Dissonance

It is a well-known fact that the wages on Mechanical Turk are horribly low. We can have endless discussions about this, and my own belief is that it is due to the lack of a strong worker reputation system. Others believe that this is due to the global competition for unskilled labor. And others are agnostic, saying that everything is a matter of supply and demand.

Other people try to explain the low wages by looking at the motivation of the workers: Quite a few people find the tasks on Mechanical Turk to be interesting. Ergo, they are willing to work for less.

Perfectly normal right? The task is interesting, people are willing to do it for less money. Sounds reasonable. Right? RIGHT? Well, be careful: correlation does not imply causation!

Enter the realm of social psychology (thanks Konstantinos!): The theory of cognitive dissonance indicates that the causation may go in the entirely opposite direction: The wages are low, so people justify their participation by saying the work is interesting!

This surprising result is due to the paper "Cognitive Consequences of Forced Compliance" from Festinger and Carlsmith (1959). It is one of the classic papers in psychology.

What did Festinger and Carlsmith say?

That people who receive low payment for boring tasks will convince themselves that they do them because the tasks are interesting. Otherwise, the conflict in their minds would be just too big: why would they work on such a boring task when the payment is horrible?

In contrast, someone who gets paid well to do the same boring task will still consider it boring. These well-paid participants can easily justify that they do the work for the money (so it makes sense to do a boring job).

Amazingly enough, Festinger and Carlsmith verified this experimentally. Here is the experimental setup description from the Wikipedia entry that describes this intriguing experiment:

Students were asked to spend an hour on boring and tedious tasks (e.g., turning pegs a quarter turn, over and over again). The tasks were designed to generate a strong, negative attitude.

Once the subjects had done this, the experimenters asked some of them to do a simple favor. They were asked to talk to another subject (actually an actor) and persuade them that the tasks were interesting and engaging.

Some participants were paid $20 (inflation adjusted to 2010, this equates to $150) for this favor, another group was paid $1 (or $7.50 in "2010 dollars"), and a control group was not asked to perform the favor.

When asked to rate the boring tasks at the conclusion of the study (not in the presence of the other "subject"), those in the $1 group rated them more positively than those in the $20 and control groups.

The researchers theorized that people experienced dissonance between the conflicting cognitions, "I told someone that the task was interesting", and "I actually found it boring." When paid only $1, students were forced to internalize the attitude they were induced to express, because they had no other justification. Those in the $20 condition, however, had an obvious external justification for their behavior (i.e., high payment), and thus experienced less dissonance.

So, when you read surveys (mine included) that indicate that Mechanical Turk workers participate on the platform because they "find the tasks interesting", (and so it makes sense to pay low wages), please have this alternative explanation in mind:

Turkers convince themselves that the work is interesting; otherwise they would be completely crazy to sit there doing mind-numbingly boring work just to earn a wage of a couple of bucks per hour.

Tuesday, November 23, 2010

NYC, I Love You(r Data)

Last year, I experimented with the NYC Data Mine repository as a source of data for our introductory course on information systems (for business students, mainly non-majors). The results of the assignment were great, so I repeated it this year.

The goal of the assignment was to teach them how to grab large datasets and run database queries against them. As part of the assignment, the students had to go to the NYC Data Mine repository, pick two datasets of their interest, join them in Access, and perform some analysis of interest. The ultimate goal was to get them to use some real data, and use it to perform an analysis of their own interest.

Last year, some students took the easy way out and joined the datasets manually(!) on the borough values (Manhattan, Bronx, Brooklyn, Queens, Staten Island). This year, I explicitly forbade them from doing so. Instead, I explicitly asked them to join only on attributes with a large number of values.
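
The students used Access, but the idea translates directly to SQL anywhere; here is a sketch using Python's built-in sqlite3, with hypothetical table and column names standing in for two NYC Data Mine datasets:

```python
import sqlite3

# Toy stand-ins for two NYC Data Mine datasets; names and values
# are made up for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE restaurants (zipcode TEXT, inspection_score INT);
    INSERT INTO restaurants VALUES
        ('10001', 12), ('10001', 30), ('10002', 8), ('11201', 20);
    CREATE TABLE incomes (zipcode TEXT, median_income INT);
    INSERT INTO incomes VALUES
        ('10001', 60000), ('10002', 42000), ('11201', 75000);
""")

# Join on zip code (thousands of distinct values), not on borough
# (only five values, which makes the join nearly meaningless).
rows = con.execute("""
    SELECT r.zipcode, AVG(r.inspection_score), i.median_income
    FROM restaurants r JOIN incomes i ON r.zipcode = i.zipcode
    GROUP BY r.zipcode
    ORDER BY r.zipcode
""").fetchall()
print(rows)
```

A borough-level join collapses everything into five rows; a zip-code join preserves enough variation to actually analyze.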

The results are here, and most of them are well worth reading! The analyses below are almost like a tour guide to New York's data sightseeing spots :-) The new generation of Nate Silvers is coming.

Enjoy the projects:
  • Academia and Concern for the Environment! Is there a correlation between how much you recycle and how well students perform in school? Are kids who are more involved in school activities more likely to recycle? Does school really teach us to be environmentally conscious? To find out the answers check out our site!
  • An Analysis of NYC Events: One of the greatest aspects of New York is the fun festivals, street fairs and block parties where you can really take in the culture. Our charts demonstrate which time to visit New York or which boroughs to attend events in. We suggest that tourists and residents check out our research. Organizers of events, or people who make their money from events, should also consult our analysis.
  • How are income and after school programs related?: This study is an analysis of how income levels are related to the number of after school programs in an area. The correlation between income and number of school programs was interesting to analyze across the boroughs because while they did follow a trend, the different environments of the boroughs also had an exogenous effect. This is most evident in Manhattan, which can be seen in the study.
  • Restaurant Cleanliness in Manhattan What are the cleanest and dirtiest restaurants in Manhattan? What are the most common restaurant code violations? We analyzed data on restaurant inspection results and found answers to these questions and more.
  • Ethnic Dissimilarity's Effect on New Business: This analysis focuses on the relationship between new businesses and specific ethnic regions. Do ethnically dominated zip codes deter or promote business owners of differing ethnicities to open up shop?
  • Does The Perception Of Safety In Manhattan Match With Reality? People’s perception of events and their surroundings influence their behavior and outlook, even though facts may present a different story. In this regard, we took a look at the reported perception of people’s safety within Manhattan and compared it to the actual crime rates reported by the NYPD. The purpose of our study was to evaluate the difference between the actual crime rate and perceived safety of citizens and measure any discrepancy.
  • Women's Organizations love food stores!: We have concluded that a large percentage of women's organizations are located near casual dining and takeout restaurants as well as personal and professional service establishments compared to what we originally believed would be shopping establishments.
  • Hispanics love electronics!: Our goal for this project is to analyze the relationship between electronic stores and demographics in a particular zip code. We conducted a ratio analysis instead of a count analysis to lessen the effects of population variability as to create an "apples to apples" comparison. From our analysis, it can be seen that there is a greater presence of electronic stores in zip codes with a higher proportion of Hispanics.
  • Political Contributions and Expenditures: A comprehensive analysis of the political contributions and expenditures during the 2009 elections. The breakdown of who, in what areas of Manhattan contribute as well as how candidates spend their money are particularly interesting!
  • How Dirty is Your Food? Our goal for this project is to analyze the various hygiene conditions of restaurants in New York City. We cross referenced the inspection scores of the restaurants with the cuisine they serve to find out if there was any correlation between these two sets of data. By ranking the average health score of the various cuisines, we can determine which kinds of cuisines were more likely to conform to health standards.
  • Want to Start a Laundromat? An Electronic Store? The best possible places to start a Laundromat and an electronic store. For Laundromats we gave the area that had the lowest per capita income, as we noticed a trend that Laundromats do better in poorer neighborhoods. For electronic stores we found the lowest saturated areas that have the highest per capita income.
  • Where to Let Your Children Loose During the Day in NYC: For this analysis, we wondered whether there was a correlation between how safe people felt in certain areas in New York and the availability of after-school programs in the different community boards.
  • Best Place to Live in Manhattan After Graduation: We analyzed what locations in Manhattan, classified by zip code, would be the best to live for a newly graduate. We used factors like shopping, nightlife, gyms, coffeehouses, and more! Visit the website to get the full analysis.
  • Political Contributions and Structures: Our report analyzes the correlation between political contributions and structures in New York in varying zip codes.
  • Best Places to Eat and Find Parking in New York City: Considering the dread of finding parking in New York City, our analysis is aimed at finding the restaurants with the largest number of parking spaces in their vicinities.
  • Are the Cleanest Restaurants Located in the Wealthiest Neighborhoods? Our analysis between property value and restaurant rating for the top and bottom ten rated restaurants by zip codes in New York City
  • Analysis of Popular Baby Names
  • Restaurant Sanitary Conditions: Our team was particularly interested in the various cuisines offered in various demographic neighborhoods, grouped by zip codes. We were especially curious about the sanitary level of various cuisines offered by restaurants. The questions we wanted to answer were:
    • What zip codes had the highest rated restaurants? What type of cuisines are found in these zip codes?
    • What zip codes had the lowest rated restaurants? What type of cuisines are found in these zip codes?
  • Does having more community facilities improve residents' satisfaction with city agencies? Does having more public and private community facilities in NYC such as schools, parks, libraries, public safety, special needs housing, health facilities, etc lead to greater satisfaction with city services? On intuition, the answer is a resounding YES! With more facilities, we would enjoy our neighborhood better and develop a better opinion of New York City services. But how accurate is this intuition? In this analysis, we put that to the test.
  • Housing Patterns in Manhattan: The objective of our analysis was to identify factors which play a role in determining vacancy rates in Manhattan’s community districts. We inferred that vacancy rates are representative of the population’s desire to live in a particular district. We examined determining factors of why people want to live in a particular district including: quality of education, health care services, crime control in each district, etc.
  • Analysis of Cultural Presence and Building Age by Zip Code: Manhattan is a hub for cultural organizations and opportunities for community involvement. But does the amount of "community presence" differ based on area that you live? Is there any relationship between the year that buildings in various areas were built, and the available programs for students and cultural organizations for the general public in that area? We analyzed whether a relationship existed between the number of cultural organizations and after school programs available in a zip code, and the average year that the buildings in the zip code were built. To further our analysis we looked at whether the age of buildings in areas with greatest "cultural presence" affected the sales price of the buildings.
  • Analysis of Baby Names across the Boroughs: We decided to analyze the Baby Names given in 2008 across the boroughs of Manhattan, the Bronx and Brooklyn. We found the most popular names in each Borough, along with top names specific to each borough that were unpopular in other Boroughs. We also found certain factors that could be a determining factor in the naming of these babies.
  • Analysis of New York City Street Complaints: We analyzed the different kinds of street complaints made in New York City, how the city tends to respond to them, and which streets have the most overall complaints when you also bring building complaints into the picture. This analysis taught us that Broadway has the most street complaints but it also piqued our interest in conducting even further analyses.
  • Campaign Contributions and Community Service Programs The goal of our analysis was to determine if there is a correlation between contributions by NYC residents to election candidates and community service programs. We wanted to see if people who are more financially invested in elections are also more inclined to be involved in their neighborhoods through community programs.
  • Public Libraries in Queens: We looked at how many public libraries there were in each zip code in Queens. We also looked at the number of people and racial composition in each zip code, to see if these factors are related.
  • Sidewalk Cafe Clustering: Our study’s goal is to understand where sidewalk cafes cluster and some potential reasons why they cluster. We start by looking at what areas of the city are most populated with sidewalk cafes. Then we look to see if there are any trends related to gender or race demographics. We finally look to see if there is any influence on property value on the abundance of sidewalk cafes.

The surprise this year: Most students could not understand what a "CSV" data file is. Many of them thought it was just some plain text, and did not try to use it. (Hence the prevalence of electronic-store and laundromat analyses, which were based on datasets available in Excel format.) I guess next year I will need to explain that as well.
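
To be fair to the students, a CSV file really is plain text, which is exactly what makes it so easy to use: one record per line, fields separated by commas. Python's standard library reads it directly (the sample data here is made up, standing in for an NYC Data Mine export):

```python
import csv
import io

# A toy CSV "file": header row, then one record per line.
raw = io.StringIO(
    "zipcode,agency,complaints\n"
    "10001,DOT,42\n"
    "10002,DEP,17\n"
)

rows = list(csv.DictReader(raw))
# Note: every field comes back as a string until you convert it.
print(rows[0]["agency"], rows[0]["complaints"])  # → DOT 42
```

For a real file, replace the io.StringIO with open("somefile.csv", newline="").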

Friday, November 19, 2010

Introductory Research Course: Replicate a Paper

The transition to the happy life of a tenured professor meant that I get to be involved in the wonderful part of the job: getting to sit on school-wide committees.

Fortunately, I was assigned to an extremely interesting committee: We get to examine the school's PhD program, see the best practices, see what works and what does not, and try to reconcile everything into a set of recommendations for the faculty to examine. The double benefit for me is that I get to understand how the other departments in the school operate, something which, for a computer scientist in a business school, was still kind of a mystery to me.

Anyway, as part of this task, I learned about an interesting approach to teaching starting PhD students about research:

A course in which students pick a paper and get to replicate it.

I think this is a great idea. First of all, I am a big fan of learning-by-doing.

For example, to understand how an algorithm works, you need to actually implement it. Not get the code and re-run the experiments: implement everything, going as deeply as possible. In C, in Java, in Perl, in Python, in MATLAB, in Maple, in Stata, it does not matter. For theory, the same thing: replicate the proofs. Do not skip the details. For data analysis, the same. Get your hands dirty.

During such a process, it is great to have someone serve as a sounding board. Ask questions about the basics. Why do we follow this rule of thumb? What is the assumption behind the use of this method? Asking these questions is much easier while replicating someone else's work than when working on your own research and trying to get a paper out.

Myself, I still write code for this very same reason. I need to see how the algorithm behaves. I need to see the small peculiarities in behavior. This observation gets me to understand better not only the algorithm itself but also other techniques employed by the algorithm. I have been trying to understand econometrics a little more deeply over the last few months, and I do the same. Frustrating? Yes. Slow? Yes. Helpful? You bet!

So, at the end of the seminar, if the students can replicate the results of the paper, great: They learned what it takes to create a paper and most probably came to understand a few other topics more deeply along the way.

If the results are different from those in the original paper, then perhaps this is the beginning of a deeper investigation. Why are things different? Tuning? Settings? Bugs? Perhaps they uncovered something not seen by the authors?

Even if the data from the authors are not available, the students should be able to reproduce the work and get similar results, perhaps with different data sets. If the results with different data sets are qualitatively different, then the paper is essentially not reproducible. (And replicability is not reproducibility.)

And in any case, no matter whether the students can replicate the results or not, no matter whether the paper is reproducible or not, the lessons from such an exercise can be valuable.

Often the student who comes to understand a paper this deeply falls in love with the topic, and gets to learn more and more about the area. Following in the footsteps of someone is often the first step toward finding your own path.

I think this seminar will make it to the final set of recommendations to the school. I am wondering how many other schools have such a course.

Update1: Needless to say, this is a class, not something that students try on their own. Therefore, the professor should pick a set of papers that are educational and useful to replicate. This can be an easy "classic" paper, an "important new" result, or even a paper that forces the students to use particular tools and data sources. The students choose from a predefined set, not from the wild.

Update2: Thanks to Jun, a commenter below, we now have a reference to the originator of the idea. Apparently, Gary King published a paper in 2006, titled "Publication, Publication", in "Political Science and Politics". From the abstract: "I show herein how to write a publishable paper by beginning with the replication of a published article. This strategy seems to work well for class projects in producing papers that ultimately get published, helping to professionalize students into the discipline, and teaching them the scientific norms of the free exchange of academic information. I begin by briefly revisiting the prominent debate on replication our discipline had a decade ago and some of the progress made in data sharing since."