Sunday, February 6, 2011

The unreasonable effectiveness of simplicity

There are a few techniques that are extremely easy to understand and implement. Because they are so basic, they seem like they should be easy to beat with more advanced methods. However, this is often not the case. Consider the following examples:



Majority voting and aggregating discrete votes

Let's say that we are trying to label objects using discrete labels (e.g., is this blog comment "spam" or "not spam"?). For each object, we get multiple people to look at it and label it, which is the current practice on Mechanical Turk.

The simplest aggregation technique: Use the majority vote as the correct answer.
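For concreteness, here is a minimal sketch of this baseline in Python; the example labels and the tie-breaking behavior are illustrative assumptions, not part of any particular system:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the workers' votes.

    Ties are broken arbitrarily; in practice one might break them at
    random or request an extra judgment for the tied item.
    """
    return Counter(labels).most_common(1)[0][0]

# Example: five workers label a single blog comment.
votes = ["spam", "spam", "not spam", "spam", "not spam"]
print(majority_vote(votes))  # -> spam
```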

This seems like a ridiculously easy baseline to beat. We can model the quality of the workers. We can control for the varying difficulty of the examples that need to be rated. We can account for the different types of expertise of the workers and match them with the examples that suit them best. Plenty of papers have been published around this topic.

What is the improvement? Modest at best, and non-existent most of the time. The only real improvement, in most cases, comes from kicking out the spammers and taking the majority vote across the good workers. Do we need any advanced technique for that? No. A few gold cases here and there (à la CrowdFlower), or a simple check of how often a worker agrees with the majority, is typically enough.
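A rough sketch of that agreement check in Python; the 50% agreement threshold and the tuple-based data layout are assumptions made here for illustration, not a prescription:

```python
from collections import Counter, defaultdict

def vote_with_spammer_filter(judgments, min_agreement=0.5):
    """judgments: list of (worker_id, item_id, label) tuples.

    1. Take the majority vote per item across all workers.
    2. Score each worker by how often they agree with that majority.
    3. Drop workers below the threshold and re-run the majority vote
       among the remaining workers.
    """
    votes_by_item = defaultdict(list)
    for worker, item, label in judgments:
        votes_by_item[item].append((worker, label))

    majority = {item: Counter(label for _, label in votes).most_common(1)[0][0]
                for item, votes in votes_by_item.items()}

    agreements, totals = Counter(), Counter()
    for worker, item, label in judgments:
        totals[worker] += 1
        agreements[worker] += (label == majority[item])

    good_workers = {w for w in totals if agreements[w] / totals[w] >= min_agreement}

    aggregated = {}
    for item, votes in votes_by_item.items():
        kept = [label for worker, label in votes if worker in good_workers]
        # If every voter on an item was filtered out, fall back to the
        # original all-worker majority.
        aggregated[item] = Counter(kept).most_common(1)[0][0] if kept else majority[item]
    return aggregated
```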

Why is that? Because majority vote is a simple model. There are no parameters to estimate. For anything more advanced, we need a lot of data for the model to generate robust parameter estimates, and the errors introduced by inaccurate parameter estimates typically cancel out the advantages of the more complex modeling.



Averages and aggregating continuous probability estimates


Now consider the case of combining forecasts from multiple sources. For example, we want to predict the weather, and we have multiple sources, each with its own forecast. Or we have many stock market analysts covering the same stock and making predictions about its future performance.

Consider the simplest way to aggregate: take the average across all $N$ estimates $p_i$:

$\hat p = \frac{1}{N} \sum_{i=1}^{N} p_i$

Very straightforward. And, as in the case of aggregating discrete labels, it seems trivial to improve upon, in theory.
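Here is a sketch of both the baseline and one such "theoretical" improvement in Python; the inverse-variance weighting scheme and the example numbers are illustrative assumptions, not something taken from the studies discussed below:

```python
def simple_average(estimates):
    """The baseline: equal-weight average of the individual forecasts."""
    return sum(estimates) / len(estimates)

def inverse_variance_average(estimates, variances):
    """One 'obvious' refinement: weight each source by the inverse of its
    estimated error variance. The catch is that those variances must
    themselves be estimated from data, and error in the estimated weights
    can wipe out the theoretical gain."""
    weights = [1.0 / v for v in variances]
    return sum(w * p for w, p in zip(weights, estimates)) / sum(weights)

# Example: three forecasters estimate the probability of rain tomorrow.
forecasts = [0.60, 0.70, 0.65]
print(simple_average(forecasts))                                # ~0.65
print(inverse_variance_average(forecasts, [0.02, 0.05, 0.03]))  # ~0.64, pulled toward the lowest-variance source
```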

This topic has a long history in the literature, and there are even meta-studies that examine the effectiveness of the various approaches. The survey-style studies reach similar conclusions: you can definitely improve on simple averages, but most of the time the improvement is marginal, and you lose robustness. From Clemen and Winkler: "...simple combination rules (e.g., simple average) tend to perform quite well. More complex rules sometimes outperform the simple rules, but they can be somewhat sensitive, leading to poor performance in some instances."



Sometimes a simple baseline algorithm is so good for practical purposes that any improvement is only of academic interest. I know that we have papers to write, careers to advance, and grants to get, but sometimes it is good to stop and think: "What did I gain, compared to a simpler alternative? Does it make sense to introduce so much additional complexity for the sake of some minor improvement?"