Friday, May 23, 2008

The (Statistical) Significance of the Impact Factor

Being in the middle of my tenure track, I cannot avoid running into the different ways that people use to evaluate research. One of the most common ways to evaluate papers (at least at a very high level) is to look at the impact factor of the journal, and classify the paper as "published in a top journal," "published in an OK journal," or "published in a B-class journal." I have argued in the past that this is a problematic practice, and an article published in Nature provides the evidence. To summarize the reasoning: articles published within the same journal have widely different citation counts, so using the average is simply misleading.

I think the best example I have heard that illustrates the problem of reporting averages of highly skewed distributions comes from Paul Krugman's book "The Conscience of a Liberal":
...Bill Gates walks into a bar, the average wealth of the bar's clientele soars...
This is exactly what happens when we evaluate papers using the impact factor of the journal. This practice introduces two problems:
  • If you evaluate a paper using the impact factor of the journal, the evaluation is almost always either a significant overestimate or a significant underestimate of the paper's "impact" (assuming that citations measure "impact"). See the analysis below for an illustrative example.
  • The impact factor itself is a very brittle metric, as it is heavily influenced by a few outliers. If the in-journal citation distribution is indeed a power law, then the impact factor itself is a useless metric.
To make this more clear, I will use ACM Transactions on Information Systems (TOIS) as an example. The journal has a rather impressive impact factor for a computer science journal, and the trend has been increasing.
Now, let's try to dissect the 5.059 impact factor for 2006. The 2006 impact factor is the number of citations generated in 2006 to the papers published in 2005 and 2004, divided by the total number of articles published in those two years. According to ISI Web of Knowledge, we have:
2006 Impact Factor

Cites in 2006 to articles published in:
  2005: 25
  2004: 147
  Total: 172

Number of articles published in:
  2005: 15
  2004: 19
  Total: 34

Calculation: 172 / 34 = 5.059
Now, let's break these numbers down by publication. Looking at the number of citations per publication, we can see that a single paper, "Evaluating collaborative filtering recommender systems" by Herlocker et al., accounts for almost 30 citations in 2006. Taking this single publication out of the calculation, the impact factor drops to roughly 4.3 (about 142 citations over the remaining 33 articles).
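Here is a quick sketch of that arithmetic in Python. The only number not listed in the ISI breakdown above is the exact 2006 citation count of the Herlocker et al. paper; I use 30, the approximate figure mentioned above.

```python
# 2006 impact factor of TOIS, using the ISI numbers listed above
cites_2006 = 25 + 147   # citations in 2006 to papers published in 2005 and 2004
articles = 15 + 19      # number of papers published in 2005 and 2004

impact_factor = cites_2006 / articles
print(round(impact_factor, 3))   # 5.059

# Remove the single most-cited paper (Herlocker et al., ~30 citations in 2006)
top_paper_cites = 30             # approximate; the exact count is not shown above
adjusted = (cites_2006 - top_paper_cites) / (articles - 1)
print(round(adjusted, 3))        # ~4.303
```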

In fact, if we remove from the calculation all the papers published in the Special Issue on Recommender Systems (January 2004), the impact factor drops even further, coming close to 2.5. At the same time, the impact factor computed only over the papers in that special issue is much higher, closer to 15.0 or so.
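To see how skewed this split is, here is the same calculation done per group. I do not have the per-group citation breakdown from ISI, so the numbers below are purely hypothetical; they were picked only so that the totals match the 172 citations and 34 articles above and the group-level figures come out near the rough 2.5 and 15.0 quoted in the text.

```python
# Hypothetical split of the 172 citations / 34 articles into the
# Jan 2004 special issue and everything else. These per-group counts
# are NOT from ISI; they are invented for illustration only.
groups = {
    "special issue (Jan 2004)":   {"cites": 105, "articles": 7},
    "all other 2004-2005 papers": {"cites": 67,  "articles": 27},
}

for name, g in groups.items():
    print(name, round(g["cites"] / g["articles"], 2))
# special issue (Jan 2004) 15.0
# all other 2004-2005 papers 2.48
```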

Given the unusually high impact of that special issue, and since its papers fall outside the 2005-2006 window that the 2007 impact factor covers, we can expect the 2007 impact factor for TOIS to decrease substantially. It would not be surprising to see it fall back to its pre-2003 levels.

This simple example illustrates that the impact factor rarely represents the "average" paper published in the journal. There are papers that are significantly stronger than the impact factor suggests, and papers that are significantly weaker. (Implication: authors who use the impact factor of a journal as a representative metric of the quality of their research are using a metric that is almost never representative.)

Therefore, a set of other metrics may be preferable. The obvious choices are to use the median instead of the average, and to report the Gini coefficient for the papers published in the journal; the Gini coefficient shows how representative the impact factor actually is. The next step is to examine the distribution of the number of citations within a journal. Is it a power law, or an exponential? (I was not able to locate an appropriate reference.) Having these answers can lead to better analysis and easier comparisons.
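As a rough sketch of what such reporting could look like: given the per-paper citation counts for a journal (the list below is hypothetical), the median and the Gini coefficient are straightforward to compute and expose the skew that a single average hides.

```python
import numpy as np

def gini(values):
    """Gini coefficient of non-negative counts: 0 means perfectly equal,
    values near 1 mean citations are concentrated in a few papers."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    # Standard formula for the Gini coefficient of ordered values
    return (2 * np.sum(ranks * x)) / (n * np.sum(x)) - (n + 1) / n

# Hypothetical per-paper citation counts for one journal's two-year window
citations = [0, 0, 1, 1, 2, 2, 3, 4, 5, 8, 30]

print("mean (impact-factor style):", round(np.mean(citations), 2))  # 5.09
print("median:", np.median(citations))                              # 2.0
print("Gini coefficient:", round(gini(citations), 2))               # 0.65
```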