Sunday, May 13, 2007

Impact Factors and Other Metrics for Faculty Evaluations

A few months back, as a department we had to submit to the school a list of the "A" journals in our field. The ultimate reason for this request was to have a list of prime venues for each field, and thus facilitate the task of the promotion & tenure committees, which often include researchers with little (or no) knowledge of the candidate's field.

Generating such a list can be a difficult task, especially if we try to keep the size of the list small, and it is directly connected with the problem of ranking journals. There are many metrics that can be used for such a ranking, and the most commonly used one is the "impact factor", proposed by Eugene Garfield. The impact factor ranks the "impact" of the research published in each journal by counting the average number of citations, received over the last year, to the 2- and 3-year-old articles published in the journal. The basic idea is that a journal with a large number of recent incoming citations (from last year) that also point to relatively recent articles (2- and 3-year-old papers) is publishing work on topics of current importance. The choice of the time window makes comparisons across fields difficult, but within a single field the impact factor is generally a good metric for ranking journals.
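To make the calculation concrete, here is a minimal sketch (my own illustrative code, with hypothetical numbers, not the official Thomson/ISI implementation): the impact factor is just the citations received last year to the journal's articles from the counted window, divided by the number of articles in that window.

```python
def impact_factor(citations_last_year, articles_per_year, counted_years):
    """Rough sketch of the impact-factor calculation (illustrative only).

    citations_last_year: dict mapping article id -> citations received last year
    articles_per_year:   dict mapping publication year -> list of article ids
    counted_years:       the publication years included in the impact-factor window
    """
    counted_articles = [a for y in counted_years for a in articles_per_year[y]]
    total_citations = sum(citations_last_year.get(a, 0) for a in counted_articles)
    return total_citations / len(counted_articles)

# Hypothetical example: three articles published in the counted window,
# cited 10, 2, and 0 times during the last year.
articles = {2005: ["a1", "a2"], 2006: ["a3"]}
cites = {"a1": 10, "a2": 2, "a3": 0}
print(impact_factor(cites, articles, [2005, 2006]))  # -> 4.0
```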

Garfield, most probably anticipating this outcome, explicitly warned that the impact factor should not be used to judge the quality of the research of an individual scientist. The simplest reason is that the incoming citations to the papers published in a given journal follow a power law: a few papers receive a large number of citations, while many others get only a few. Copying from a related editorial in Nature: "we have analysed the citations of individual papers in Nature and found that 89% of last year’s figure was generated by just 25% of our papers. [...snip...] Only 50 out of the roughly 1,800 citable items published in those two years received more than 100 citations in 2004. The great majority of our papers received fewer than 20 citations." So, the impact factor is a pretty bad metric for judging the quality of an individual article, even if the article was published in an "A" journal.

The impact factor (and other journal-ranking metrics) was devised to be used as a guideline for librarians allocating subscription resources, and as a rough metric for guiding scientists who are trying to decide which journals to follow. Unfortunately, such metrics have been mistakenly used as convenient measures for summarily evaluating the quality of someone's research ("if published in a journal with a high impact factor, it is a good paper; if not, it is a bad paper"). While the impact factor can (perhaps) serve as a prior, using it alone corresponds to a Naive Bayes classifier that never examines any feature of the object being classified before making the classification decision.
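To spell out the analogy, here is a toy sketch (my own, with purely hypothetical thresholds and numbers) of such a "prior-only" decision rule: every paper inherits the label of its venue, and nothing about the paper itself is ever inspected.

```python
# Toy illustration of judging a paper only by its venue's impact factor.
# The threshold and the example numbers are hypothetical.
def prior_only_classifier(journal_impact_factor, threshold=5.0):
    # No feature of the individual paper is examined; only the venue prior.
    return "good paper" if journal_impact_factor > threshold else "bad paper"

for paper, impact in [("weak paper in a high-impact journal", 30.0),
                      ("strong paper in a brand-new venue", 1.2)]:
    print(paper, "->", prior_only_classifier(impact))
```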

For the task of evaluating the work of an individual scientist, other metrics appear to be better. For example, the much-discussed h-index and its variants seem to be gaining traction. (I will write a separate post on this subject.) Are such metrics useful? Perhaps. However, no metric, no matter how carefully crafted, can substitute for a careful evaluation of someone's research. These metrics are only useful as auxiliary statistics, and I do hope that they are being used that way.
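For reference, here is a minimal sketch of the h-index computation (the largest h such that the author has at least h papers with at least h citations each); the citation counts below are hypothetical.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one author's papers.
print(h_index([25, 8, 5, 4, 3, 1, 0]))  # -> 4
```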

Friday, May 11, 2007

Names (or Parents?) Make a Difference

I just read an article about a study showing that a girl's name can be used to predict whether she will study math or physics after the age of 16. The study, done by David Figlio, professor of economics at the University of Florida, indicated that girls with "very feminine" names, such as Isabella, Anna, and Elizabeth, are less likely to study the hard sciences compared to girls with names like Grace or Alex.

Myself, I find it hard to understand how someone can estimate the "femininity" of a name, but it might be just me. Even if there is such a scale, though, I do not see any causality in the finding, as implied in the article. (I see predictive power, but no causality.) In my own interpretation, parents who choose "very feminine" names also try to steer their daughters towards more "feminine" careers. I cannot believe that names by themselves set a prior probability on the career path of a child. (The Freakonomics book had a similar discussion about names and success.)

Oh well, how easily you can lie with statistics...

Tuesday, May 8, 2007

Replacing Survey Articles with Wikis?

Earlier this year, Ahmed Elmagarmid, Vassilios Verykios, and I published a survey article in IEEE TKDE on duplicate record detection (also known as record linkage, deduplication, and by many other names).

Although I see this paper as a good effort in organizing the literature in the field, I will be the first to recognize that the paper is incomplete. We tried our best to include every research effort that we identified, and the reviewers helped a lot in this respect. However, I am confident that there are still many nice papers that we missed.

Furthermore, since the paper was accepted for publication, many more papers have appeared, and many more will be published in the future. This means that the useful half-life of (any?) such survey is necessarily short.

How can we make such papers more relevant and more resistant to becoming outdated? One solution that I am experimenting with is to turn the survey article into a wiki and post it to Wikipedia, allowing other researchers to add their own papers to the survey.

I am not sure whether Wikipedia is the best option, though, due to licensing issues. A personal wiki may be a better option, but I do not have a good grasp of the pros and cons of each approach. One of the benefits of Wikipedia is the existence of nice templates for handling citations. One of the disadvantages is Wikipedia's copyright license, which may discourage (or prevent) people from posting material there.

Furthermore, it is not clear that a wikified document is the best way to organize a survey. A few days back, I got a (forwarded) email from Foster Provost, who was seeking my opinion on the best way to organize an annotated bibliography. (Dragomir Radev had a similar question.) Is a wiki the best option? Or is it, by construction, too flat? Should we use some other type of software that allows people to create explicit, annotated connections between the different papers? (Is there a public tool for this?)

Any ideas?