Friday, November 19, 2010

Introductory Research Course: Replicate a Paper

The transition to the happy life of a tenured professor means that I get to be involved in the wonderful part of the job: getting to sit on school-wide committees.

Fortunately, I was assigned to an extremely interesting committee: We get to examine the PhD program for the school, see the best practices, see what works and what does not, and try to reconcile everything into a set of recommendations for the faculty to examine. The double benefit for me is that I get to understand how the other departments in the school operate, something which, for a computer scientist in a business school, was still kind of a mystery to me.

Anyway, as part of this task, I learned about an interesting approach to teaching starting PhD students about research:

A course in which students pick a paper and get to replicate it.

I think this is a great idea. First of all, I am a big fan of learning-by-doing.

For example, to understand how an algorithm works, you need to actually implement it. Not get the code and re-run the experiments. Implement everything, going as deeply as possible. In C, in Java, in Perl, in Python, in MATLAB, in Maple, in Stata, it does not matter. For theory, the same thing: replicate the proofs. Do not skip the details. For data analysis, the same. Get your hands dirty.

During such a process, it is great to have someone to serve as a sounding board. Ask questions about the basics. Why do we follow this rule of thumb? What is the assumption behind the use of this method? Asking these questions is much easier while working on replicating someone else's work, rather than when working on your own research and trying to get a paper out.

Myself, I still write code for this very same reason. I need to see how the algorithm behaves. I need to see the small peculiarities in behavior. This observation helps me better understand not only the algorithm itself but also the other techniques that the algorithm employs. I have been trying to understand econometrics a little more deeply over the last few months, and I do the same. Frustrating? Yes. Slow? Yes. Helpful? You bet!

So, at the end of the seminar, if the students can replicate the results of the paper, great: They learned what it takes to create a paper and most probably gained a deeper understanding of a few other topics along the way.

If the results are different from those in the original paper, then perhaps this is the beginning of a deeper investigation. Why are things different? Tuning? Settings? Bugs? Perhaps uncovering something not seen by the authors?

Even if the data from the authors are not available, the students should be able to reproduce the work and get similar results, perhaps with different data sets. If the results with different data sets are qualitatively different, then the paper is essentially not reproducible. (And replicability is not reproducibility.)
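To make the distinction a bit more concrete, here is a minimal sketch in Python of how a student might check both things. Everything in it is an assumption for illustration: the function names `replicates` and `same_ranking`, the 5% tolerance, and the numbers themselves are invented and do not come from any paper. I treat "replicated" as "the re-run on the same data lands close to the reported numbers" and "qualitatively the same" as "the methods are ranked in the same order on new data"; other definitions are certainly possible.

```python
import math


def replicates(reported, rerun, rel_tol=0.05):
    """True if every re-run value is within rel_tol of the reported value."""
    return all(
        math.isclose(rerun[name], value, rel_tol=rel_tol)
        for name, value in reported.items()
    )


def same_ranking(reported, other):
    """True if both result sets order the methods identically (best first)."""
    def rank(scores):
        return sorted(scores, key=scores.get, reverse=True)
    return rank(reported) == rank(other)


# Hypothetical accuracy numbers, invented purely for illustration.
reported = {"baseline": 0.71, "proposed": 0.78}
rerun_same_data = {"baseline": 0.70, "proposed": 0.79}
rerun_new_data = {"baseline": 0.64, "proposed": 0.69}

print(replicates(reported, rerun_same_data))   # numbers match closely
print(replicates(reported, rerun_new_data))    # numbers differ on new data
print(same_ranking(reported, rerun_new_data))  # but the conclusion holds
```

In this made-up case the exact numbers do not carry over to the new data set, but the qualitative conclusion (the proposed method beats the baseline) does, which is the weaker but arguably more important property.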

And in any case, no matter if the students can replicate the results or not, no matter if the paper is reproducible or not, the lesson from such an exercise can be valuable.

Often the student who comes to understand the paper better falls in love with the topic and goes on to learn more and more about the area. Following in someone's footsteps is often the first step to finding your own path.

I think this seminar will make it to the final set of recommendations to the school. I am wondering how many other schools have such a course.

Update1: Needless to say, this is a class, not something that students try on their own. Therefore, the professor should pick a set of papers which are educational and useful to replicate. This can be either an easy "classic" paper, or an "important new" result, or even a paper that forces the students to use particular tools and data sources. The students choose from a predefined set, not from the wild.

Update2: Thanks to Jun, a commenter below, we now have a reference to the originator of the idea. Apparently, Gary King published a paper in 2006, titled "Publication, Publication", in "PS: Political Science and Politics". From the abstract: "I show herein how to write a publishable paper by beginning with the replication of a published article. This strategy seems to work well for class projects in producing papers that ultimately get published, helping to professionalize students into the discipline, and teaching them the scientific norms of the free exchange of academic information. I begin by briefly revisiting the prominent debate on replication our discipline had a decade ago and some of the progress made in data sharing since."