Thursday, January 18, 2024

The PiP-AUC score for research productivity: A somewhat new metric for paper citations and number of papers

Many years back, we conducted some analysis of how the number of citations for a paper evolves over time. We noticed that while the raw number of citations is difficult to predict, if we calculate the percentile of citations for each paper, relative to other papers published in the same year, we get a number that stabilizes very quickly, even within 3 years of publication. That means we can estimate the future potential of a paper rather quickly by checking how it is doing against other papers of the same age. The percentile score of a paper is a very reliable indicator of its future.
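To make the idea concrete, here is a minimal sketch of such a cohort-percentile computation. The function and the numbers are made up for illustration; the real pipeline, of course, runs over the full Scholar data:

```python
from bisect import bisect_left

def citation_percentile(citations, cohort_citations):
    """Percentile of a paper's citation count within the cohort of
    papers published in the same year (0 = lowest, 100 = highest)."""
    ranked = sorted(cohort_citations)
    below = bisect_left(ranked, citations)  # papers with strictly fewer citations
    return 100.0 * below / len(ranked)

# Example: a paper with 40 citations in a (tiny) same-year cohort.
cohort = [5, 12, 40, 80, 150]
print(citation_percentile(40, cohort))  # 40.0: two of the five papers rank below it
```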

To make it easy for everyone to check the percentile scores of their papers, we created a small app at

that allows anyone to search for a Google Scholar profile and then calculate the percentile score of each paper. We take all the papers for an author, calculate their percentile scores, and sort them in descending order. This generates a plot like this, with the paper percentile on the y-axis and the paper rank on the x-axis.

Then, an obvious next question came up: How can we also normalize the x-axis, which shows the number of papers?

Older scholars have had more years to publish, giving them more chances to write high-percentile papers. To control for that, we also calculated percentiles for the number of papers published, using a dataset of around 15,000 faculty members at top US universities. The plot below shows how the percentiles for the number of publications evolve over time.

Now, we can use the percentile scores for the number of papers published to normalize the x-axis as well. Instead of showing the raw number of papers on the x-axis, we normalize paper productivity against the percentile benchmark shown above. The result is a graph like this for the superstar Jure Leskovec

and a less impressive one for yours truly:

Now, with a graph like this, with both the x- and y-axes normalized between 0 and 1, we have a nice new score, to which we have given the thoroughly boring name "Percentile in Percentile Area Under the Curve" score, or PiP-AUC for short. It is a score that ranges between 0 and 1, and you can play with different names in the app to see their scores.
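The score itself is just the area under that normalized curve, which can be sketched with a simple trapezoidal rule. The percentile values below are made up for illustration:

```python
def pip_auc(x, y):
    """Trapezoidal area under a curve of normalized paper-rank (x) vs.
    paper-percentile (y) points, both in [0, 1]."""
    pts = sorted(zip(x, y))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Papers sorted by descending percentile: x is the normalized rank
# (productivity percentile), y the paper's citation percentile.
x = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [0.95, 0.90, 0.70, 0.40, 0.10]
print(pip_auc(x, y))  # ~0.63 for this made-up profile
```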

At some point, we may also calculate the percentile scores of the PiP scores, but we will do that in the future. :-) UPDATE: If you are also curious about the percentiles for the PiP-AUC scores, here is the distribution:

The x-axis shows the PiP-AUC score, and the y-axis shows the corresponding percentile. So, if you have a PiP-AUC score of 0.6, you are in the top 25% (i.e., the 75th percentile) for that metric. With a score of 0.8, you are in the top 10% (i.e., the 90th percentile), etc.

In general, the tool is helpful when trying to understand the impact of newer work published in the last few years. Especially for people with many highly cited but old papers, the percentile scores are very helpful for quickly finding the newer gems. I also like the PiP-AUC scores and plots, as they offer a good balance of overall productivity and impact. Admittedly, it is a strict score, so it is not especially bragging-worthy most of the time :-)

(With thanks to Sen Tian and Jack Rao for their work.)

Tuesday, October 18, 2022

Tell these fucking colonels to get this fucking economist out of jail.

Today is October 18th. It has been 41 years since Greece elected Andreas Papandreou as prime minister with 48% of the vote, fundamentally changing the course of history for Greece. Whether for better or worse is still debated, but the change was real.

On October 6th, Roy Radner passed away at the age of 95. He was a faculty member at our department and a famous microeconomist with a highly distinguished career. Many others have written about him and his accomplishments as an economist and academic, so I will not try to do the same.
But Roy also played an important role in making that election in 1981 possible. Why? Let me tell you his story.

Tuesday, November 23, 2021

"Geographic Footprint of an Agent" or one of my favorite data science interview questions

Last week we wrote on the Compass blog about how we estimate the geographic footprint of an agent.

At the very core, the technique is simple: take the addresses of the houses that an agent has bought or sold in the past; get their longitude and latitude; and then apply a 2-dimensional kernel density estimation to find the areas where the agent is likely to be active. Doing the kernel density estimation is easy; the fundamentals of our approach are material you can find in any KDE tutorial. Two twists, though, make the approach more interesting:
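As a rough sketch of that core step, using SciPy's off-the-shelf estimator (the coordinates are made up, and this is not the production code):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Longitudes/latitudes of an agent's past transactions (made-up points,
# clustered around two neighborhoods).
lons = np.array([-74.00, -74.01, -73.99, -73.95, -73.96, -73.94])
lats = np.array([40.72, 40.73, 40.71, 40.80, 40.81, 40.79])

# Fit a 2-dimensional kernel density estimate over the transaction locations.
kde = gaussian_kde(np.vstack([lons, lats]))

# Density at a candidate location: higher means deeper inside the agent's
# usual area of activity.
density = kde(np.array([[-74.0], [40.72]]))[0]
print(density)
```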

  1. How can we standardize the "geographic footprint" score to be interpretable? The density scores that come back from a kernel density application are very hard to interpret. Ideally, we want a score from 0 to 1, with 0 being "completely outside of the area of activity" and 1 being "as important as it gets". We show how to use a percentile transformation of the likelihood values to create a score that is normalized, interpretable, and very well calibrated.
  2. What are the metrics for evaluating such a technique? We show how we can use the concept of "recall-efficiency" curves to provide a common way to evaluate the models.
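The first twist can be sketched as follows: evaluate the fitted density at the agent's own past transactions, then score a new location by the fraction of those reference densities it matches or exceeds. This is a generic illustration, not the exact formulation in the post:

```python
import numpy as np

def footprint_score(density_at_point, reference_densities):
    """Map a raw KDE density to a 0-1 score: the fraction of the agent's
    own transaction densities that this point's density matches or exceeds."""
    ref = np.asarray(reference_densities)
    return float(np.mean(ref <= density_at_point))

# Densities evaluated at the agent's own past transactions (made-up values).
ref = [0.8, 1.2, 2.5, 3.1, 0.4]
print(footprint_score(2.0, ref))  # 0.6: denser than 3 of the 5 reference points
print(footprint_score(0.1, ref))  # 0.0: outside the area of activity
```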

You can read more in the blog post. 
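For the curious, here is my reading of the recall-efficiency idea as a sketch; the definitions below are assumptions on my part, and the blog post has the exact formulation. For each score threshold, pair the recall on held-out transactions with the fraction of the candidate area the thresholded region covers:

```python
import numpy as np

def recall_efficiency_curve(holdout_scores, grid_scores, thresholds):
    """For each threshold, pair the fraction of the candidate grid covered
    by the region (coverage) with the fraction of held-out transactions
    captured inside it (recall)."""
    holdout = np.asarray(holdout_scores)
    grid = np.asarray(grid_scores)
    curve = []
    for t in thresholds:
        coverage = float(np.mean(grid >= t))
        recall = float(np.mean(holdout >= t))
        curve.append((coverage, recall))
    return curve

# Made-up footprint scores for held-out transactions and for a grid of
# candidate locations covering the whole market.
holdout = [0.9, 0.8, 0.7, 0.3]
grid = np.linspace(0, 1, 101)
curve = recall_efficiency_curve(holdout, grid, [0.25, 0.5, 0.75])
for coverage, recall in curve:
    print(coverage, recall)
```

A good model captures most held-out transactions (high recall) while covering a small fraction of the map (low coverage).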

Monday, November 18, 2019

Mechanical Turk, 97 cents per hour, and common reporting biases

The New York Times has an article about Mechanical Turk in today's print edition: "I Found Work on an Amazon Website. I Made 97 Cents an Hour". (You will find a couple of quotes from yours truly).

The content of the article follows the current zeitgeist: Tech companies exploiting gig workers. 

While it is hard to deny that there are tasks on MTurk that are really bad, I think the article paints an unfairly gloomy picture of the overall platform.

Here are a few of the issues:
  • Availability and survivorship bias. While the article accurately describes the cesspool of low-paying tasks available on Mechanical Turk, it fails to convey that these tasks sit on the platform precisely because nobody wants to work on them. The tasks that are easily available to everyone are the ones nobody competes to grab: low-paying, badly designed tasks.
  • The activity levels of workers follow a power law. We have plenty of evidence that a significant part of the work on MTurk is done by a small minority of workers. While it is hard to get a truly accurate measurement of what percent of the workers do what percent of the tasks, the 1% rule is a good approximation. For example, in my demographic surveys, where I explicitly limit participation to once per month, 50% of the responses come from 5% of the participants. Expect the bias to be much stronger in other, more desirable tasks. Such a heavily biased propensity to participate introduces strong sampling problems when trying to find the right set of workers to interview.
  • Doing piecemeal work while untrained results in low pay. This is a pet peeve of mine for all the articles of the type "I tried working on MTurk / driving Uber / delivering packages / etc., and I got lousy pay." If you work piecemeal on any task, the tasks will take a very long time initially, and the hourly wage will suck. This holds for Turking, coding, lawyering, or anything else. If someone decides to become a freelance journalist, the first few articles will result in abysmally bad hourly wages as well; expert freelance writers often charge 10x the rates that beginner freelance writers charge, if not more. I am 100% confident that the same applies to MTurk workers: experienced workers make 10x what beginners make.
Having said that, I do agree that Amazon could prohibit tasks that are obviously paying very little (as a rule of thumb, it is impossible to get paid more than minimum wage when the HIT is paying less than 5c/task). But I also think that regular workers are smart enough to know that and avoid such tasks. 

Friday, November 16, 2018

Distribution of paper citations over time

A few weeks ago we had a discussion about citations, and how we can compare the citation impact of papers that were published in different years. Obviously, older papers have an advantage as they have more time to accumulate citations.

To compare papers, just for fun, we ended up opening the profile page of each paper in Google Scholar, and we analyzed the paper citations year by year to find the "winner." (They were both great papers, by great authors, fyi. It was more of a "Lebron vs. Jordan" discussion, as opposed to anything serious.)

This process got me curious, though. Can we tell how a paper is doing at any given point in time? How can we compare a 2-year-old paper, published in 2016, with 100 citations against a 10-year-old paper, published in 2008, with 500 citations?

To settle the question, we started with the profiles of faculty members in the top-10 US universities and downloaded about 1.5M publications, across all fields, and their citation histories over time.

We then analyzed the citation histories of these publications, and, for each year, we ranked the papers based on the number of citations received over time. Finally, we computed the citation numbers corresponding to different percentiles of performance.
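That ranking step can be sketched as follows, with a tiny made-up sample standing in for the 1.5M downloaded publications:

```python
import numpy as np
from collections import defaultdict

# (publication_year, citations) pairs -- a tiny made-up sample.
papers = [(2008, 500), (2008, 120), (2008, 30), (2008, 5),
          (2016, 100), (2016, 40), (2016, 10), (2016, 2)]

# Group citation counts by publication year, then read off the citation
# count that corresponds to each percentile within the cohort.
by_year = defaultdict(list)
for year, cites in papers:
    by_year[year].append(cites)

for year, cites in sorted(by_year.items()):
    p50, p75, p90 = np.percentile(cites, [50, 75, 90])
    print(year, p50, p75, p90)
```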

Cumulative percentiles

The plot below shows the number of citations that a paper needs to have at different stages to be placed in a given percentile.

A few data points, focusing on certain age milestones: 5 years after publication, 10 years after publication, and lifetime.

  • 50% line: The performance of a "median" paper. The median paper gets around 20 citations 5 years after publication, 50 citations within 10 years, and around 90 citations in its lifetime. Milestone scores: 20, 50, 90
  • 75% line: These papers perform better, citation-wise, than 75% of the papers of the same age. Such papers get around 50 citations within 5 years, 100 citations within 10 years, and around 200 citations in their lifetime. Milestone scores: 50, 100, 200
  • 90% line: These papers perform better than 90% of the papers in their cohort: around 90 citations within 5 years, 200 within 10 years, and 500 in their lifetime. Milestone scores: 90, 200, 500

Yearly percentiles and peak years

We also wanted to check at which point papers reach their peak and start collecting fewer citations. The plot below shows the percentiles based on the yearly numbers of accumulated citations. The vast majority of papers tend to reach their peak 5-10 years after publication, after which the number of yearly citations starts declining.

Below is the plot of the peak year for a paper based on the paper percentile:

There is an interesting effect around the 97.5th percentile: above that level, a "rich-get-richer" effect kicks in, and we effectively do not observe a peak year; the number of citations per year keeps increasing. You could call these papers the "classics".
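The peak-year detection can be sketched like this, with the "still climbing" case standing in for the rich-get-richer papers (the yearly counts are made up):

```python
def peak_year(yearly_citations):
    """Return the age (years since publication, 1-indexed) at which yearly
    citations peak, or None if citations are still increasing in the last
    observed year -- no peak observed yet, the 'classics' regime."""
    best = max(range(len(yearly_citations)), key=yearly_citations.__getitem__)
    if best == len(yearly_citations) - 1:
        return None  # still climbing: no peak observed
    return best + 1

# A typical paper: peaks in year 6, then declines.
print(peak_year([2, 5, 10, 14, 18, 20, 17, 12, 8]))  # 6
# A "classic": yearly citations keep increasing.
print(peak_year([5, 10, 20, 40, 80]))  # None
```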

What does it take to be a "classic"? 200 citations at 5 years or 500 citations at 10 years.