Anyone familiar with American academia will tell you that the US News rankings of academic programs play an outsized role in this world. Among other things, US News ranks graduate programs of computer science, by their strength in the field at large as well as certain specialties. One of these specialties is Programming Languages, the focus of this blog.

The US News rankings are based solely on surveys. Department heads and directors of graduate studies at a couple of hundred universities are asked to assign numerical scores to graduate programs. Departments are ranked by the average score that they receive.

It’s easy to see that much can go wrong with such a methodology. A reputation-based ranking system is an election, and elections are meaningful only when their voters are well-informed. The worry here is that respondents are not necessarily qualified to rank programs in research areas that are not their own. Also, it is plausible that respondents would give higher scores to departments that have high overall prestige or that they are personally familiar with.

In this post, I propose using publication metrics as an input to a well-informed ranking process. Rather than propose a one-size-fits-all ranking, I provide a web application to allow users to compute their own rankings. This approach has limitations, which I discuss in detail, but I believe it’s a reasonable start to a better system.

Do we need rankings at all?

Many would argue that the idea of department rankings is inherently problematic. Any system of comparison must reduce a department’s research quality to a handful of crude metrics. To do this to a creative activity like research is reductionist and possibly harmful.

A glib rebuttal to this is that the genie is already out of the bottle. The marketplace has spoken, and it clearly likes university rankings. By not starting a conversation about a better system of ranking than US News’s, we reward the status quo.

A better justification is that department rankings are a valuable service to prospective students. Matching prospective graduate students to programs is a problem of resource allocation in a market. However, this market has information asymmetry, because students don’t have a clear idea of what makes for a good Ph.D. experience. When courting prospective students, universities put on shows that have limited connection to the reality of the graduate research experience. The problem is worse for international students, who frequently join universities sight unseen. As a result, it is easy for students to make suboptimal choices when selecting a program to join. At their best, university rankings help students make more informed decisions.

Ranking by objective metrics

What would a fairer system for ranking computer science departments look like? It seems to me that any such system should depend, in part, on real data on research productivity. The problem, of course, is that “research productivity” is a fuzzy concept. Efforts to approximate it using “bean counting” measures like paper or citation counts, or grant dollars, have basic shortcomings.

However, I think that these approximations have some value, especially when seen from the point of view of the end users of department rankings. Presumably, a prospective student would want to have a strong CV at the point when she finishes her Ph.D. She has a higher chance of doing so if the group she joins publishes regularly at top-tier publication venues in the areas in which she is interested. She is more likely to stay funded if her advisor has a track record of bringing in grant money. She is more likely to do highly cited research if her advisor has more highly cited papers than others in the same research area and level of seniority.

All in all, objective metrics have limitations, but they also produce some useful signals. One could, at the least, make them a factor in computing department rankings, even within a reputation-based system. For example, respondents to a reputation-based survey could choose (or be asked) to use productivity-based rankings as an input to their decision-making process. Presumably, this would lead respondents to make more informed judgments.

Interactive ranking: putting the user in charge

Another issue with existing ranking systems is that they are static, one-size-fits-all solutions. Think of a prospective student who is interested in the interface of Programming Languages (PL) and Machine Learning (ML). He should probably pick a department that has been active in PL and ML in recent times, and an advisor who has a strong track record in at least one of these areas. Depending on his interests, he might want an advisor who is primarily a PL researcher but also collaborates with ML folks, or the other way around. Finally, he may want to work with professors who are at a certain level of seniority, or whose students and postdocs have gotten high-profile research jobs. Unfortunately, current ranking systems do not support such nuanced decision-making.

Maybe what is needed, then, is a rankings application that can be customized to different user needs. For example, such an app could let a user assign weights to the different subareas of computer science, and rank departments and advisors by their weighted strength in these areas. The system would be fundamentally interactive: users would be able to change the weights on various variables and observe changes to the ranking results.

An interactive ranking system based on publication productivity

Over the last few weeks, I have coded up the first draft of such a ranking app. The rankings this app computes are based on a single objective metric: publication counts at top-quality outlets. The reason I used only this metric is simple: data on who publishes where is available from the DBLP bibliography database, and this data can be used to compute paper counts. In contrast, I had no easy access to data on citations, funding, or the history of a department’s former students. I believe the ranking system is defensible, as acceptance at top venues correlates, at least to some extent, with the quality of research. However, by definition, it considers only one dimension of a complex, multidimensional space. One might extend the app to allow ranking using a richer set of features.

DBLP doesn’t track institutions of researchers, but Alexandra Papoutsaki and her coauthors have recently developed a listing of faculty members at 50 top US universities. By cross-linking this dataset with DBLP data, one can determine where professors in a given department publish.

I used feedback from colleagues and friends to select a few top-tier publication venues in several subareas of computer science (see here for more details). For example, the top venues in Programming Languages (PL) are the Symposium on Principles of Programming Languages (POPL), the Conference on Programming Language Design and Implementation (PLDI), the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), and the International Conference on Functional Programming (ICFP). The top venues for Algorithms and Complexity are the Symposium on Theory of Computing (STOC), the Symposium on Foundations of Computer Science (FOCS), and the Symposium on Discrete Algorithms (SODA).
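To make this concrete, here is a minimal sketch of how such venue lists might be represented in Python. The subarea names and venue abbreviations come from this post, but the data structure itself is my assumption rather than the app’s actual format.

```python
# Hypothetical representation of the curated venue lists (not the app's actual format).
# Each subarea maps to the top-tier venues whose papers count toward that subarea.
TOP_VENUES = {
    "Programming Languages": ["POPL", "PLDI", "OOPSLA", "ICFP"],
    "Algorithms and Complexity": ["STOC", "FOCS", "SODA"],
    # ... remaining subareas of computer science would be listed similarly
}
```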

The application’s interface allows the user to select a time window within the last 15 years and to put weights on the various areas. The app assigns each professor a score by awarding him or her w points (where w is a number between 0 and 1) for each paper at a top-tier venue in an area to which the user has assigned weight w. We also identify a set of relevant professors — intuitively, the professors who are prospective advisors in the areas of interest. To qualify as relevant, a faculty member must have published 3 or more papers, within the selected period, at top venues in areas to which the user has assigned a nonzero weight. Departments are then ranked according to three metrics; a code sketch of the scoring and the metrics follows the list below.

1) Aggregate productivity. In this measure, the score of a department is the sum of the scores for all its professors. A department that scores high on this measure is likely to be a high-energy research environment, with a culture of publication in strong conferences in the areas of interest. However, this metric is likely to favor larger departments over smaller ones.

2) Maximal productivity. Here, the score of a department is the greatest score received by any one of its professors. Unlike aggregate productivity, this metric is not directly affected by a department’s size. A justification for this measure is that, in the end, a prospective student needs only one advisor. Consequently, joining a small department with one prolific researcher may well beat joining a larger department with several less productive researchers.

3) Group size. This metric estimates the size of a department’s group in the student’s areas of interest by counting the number of relevant professors it employs. This statistic puts the productivity rankings in perspective. Arguably, it is also of intrinsic interest to prospective students: larger groups allow more courses, seminars, and interactions with fellow students. On the other hand, some students prefer the coziness of small departments, which are likely to score poorly on this metric.
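To make the scoring and the three metrics concrete, here is a minimal sketch in Python. It assumes each publication record has already been joined to a professor, a department, a subarea, and a year; the function and field names are my own, not the app’s.

```python
from collections import defaultdict

def department_rankings(papers, weights, start_year, end_year, min_papers=3):
    """papers: iterable of dicts with keys 'professor', 'department', 'area', 'year'.
    weights: dict mapping a subarea to a user-chosen weight in [0, 1]."""
    prof_score = defaultdict(float)  # professor -> weighted paper count
    prof_count = defaultdict(int)    # professor -> raw paper count in weighted areas
    prof_dept = {}                   # professor -> department
    for p in papers:
        w = weights.get(p["area"], 0.0)
        if w == 0.0 or not (start_year <= p["year"] <= end_year):
            continue
        prof_score[p["professor"]] += w   # w points per top-tier paper in that area
        prof_count[p["professor"]] += 1
        prof_dept[p["professor"]] = p["department"]

    aggregate = defaultdict(float)   # 1) sum of all professors' scores
    maximal = defaultdict(float)     # 2) best single professor's score
    group_size = defaultdict(int)    # 3) number of "relevant" professors
    for prof, score in prof_score.items():
        dept = prof_dept[prof]
        aggregate[dept] += score
        maximal[dept] = max(maximal[dept], score)
        if prof_count[prof] >= min_papers:  # relevance threshold: 3+ papers in weighted areas
            group_size[dept] += 1
    return aggregate, maximal, group_size
```

Sorting each of the three returned dictionaries by value, in descending order, then yields the three department rankings described above.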

Results

The purpose of this application is to allow interactive exploration of data, and different users will draw different conclusions from this exploration. So, rather than present any results, I invite you to use the application yourself!

Limitations and conclusion

As mentioned earlier, this ranking application is meant to be a first draft rather than the final word. Given that publication in selective venues is key to success as a researcher, I believe the app produces some useful information. However, any ranking based on objective metrics has limitations, and this one is further limited by its reliance on a single measure.

The app has some implementation issues as well. The roster of faculty members used here was generated through crowdsourcing and is not guaranteed to be free of errors. Also, linking faculty members in the roster to DBLP records isn’t always easy: a professor’s DBLP entry may use a name that differs slightly from the name in the roster, and some professors appear under multiple names in DBLP. I used a heuristic to join the two datasets while correcting for common variations of names, and also examined the data manually, but this is hardly a failsafe process.
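The exact matching heuristic is not spelled out here, but a typical first step is to normalize names on both sides before joining. The sketch below is one plausible version of such a normalization, under my own assumptions; it is not the app’s actual code.

```python
import unicodedata

def normalize_name(name):
    """Crude normalization for joining a faculty roster with DBLP author names.
    Strips accents, lowercases, removes punctuation, drops single-letter middle
    initials, and discards DBLP's numeric disambiguation suffixes (e.g. "0002")."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    parts = name.lower().replace(".", " ").split()
    parts = [p for p in parts if not p.isdigit() and len(p) > 1]
    return " ".join(parts)
```

Even with normalization of this kind, hyphenated names, transliteration differences, and name changes still require manual inspection.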

However, I am hoping that the wisdom of the crowd can be used to overcome some of these limitations. The code and the data for the app are freely available (see here for more details). I have created a public Google document to collect information about errors in the data; please leave a comment there if you find bugs. You are also welcome to extend the app with additional features and productivity metrics.

Going beyond this particular app, I think we need to start a conversation about how to help prospective students make better decisions about where to go for graduate school. For a long time, we have allowed entities who do not have a stake in our discipline to rank our graduate programs. These rankings have dubious methodology, and they also have real implications for our departments. At the same time, we must recognize that they fill a real need. Creating nuanced, data-driven approaches to school ranking is a constructive way of challenging their hegemony.

[This post has been updated since it was first published.]