Communications of the ACM

Academic Rankings Considered Harmful!

Academic rankings have a huge presence in academia. College rankings by U.S. News and World Report (USNWR) help undergraduate students find the "perfect school." Graduate-program rankings by USNWR are often the most significant decision-making factor for prospective graduate students. The Academic Ranking of World Universities (known also as the "Shanghai Ranking") is one that attracts much attention from university presidents and governing boards. New academic rankings, of many different forms and flavors, have been popping up regularly over the last few years.

Yet, there is also deep dissatisfaction in the academic community with the methodology of such rankings and with the outsize role that commercial entities play in the ranking business. The recent biennial meeting of the Computing Research Association (CRA) dedicated a session to this topic (see http://cra.org/events/snowbird-2016/#agenda), asserting that "Many members of our community currently feel the need for an authoritative ranking of CS departments in North America" and asking "Should CRA be involved in creating a ranking?" The rationale for that idea is that the computing-research community will be better served by helping to create some "sensible rankings."

The methodology currently used by USNWR to rank computer-science graduate programs is highly questionable. This ranking is based solely on "reputational standing," in which department chairs and graduate directors are asked to rank each graduate program on a 1-5 scale. Having participated in such reputational surveys for many years, I can testify that I spent about a second or two coming up with a score for each of the over 100 ranked programs. Obviously, very little contemplation went into my scores. In fact, my answers have clearly been influenced by prior-year rankings. It is a well-known "secret" that rankings of graduate programs of universities of outstanding reputation are buoyed by the halo effect of their parent institutions' reputations. Such reputational rankings have no academic value whatsoever, I believe, though they clearly play a major role in academic decision making.

But the problem is deeper than the current flawed methodology of USNWR's ranking of graduate programs. Academic rankings, in general, provide highly misleading ways to inform academic decision making by individuals. An academic program or unit is a highly complex entity with numerous attributes. An academic decision is typically a multi-objective optimization problem, in which the objective function is highly personal. A unidimensional ranking provides a seductively easy objective function to optimize. Yet such decision making ignores the complex interplay between individual preferences and programs' unique patterns of strengths and weaknesses. Decision making by ranking is decision making by lazy minds, I believe.

Furthermore, academic rankings have adverse effects on academia. Such rankings are generally computed by devising a mapping from the complex space of program attributes to a unidimensional space. Clearly, many such mappings exist. Each ranking is based on a specific "methodology," that is, a specific ranking mapping. The choice of mapping is completely arbitrary and reflects some "judgement" by the ranking organization. But the academic value of such a judgement is dubious. Furthermore, commercial ranking organizations tweak their mappings regularly in order to create movement in the rankings. After all, if you are in the business of selling ranking information, then you need movement in the rankings for the business to be viable. Using such rankings for academic decision making is letting third-party business interests influence our academic values.

Thus, to the question "Should CRA get involved in creating a ranking?" my answer is "absolutely not." I do not believe that "sensible rankings" can be defined. The U.S. National Research Council's attempt in 2010 to come up with an evidence-based ranking mapping is widely considered a notorious failure. Furthermore, I believe the CRA should pass a resolution encouraging its members to stop participating in the USNWR surveys and discouraging students from using these rankings for their own decision making. Instead, CRA should help well-informed academic decision making by creating a data portal providing public access to relevant information about graduate programs. Such information can be gathered from an extended version of the highly respected Taulbee Survey that CRA has been running for over 40 years, as well as from various open sources. CRA could also provide an API to enable users to construct their own ranking based on the data provided.
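To make the idea of user-constructed rankings concrete, here is a minimal sketch of what such a tool might do, assuming a simple weighted sum over normalized attributes. The program names, attribute names, values, and the `rank` function are all hypothetical and invented for illustration; none of this is Taulbee Survey data or an actual CRA interface.

```python
from typing import Dict, List

# Hypothetical, made-up attribute values, each normalized to [0, 1]
# with higher meaning "better". These are NOT real survey data.
PROGRAMS: Dict[str, Dict[str, float]] = {
    "Program A": {"research_output": 0.9, "advising_attention": 0.4, "placement": 0.6},
    "Program B": {"research_output": 0.5, "advising_attention": 0.9, "placement": 0.7},
    "Program C": {"research_output": 0.7, "advising_attention": 0.6, "placement": 0.9},
}


def rank(programs: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Order program names by a weighted sum of their attributes (best first).

    The weighted sum is only one of many possible mappings from the
    multi-attribute space to a single score; it is an assumption of this sketch.
    """
    def score(attrs: Dict[str, float]) -> float:
        # Attributes without a weight contribute nothing to the score.
        return sum(weights.get(name, 0.0) * value for name, value in attrs.items())

    return sorted(programs, key=lambda name: score(programs[name]), reverse=True)


# Two prospective students with different personal priorities (hypothetical weights).
research_focused = {"research_output": 0.8, "advising_attention": 0.1, "placement": 0.1}
advising_focused = {"research_output": 0.1, "advising_attention": 0.8, "placement": 0.1}

print(rank(PROGRAMS, research_focused))  # ['Program A', 'Program C', 'Program B']
print(rank(PROGRAMS, advising_focused))  # ['Program B', 'Program C', 'Program A']
```

The same data yield opposite orderings for the two sets of weights, which is the point: the weights encode a personal objective function, so the choice belongs to the individual making the decision rather than to a ranking organization.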

Academic rankings are harmful, I believe. We have a responsibility to better inform the public, by ceasing to "play the ranking games" and by providing the public with relevant information. The only way to do that is by asserting the collective voice of the computing-research community.

Follow me on Facebook, Google+, and Twitter.

Moshe Y. Vardi, EDITOR-IN-CHIEF

Copyright held by author.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.

Comments

R Oldehoeft

Prof. Vardi is correct in that a linear ranking of CS programs makes no sense. Like many other things whose outcome depends on many factors, I have concluded that there are only three ranks: High, Middle, and Low. For example, after years of experience at graduate program applications, so many factors contribute to GRE scores that I could not distinguish further beyond these three categories. The same is true for CS program evaluations.

So I propose the following ranks: Thriving, Established, and Aspiring. Further criteria to sort programs into these bins depend on what is being evaluated so, as an example, consider the quality of undergraduate CS programs.

Thriving: Attracts top undergraduates from all over the nation; The program is rigorous enough that really good students who apply themselves will successfully complete it; Graduates usually take highly paid initial jobs or are admitted to Thriving CS graduate programs.

Established: Regularly attracts a range of students from a wide regional area and sometimes nationally; The program is rigorous enough so that students must engage seriously in their educations to succeed; Graduates typically take well-paying first jobs or are admitted to Established or Thriving CS graduate programs.

Aspiring: Attracts mostly regional students of variable quality; Program rigor allows most students to succeed while professional standards are still maintained; Graduates generally take regional jobs at typically lower salaries or enter Aspiring or Established CS graduate programs. A highly motivated student with faculty support in an Aspiring program may achieve greater initial success at a job or a Thriving graduate program, but that is anecdotal.

Within these categories the institutions can be listed alphabetically. For graduate programs, a different set of criteria would apply, but it would be unusual for an undergraduate program to be placed into a lower group than its companion graduate program.

Just my $0.02. Further comments and rebuttals are welcome.

Rod Oldehoeft

Chair and Professor Emeritus

Computer Science Department

Colorado State University

Scott Cotton

I think there is an even more general problem with rankings than multi-objectiveness and overabundance of mappings: the ranking "meta" effect. Ranking systems always end up becoming an object which is gamed and manipulated.

To take the example of Google page rank, it was democratic under the assumption that web pages were created "honestly". High quality content would naturally be rewarded democratically because it would interest many other pages, transitively. But it didn't take into account the meta effect: what happens when sets of web pages are linked in such a way as to manipulate a page rank. This is what many search engine optimisations teach: create essentially fake, but nominally legitimate web pages which link to the page you want to boost. Result: a mass of essentially fake but nominally legitimate web pages.

A ranking system which takes into account this "meta effect" a priori, and maintains democratic-ness even when users game it, would be interesting, if it were possible. Of course academic citation rankings are based traditionally on acyclic graphs, and so quite different, but the meta-effect is quite general, still applies, and in my opinion is the single most detrimental aspect of rankings.

CACM Administrator

The following letter was published in the Letters to the Editor in the November 2016 CACM (http://cacm.acm.org/magazines/2016/11/209131).

--CACM Administrator

No one likes being reduced to a number. For example, there is much more to my financial picture than my credit score alone. There is even scholarly work on weaknesses in the system to compute this score. Everyone may agree the number is far from perfect, yet it is used to make decisions that matter to me, as Moshe Y. Vardi discussed in his Editor's Letter "Academic Rankings Considered Harmful!" (Sept. 2016). So I care what my credit score is. Many of us may even have made financial decisions taking into account their potential impact on credit score.

As an academic, I also produce such numbers. I assign grades to my students. I strive to have the assigned grade accurately reflect a student's grasp of the material in my course. But I know this is imperfect. At best, the grade reflects the student's knowledge today. When a prospective employer looks at it two years later, it is possible an A student had crammed for the exam and has since completely forgotten the material, while a B student deepened his or her understanding substantially through a subsequent internship. The employer must learn to get past the grade to develop a richer understanding of the student's strengths and weaknesses.

As an academic, I am also a consumer of these numbers. Most universities, including mine, look at standardized test scores. No one suggests they predict success perfectly. But there is at least some correlation, enough that they are used, often as an initial filter. Surely there are students who could have done very well if admitted but were not considered seriously because they did not make the initial cutoff in test scores. A small handful of U.S. colleges and universities have recently stopped considering standardized test scores for undergraduate admission. I admire their courage. Most others have not followed suit because it takes a tremendous amount of work to get behind the numbers. Even if better decisions might result, the process simply requires too much effort.

As an academic, I appreciate the rich diversity of attributes that characterize my department, as well as peer departments at other universities. I know how unreasonable it is to reduce it all to a single number. But I also know there are prospective students, as well as their parents and others, who find a number useful. I encourage them to consider an array of factors when I am trying to recruit them to choose Michigan. But I cannot reasonably ask them not to look at the number. So it behooves me to do what I can to make it as good as it can be, and to work toward a system that produces numbers that are as fair as they can be. I agree it is not possible to come anywhere close to perfection, but the less bad we can make the numbers, the better off we all will be.

H.V. Jagadish

Ann Arbor, MI

_________________________________

AUTHOR'S RESPONSE

My Editor's Letter did not question the need for quantitative evaluation of academic programs. I presume, however, that Dr. Jagadish assigns grades to his students rather than merely ranking them. These students then graduate with a transcript, which reports all their grades, rather than just their class rank. He argues that we should learn to live with numbers (I agree) but does not address any of the weaknesses of academic rankings.

Moshe Y. Vardi, Editor-in-Chief

CACM Administrator

The following letter was published in the Letters to the Editor in the November 2016 CACM (http://cacm.acm.org/magazines/2016/11/209131).

--CACM Administrator

I could not agree more with Moshe Y. Vardi's Editor's Letter (Sept. 2016). The ranking systems, whether U.S.-focused (such as U.S. News and World Report) or global (such as Times Higher Education, World University Reputation Ranking, QS University Ranking, and Academic Ranking of World Universities, compiled by Shanghai Jiaotong University in Shanghai, China), have all acquired lives of their own in recent years. These rankings have attracted the attention of governments and funding bodies and are widely reported in the media. Many universities worldwide have reacted by establishing staff units to provide the diverse data requested by the ranking agencies and boosting their communications and public relations activities. There is also evidence that these league tables are beginning to (adversely) influence resource-allocation and hiring decisions despite their glaring inadequacies and limitations.

I have been asked to serve on the panels of two of the ranking systems but have had to abandon my attempts to complete the questionnaires because I just did not have sufficient information to provide honest responses to the kinds of difficult, comparative questions about such a large number of universities. The agencies seldom report how many "experts" they actually surveyed or their survey-response rates. As regards the relatively "objective" ARWU ranking, it uses measures like number of alumni and staff winning Nobel Prizes and Fields Medals, number of highly cited researchers selected by Thomson Reuters, number of articles published in journals of Nature and Science, number of articles indexed in Science and Social Science Citation Index, and "per capita performance" of a university. It is not at all clear to what extent the six narrowly focused indicators can capture the overall performance of modern universities, which tend to be large, complex, loosely coupled organizations. As well, the use of measures like number of highly cited researchers named by Thomson Reuters/ISI can exacerbate some of the known citation malpractices (such as excessive self-citations, citation rings, and journal-citation stacking). As Vardi noted, the critical role of commercial entities in the rankings, notably Times, QS, USNWR, and Thomson Reuters, is also a concern.

Joseph G. Davis

Sydney, Australia
