Paul L. Caron

Tuesday, October 25, 2016

The Use Of Metrics To Assess Scholarly Performance: The Emperor’s New Clothes?

Inside Higher Education, Can Your Productivity Be Measured? (reviewing Yves Gingras (University of Quebec), Bibliometrics and Research Evaluation: Uses and Abuses (MIT Press, 2016)):

“Since the first decade of the new millennium, the words ranking, evaluation, metrics, h-index and impact factors have wreaked havoc in the world of higher education and research.” ...  Ultimately, Bibliometrics concludes that the trend toward measuring anything and everything is a modern, academic version of “The Emperor’s New Clothes,” in which — quoting Hans Christian Andersen, via Gingras — “the lords of the bedchamber took greater pains than ever to appear holding up a train, although, in reality there was no train to hold.”
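The h-index named above is worth pinning down, since much of the critique turns on how much a single number can hide. As a minimal sketch (not from the book): an author's h-index is the largest h such that h of their papers each have at least h citations. Note how two quite different citation records can land on similar scores.

```python
def h_index(citations):
    """Return the h-index: the largest h such that the author has
    at least h papers with at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # still true that `rank` papers have >= `rank` citations
        else:
            break
    return h

# Two authors with very different citation totals (30 vs. 44)
# end up with nearly identical h-indexes:
print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

This compression of a whole career into one integer is exactly the kind of indicator Gingras argues must be validated and interpreted in context before it is used for evaluation.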

Gingras says, “The question is whether university leaders will behave like the emperor and continue to wear each year the ‘new clothes’ provided for them by sellers of university rankings (the scientific value of which most of them admit to be nonexistent), or if they will listen to the voice of reason and have the courage to explain to the few who still think they mean something that they are wrong, reminding them in passing that the first value in a university is truth and rigor, not cynicism and marketing.”

Although some bibliometric methods “are essential to go beyond local and anecdotal perceptions and to map comprehensively the state of research and identify trends at different levels (regional, national and global),” Gingras adds, “the proliferation of invalid indicators can only harm serious evaluations by peers, which are essential to the smooth running of any organization.”

And here is the heart of Gingras’s argument: that colleges and universities are often so eager to proclaim themselves “best in the world” -- or region, state, province, etc. -- that they don’t take care to identify “precisely what ‘the best’ means, by whom it is defined and on what basis the measurement is made.” Put another way, he says, paraphrasing another researcher, if the metric is the answer, what is the question?

Without such information, Gingras warns, “the university captains who steer their vessels using bad compasses and ill-calibrated barometers risk sinking first into the storm.” The book doesn’t rule out the use of indicators to “measure” science output or quality, but Gingras says they must first be validated and then interpreted in context. ...

While the study of publication and citation patterns, “on the proper scale, provides a unique tool for analyzing global dynamics of science over time,” the book says, the “entrenchment” of ever more numerous (and often ill-defined) quantitative indicators in the formal evaluation of institutions and researchers opens the way to their abuse. ... As Gingras stresses throughout the book, “evaluating is not ranking.”

The intended and unintended consequences of administrative overreliance on indicators used by Academic Analytics, a productivity index and benchmarking firm that aggregates publicly available data from the web, have been cited by faculty members at Rutgers University, for example. The faculty union there has asked the university not to use information from the database in personnel and various other kinds of decisions, and to make faculty members’ profiles available to them [Rutgers Faculty Rebels Against Use Of Metrics To Assess Their Scholarly Performance]. ...

Faculty members at Rutgers have also cited concerns about errors in their profiles, which either overstate or understate their scholarship records. Similar concerns about accuracy, raised by a study comparing faculty members’ curricula vitae with their system profiles, led Georgetown University to drop its subscription to Academic Analytics. In an announcement earlier this month, Robert Groves, provost, said the quality and coverage of the “scholarly products of those faculty studied are far from perfect.” Even with perfect coverage, Groves said, “the data have differential value across fields that vary in book versus article production and in their cultural supports for citations of others’ work.” Without adequate coverage, “it seems best for us to seek other ways of comparing Georgetown to other universities.”

In response to such criticisms, Academic Analytics has said that it opposes using its data in faculty personnel decisions, and that it’s helpful to administrators as one tool among many in making decisions. ...

Gingras said in an interview that the recent debates at Rutgers and Georgetown show the dangers of using a centralized and especially private system “that is a kind of black box that cannot be analyzed to look at the quality of the content -- ‘garbage in, garbage out.’” Companies have identified a moneymaking niche, and some administrators think they can save money using such external systems, he said. But their use poses “grave ethical problems, for one cannot evaluate people on the basis of a proprietary system that cannot be checked for accuracy.” The reason managers want centralization of faculty data is to “control scientists, who used to be the only ones to evaluate their peers,” Gingras added. “It is a kind of de-skilling of research evaluation. … In this new system, the paper is no [longer] a unit of knowledge and has become an accounting unit.”

Gingras said the push toward using “simplistic” indicators is probably worst in economics and biomedical sciences; history and the other social sciences are somewhat better in that they still have a handle on qualitative, peer-based evaluation. And contrary to beliefs held in some circles, he said, this process has always had some quantitative aspects.

The book criticizes the “booming” evaluation market, describing it as something of a Wild West in which invalid indicators are peddled alongside those with potential value. Gingras says that most indicators, or the variables that make up many rankings, are never explicitly tested for their validity before they are used to evaluate institutions and researchers. ...

Universities are like supertankers, he says, and simply can’t change course quickly. So ranking institutions every year, or even every couple of years, is folly -- bad science -- and largely a marketing strategy on the part of the producers of such rankings. Gingras applauds the National Research Council, for example, for ranking doctoral departments in each discipline every 10 years, a much more valid interval that might actually capture real change. (The research council’s rankings have drawn plenty of criticism on other grounds, however.) ...

The AAUP earlier this year released a statement urging caution about the use of private metrics providers to gather data about faculty research. Henry Reichman, a professor emeritus of history at California State University at East Bay who helped draft the document as chair of the association’s Committee A on Academic Freedom and Tenure, said faculty bibliometrics are a corollary to the interest in outcomes assessment, in that the goals of each are understandable but the means of measurement are often flawed. Faculty members aren’t necessarily opposed to the use of all bibliometrics, he added, but such measures should never replace nuanced processes of peer review by subject matter experts.

Bibliometrics in many ways represent a growing gap, or “gulf,” between administrators and faculty members, Reichman added; previously, many administrators were faculty members who eventually would return to the faculty. While that’s still true in many places, he said, university leaders increasingly have been administrators for many years or are drawn from other sectors, where “bottom lines” are much clearer than they are in higher education. ...

Brad Fenwick, vice president of global and academic research relations for Elsevier ... said bibliometrics is not an alternative to peer review, but a complement. He compared administrators’ use of bibliometrics to baseball’s increasingly analytic approach made famous in Michael Lewis’s Moneyball: The Art of Winning an Unfair Game, in which human beings use a mix of their expertise, intuition and data to make “marginally better decisions.” And those decisions aren’t always or usually negative, he said; they might mean an administrator is able to funnel additional resources toward an emerging research focus he or she wouldn’t have otherwise noticed.

Cassidy Sugimoto, associate professor of informatics and computing at Indiana University at Bloomington and co-editor of Scholarly Metrics Under the Microscope: From Citation Analysis to Academic Auditing (2014), said criticism of bibliometrics for evaluating scholars and scholarship is nothing new, and the field has adjusted to it over time. Issues of explicit malpractice -- such as citation stacking and citation “cartels” -- are addressed by suppressing data for such individuals and journals in citation indicators, for example, she said. And various distortions in interpretation, such as those caused by the “skewness” of citation indicators and wide variation across disciplines and in scholars’ ages, have been mitigated by the adoption of more sophisticated normalizations.
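The field normalizations Sugimoto mentions can be illustrated with a minimal sketch (an assumption for illustration, not the exact method any provider uses): divide each paper's citation count by the mean count for papers in the same field and year, so that a modestly cited history article and a heavily cited biomedical one become comparable relative to their own baselines.

```python
from collections import defaultdict
from statistics import mean

def normalized_citation_scores(papers):
    """Field-normalized citation scores: each paper's citations divided by
    the mean citations of all papers sharing its (field, year) group.
    `papers` is a list of dicts with 'field', 'year' and 'citations' keys
    (a hypothetical input format chosen for this sketch)."""
    groups = defaultdict(list)
    for p in papers:
        groups[(p["field"], p["year"])].append(p["citations"])
    baselines = {key: mean(counts) for key, counts in groups.items()}
    return [
        p["citations"] / baselines[(p["field"], p["year"])]
        if baselines[(p["field"], p["year"])] > 0 else 0.0
        for p in papers
    ]

papers = [
    {"field": "history", "year": 2010, "citations": 2},
    {"field": "history", "year": 2010, "citations": 4},
    {"field": "biomed",  "year": 2010, "citations": 30},
    {"field": "biomed",  "year": 2010, "citations": 60},
]
# Raw counts differ by a factor of 15 across fields, but the normalized
# scores (2/3 and 4/3 in each field) are directly comparable.
print(normalized_citation_scores(papers))
```

Real indicators such as Elsevier's and Clarivate's field-weighted measures are considerably more elaborate (they also condition on document type, among other things), but the basic move -- comparing each paper to its own field-year baseline -- is the one sketched here.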

“Just as with any other field of practice, scientometrics has self-corrected via organized skepticism, and bibliometrics continues to offer a productive lens to answer many questions about the structure, performance and trajectory of science,” she said.
