It’s apparent to anyone who’s familiar with the scientific literature that citations to other papers are not exactly an ideal system. They’ve long been one of the currencies of publication, since highly cited work clearly stands out as having been useful to others and more visible in the scientific community (the great majority of papers do actually get cited eventually by someone, by the way). But anything that can be measured will be managed, and “managed” includes the darker meanings “gamed” and “manipulated”. The classic method is to cite your own work to hell and gone, but readers will have heard of reviewers who demand that their own work be cited, of citation rings where everyone gets together to boost each other’s numbers, of citations for sale, of publishers packing their own journals with internal references, and more schemes besides.

Now, outside of this sort of chicanery, you see many other problems: (1) people citing things because other people have cited them, not because they’ve actually looked over said reference themselves, (2) people just flat missing things, relevant papers (or patents!) that really would shore up their own arguments but don’t even get a look, and (3) people citing things that don’t necessarily do the job that they seem to think they do.

In that last category I put an especially irritating feature of the synthetic organic chemistry literature, one that every bench chemist sees coming before I reach the end of this sentence. I refer to the nesting-doll method of referencing the preparation of some compound: instead of telling everyone how you made it, you just say that it was prepared by the method of Arglebargle, reference 15. So you go look up the Arglebargle paper and find that they don’t tell you how to make the damn thing, either, but refer you to Dingflinger et al. in the even earlier literature. I have had the Dingflinger-level papers themselves send me to yet a third reference, by now something written during the Weimar Republic and of course containing the finest spectral characterization data available in 1931, which ain’t much. Would-it-have-killed-you-to-put-in-the-procedure-and-the-NMR-data, etc.

So let’s make sure not to forget the major influences of laziness and stupidity on citation behavior. Those at least are honest; fools can be very sincere indeed. And those are really the only explanations that I can come up with for what’s described in this recent publication (commentary here). It describes the situation in the social sciences literature around the “Hawthorne Effect”.

Let’s start out by stipulating that the effect itself is a myth (one of several scientific myths that the paper (open-access) references in its introduction). This one goes back to the 1920s and studies of worker behavior at Western Electric’s Hawthorne plant. You’ve probably heard of this stuff: among other things, the study supposedly found that productivity increased when the lights in the factory were brightened, but also increased when the lights were lowered, and the take-home lesson, for decades, was that the knowledge of the workers that they were participating in a study is what actually changed their work habits. That’s not an idiotic conclusion prima facie, because there most certainly are observer effects in social science studies. The problem is, the Hawthorne work turns out to be a terrible example to use. The studies themselves are a mess by modern standards (remember, we’re talking about the 1920s), and the data are nowhere near as clean as the story has it. Referencing the “Hawthorne Effect” has over the years become a shorthand for just about any observer effect you’d like to have a lazy name for, and use of the term has been actively discouraged.

What this latest paper does is look at a set of papers that do that job of discouragement – works that actively seek to argue against the Hawthorne Effect and point out the problems with use of the term. So far, so good – this is a field trying to clean up its terminology and its thinking, and there’s nothing wrong with that. The authors identified three papers in particular that set out the detailed case against the effect and against use of the term, and then looked at papers since then that have cited these.

And there’s the problem. What they found was a rather large set of papers that cite one or more of these papers as actually affirming the reality of the Hawthorne effect. As they say, “a major explanation for the asymmetry between the affirmative articles and the negative articles appears to be not reading, or not understanding, the cited paper”, and by gosh, you can never rule that one out, for sure. It’s a remarkable situation, and of course it helps to propagate the very concepts that the original authors are trying to knock down. For example, the worst case is a 2000 paper arguing against the Hawthorne effect and against its very utility as a concept. When these current authors looked over the text of 196 papers citing that work between 2001 and 2018, it turned out that 168 of them were actually affirming the Hawthorne effect (!). Some of these (a minority) noted the reference as a dissenting voice, but others just blithely cited it in a list of papers about the effect itself. The conclusion of the paper is worth thinking about:

Of course, to assess whether the three articles were successful at communicating their critique of the Hawthorne Effect, we ought to consider the number of readers that has been dissuaded from believing in and using the Hawthorne Effect in their research. For all we know this group is in majority. It is, however, a silent one. When it comes to academic publishing, the affirming articles are dominant on the issue of the Hawthorne Effect, and are likely the major contributors to the forming of the published consensus. These publications, we surmise, will efficiently recruit new believers in the effect, and in turn new affirmative citations in the literature. The findings not only demonstrate that the three efforts at criticizing the Hawthorne Effect to varying degrees were unsuccessful, but they also suggest that if the intention behind the critiques were to reduce the frequency of affirmations of the claim in the scientific corpus, they may have achieved the very opposite.

This makes me wonder if the various articles over the years warning people off of (say) useless or inappropriate chemical probes have done the job that we’ve hoped for. The way such things keep being used is not an encouraging sign. Anyone know of any direct examples of this sort of thing in the chemistry or biology literature?