Trust, but verify

How I use scite.ai to see if a scientific article has been supported or contradicted

I have been lying to my students.

OK, perhaps not lying. Not exactly. But I have, unwittingly, been communicating misinformation.

As a psychology professor, one of the undergraduate courses I teach is Introduction to Psychology. Part of this course deals with so-called “abnormal” psychology — mental disorders and their treatment. I discuss in some detail the etiology of various psychological disorders, including depression. Buried in my lecture notes on depression is a reference to a particular gene, 5-HTTLPR. One unusual variation of this gene, I say, appears to be associated with the development of depression, but we are not sure why.

A recently published paper using a large sample clearly puts this association to rest, directly refuting the argument that 5-HTTLPR plays a role in depression. As the psychiatrist Scott Alexander (a pseudonym) argues on his blog, “[t]his isn’t a research paper. This is a massacre.”

To be frank, I’m not sure how this tidbit made its way into my lectures. I have been teaching for almost a decade now, and my lecture notes and slides are composed of findings I picked up in various graduate classes, readings, and talks I have attended, as well as content from the textbooks I assign. I suspect the same is true of many other researchers who teach this material.

I’m not a clinical psychologist, and my knowledge of how genes exert influence on the development of mental disorders is somewhat limited. When I first created my lectures years ago, I trusted my training, I trusted the literature, and I trusted the peer-review process to weed out unreliable claims and maintain the integrity of the scientific literature.

But as we have seen, this trust is often misplaced.

For all the reasons Alexander points out (and yes, you really should read the whole post), the prospect of a single gene being responsible for (or moderating) something like depression is just plain silly. However, the ability to evaluate scientific claims, even for a professional scholar and teacher in a given field, is severely limited. One has to become familiar with a dizzying number of claims from a wide array of sources. Citation counts, a proxy for quality used by many researchers today, are ambiguous: a paper may be heavily cited because it replicates well (supporting citations), or because it simply establishes a rudimentary fact that other authors wish to reference in passing (mentioning citations). Determining how an article is cited (does the citation support the claim, refute it, or just mention it?), and not just how many times, is so impractical that it is effectively impossible. If a paper has 100 citations, one would need to read 100 papers to see which support or contradict the claim and which just mention it. That’s more than any of us can do for individual papers, let alone the dozens that we must read each month. Indeed, researchers have written entire research papers based on the citation analysis of a single paper!

We’re working to change that at scite.ai: we use deep learning to analyze hundreds of millions of citations, making it easy to check whether any scientific article has been supported or contradicted. So when Twitter was abuzz over Alexander’s blog post last Friday, I searched for one of the articles in question on scite (Influence of Life Stress on Depression: Moderation by a Polymorphism in the 5-HTT Gene) to see how it had been “scited”.

The scite report for this paper indicates that the vast majority (approximately 97%) of citations of this paper merely mention it, without explicitly supporting or contradicting it. Thirteen citations directly contradict the paper (see example below) and forty-eight support it. Thus, the vast majority of the citations in scite’s database say nothing about the veracity of the claims made in this paper, and of those that do take a position, a substantial number indicate disagreement with one or more of the paper’s conclusions. Had I seen this report and the contradicting cites surfaced by scite, I would have presented the model that 5-HTTLPR causes depression as controversial rather than as fact.
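For readers who want to see the arithmetic, here is a minimal Python sketch of how such a breakdown could be tallied. It is not scite's code or API. The function name `citation_breakdown` and the label strings are my own illustrative choices. The supporting and contradicting counts come from the report described above, but the number of mentioning citations is not stated there, so the sketch back-calculates an illustrative value from the approximate 97% figure.

```python
from collections import Counter


def citation_breakdown(labels):
    """Return the share of each citation type in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no citations to tally")
    return {label: count / total for label, count in counts.items()}


# Counts taken from the report discussed above.
supporting, contradicting = 48, 13

# ASSUMPTION: the mentioning count is not given in the report. It is
# back-calculated here from the approximate 97% figure, so treat it
# as illustrative only.
mentioning = round((supporting + contradicting) * 0.97 / 0.03)

labels = (
    ["supporting"] * supporting
    + ["contradicting"] * contradicting
    + ["mentioning"] * mentioning
)
shares = citation_breakdown(labels)

# Among the citations that take a position, what fraction contradict?
contradicting_among_classified = contradicting / (supporting + contradicting)

for label, share in sorted(shares.items()):
    print(f"{label}: {share:.1%}")
print(f"contradicting among classified: {contradicting_among_classified:.1%}")
```

Looking only at the citations that take a position (13 contradicting out of 61) gives a very different picture from the raw citation count, which is the point of separating the three kinds of citation.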