A commentary that I co-authored with Dorothy Bishop appeared online in Nature today. It takes up the issue of transparency in science, with a particular emphasis on the “dark” side of transparency and openness. The article is freely available; in this post I condense our argument into a few words and then raise some additional issues that are likely to arise during discussion.

Both authors of the commentary are strongly committed to open science: Dorothy has argued forcefully in favor of sharing data here, and I am a co-author of a recent paper on openness published in Royal Society Open Science, which introduced the Peer Reviewers’ Openness Initiative, or PRO for short.

However, like many good things such as red wine or healthy dieting, openness and transparency, when taken to an extreme, may also have adverse consequences for the conduct of science. Much has been written—including by me—about the harassment of scientists in contested areas by interminable freedom-of-information (FOI) requests, requests for data when those data are already in the public domain, and so on. A collection of testimonials about such harassment, as well as the need to preserve transparency and openness, arose out of a meeting sponsored by the Royal Society that I organized in June last year: a common thread that emerged from that meeting and the testimonials is the need for a system of “triage” that differentiates legitimate scrutiny and healthy debate from problematic research practices or harassment campaigns that masquerade as scientific inquiry.

Our commentary proposed a set of “red flags” that can be used to approach this triage. They are not definitive criteria, but we felt they constitute a worthwhile initial contribution to what will surely be an ongoing discussion.

The basic principle underlying our commentary is one of symmetry: while scientists ought to disclose all relevant funding and their data, and while their science must be subject to scrutiny, the same rules ought to apply to critics. For example, if researchers are encouraged to preregister their research and their analysis plan, the same should apply to critics who seek to re-analyze data. If scientists must disclose their conflicts of interest, why should requestors for data not do likewise?

I find the idea of symmetry self-evidently fair and reasonable. However, that does not mean that everybody will agree: after all, fairness and reason may motivate some critics of scientific work but clearly not all of them. Conversely, while many scientists may openly declare their conflicts of interest, not all of them do, and we may therefore encounter continued resistance to openness.

There are a few specific questions that we touched on in the commentary but did not have the time or space to do full justice to. I therefore provide some additional thoughts here.

Researchers’ control over their (behavioral or medical) data during reanalysis

In our commentary we note that “Researchers also need control over how data is to be used if it goes beyond what participants agreed to (for example, analysis of ethnic, race or gender differences in data collected for different purposes).”

What does this mean and is it even enforceable? If I make available a multi-dimensional data set with many variables and someone else wants to do some post-hoc analysis of variables that I didn’t consider, should I, as a researcher, have any control over that?

This question may be best examined in a very concrete, if rather stark, hypothetical context: Suppose I have collected data on a specific cognitive task in an experiment that examined the efficacy of a new training regime. To control for potential covariates, the data set includes numerous demographic variables, including race, gender, political affiliation, and religious denomination. The data are convincingly anonymized and participants have given consent that their data will be made publicly available.

So far, so open and transparent.

Now suppose the Ku Klux Klan (which, alas, exists) and the Anti-Muslim-Bigotry League (which likely exists in some form if not by that name) demand access to the data for reanalysis along racial and religious lines.

Did my participants really give consent to have their data used in that manner? Would anyone from a minority group ever again give consent to participate in an ostensibly “harmless” experiment to discover better training techniques if those data can be exploited by a clever post-hoc fishing expedition to score a political point?

To my knowledge, this problem has been largely ignored in the open data debate and it urgently requires attention.

By the way, the problem would be manageable, and the original researcher’s control enforceable, if requestors had to preregister their analyses in the same way that the original researchers hopefully did in the first place. (And yes, we should move towards a culture in which preregistration becomes a strong normative expectation, if not a requirement, of research.)

Do the requestor’s motives matter?

This is another tricky and nuanced issue: if I have made available my data from a potentially controversial research project, does it matter if they are being re-analyzed by someone who is opposed to my results for political or ideological reasons?

At first glance, the answer should be a clear “no, motives should not matter.” If a re-analysis is really driven by ideological motives, then its flaws will be readily identifiable and can be corrected by the usual scholarly means (such as peer-reviewed publications).

There is, however, a problem: many areas of science that are contentious involve a political component in which the public’s opinion matters a great deal. For example, it matters whether the public supports labeling of genetically modified (GM) foods, it matters whether the public supports non-smoking policies in public places, and so on. Now, as a rule of thumb, it is fair to assume that the public will not demand political action on any such problem while they perceive there to be scientific disagreement. After all, the tobacco industry famously stated that “doubt is our product” because they knew that the appearance of a scientific disagreement, even where there was none, would forestall tobacco control.

This creates a dilemma for open data that, to my knowledge, has not been satisfactorily resolved: In contested arenas, the motives underlying a request for data do matter because an illegitimate re-analysis can have far-reaching flow-on consequences. Is it really ethical to let the tobacco industry cherry-pick public-health data to death, thereby delaying tobacco control legislation at a huge cost of human lives and health? Not surprisingly, public-health researchers are therefore very concerned about who has access to raw data.

There is no easy resolution to this issue but it is worthy of further discussion and examination.

Do the requestor’s abilities matter?

Setting aside motivation, does it matter who requests data? Should there be a competence criterion? Or are all requestors equal under the transparency umbrella?

At first glance, the answer should again be a clear “ability or competence should not matter,” for the reasons already noted.

There is, however, a problem: what if the data contain information that challenges meaningful anonymization? Suppose the research involves some medical condition that has a social stigma attached to it, and as part of the research many medically relevant items of information are collected (e.g., the name or post code of the participant’s physician, the participant’s income and profession, and so on).

It is a challenge to anonymize data at that level of granularity, especially if the sample is small or limited to a small geographic area, although various solutions exist, for example “delinking” identifying information from research-relevant information. (Even delinking is not an entirely trivial matter: unless the linking key has been destroyed or is held by another institution, the data are not considered anonymized under the U.K. Data Protection Act.)

Supposing the challenges to anonymization have been met, for example by irreversible delinking, then sensitivity of data alone need not—indeed should not—preclude sharing of the data with other researchers working in an institutional framework in which ethical strictures apply and non-disclosure agreements are meaningful and enforceable.

However, should such data be released to Mr. Tom D. Harry who hails from Widgiemooltha and runs a Center for Transparency in his dunny?

On this issue, I come down on the side of a clear “no.” Sensitive medical or psychological data whose anonymization is challenging ought not to be released to people whose ability to keep them confidential cannot be reasonably established. The U.K. Medical Research Council’s guidelines explicitly state: “The custodian [of the data] must ensure that the group [receiving the data] accepts a duty of confidence and protects confidentiality through training procedures, etc, to the same standards as the custodian [my emphasis].”

Mr. Tom D. Harry is unlikely to meet those stipulations, and if he does not, then he ought not to receive the data.

Of course, procedures must be put in place that balance transparency and concerns about violations of privacy in those instances. Arguably, this should not be left to the original researchers—who may have their own ulterior motives—but must be resolved by some independent arbitration process.

The institutional response to harassment

As we note in the commentary, universities have complaints processes for good reasons. However, complaints are also a known tool of harassment, as is amply documented in the context of tobacco research.

How can institutions respond? Universities—by law—must not tolerate harassment of academics or students based on race or gender. So why should they tolerate harassment of academics based on contentious science? Once the triage has been conducted and harassment has been identified, the university’s duty of care should naturally extend to offering protection.

This can be achieved in a number of ways that deserve further discussion. One technique, briefly identified in our commentary, is a public declaration of support by the university for an academic and, importantly, for the status of the scientific issue that is being attacked.

A relevant precedent involves the Rochester Institute of Technology, which affirmed the overwhelming scientific consensus on climate change when one of its academics, philosopher Dr. Lawrence Torcello, became the subject of a hate campaign after he published an opinion piece in an online newspaper.

Dr. Torcello summed up the situation thus in an email to me, which I am citing with permission:

“In fact, RIT didn’t just endorse my academic freedom and the scientific consensus on climate change, the statement published by the institute also acknowledged that my work had been misrepresented by certain media outlets, it encouraged people to read my actual piece, and it provided a link to my piece. Additionally, a motion was raised at academic senate to endorse the university’s statement supporting me, which passed unanimously. The dean of my particular college also sent around an official communication to liberal arts faculty condemning the harassment and making faculty aware that his office is prepared to support any faculty harassed for their research. Finally, a generalized version of the statement issued in my defense was placed permanently on the Provost’s website in order to direct any future harassers to the statement. I was consulted and kept in the loop at every stage of the university’s response. The dean’s office has also offered to help sponsor a conference on such academic harassment. … I think it makes a pretty good case study of how universities ought to respond in such situations.”

Let the conversation continue, without harassment and with an emphasis on transparency, open data, and full disclosure of potential conflicts of interest.