Openness and transparency are becoming hallmarks of responsible data practice in science and governance. Concerns about data falsification, erroneous analysis, and misleading presentation of research results have recently strengthened the call for new procedures that ensure public accountability for data-driven decisions. Though we generally count ourselves in favor of increased transparency in data practice, this Commentary highlights a caveat. We suggest that legislative efforts that invoke the language of data transparency can sometimes function as “Trojan Horses” through which other political goals are pursued. Framing these maneuvers in the language of transparency can be strategic, because approaches that emphasize open access to data carry tremendous appeal, particularly in current political and technological contexts. We illustrate our argument through two examples of pro-transparency policy efforts, one historical and one current: industry-backed “sound science” initiatives in the 1990s, and contemporary legislative efforts to open environmental data to public inspection. Rules that exist mainly to impede science-based policy processes weaponize the concept of data transparency. The discussion illustrates that, much as Big Data itself requires critical assessment, the processes and principles that attend it—like transparency—also carry political valence, and, as such, warrant careful analysis.

Openness, transparency, and reproducibility have become the new watchwords of responsible data practice in science. A series of recent high-profile scandals has brought to light the problems of erroneous analysis, falsified data, problematic methodology, and cherry-picked presentation of research results (Carey and Belluck, 2015; Coy, 2013; Economist, 2013; Kolata, 2011). Some evidence suggests these problems, whether intentional or accidental, are more widespread than has previously been recognized, and that existing norms and processes of academic publishing and peer review are poorly configured to detect or deter them (Gelman, 2015; Horton, 2015). For instance, in one recent large-scale reanalysis of 100 published psychology studies, only about one-third of the results could be replicated (Open Science Collaboration, 2015); in another recent analysis of 60 published economics papers, fewer than half of the main results could be replicated (Chang and Li, 2015). In response, a number of standard-bearers in the scientific process have recently called for or instituted pro-transparency policies—from requiring researchers to make their data publicly available, to insisting that study results be independently replicated prior to publication (Ablin, 2014; Alberts et al., 2015; Institute of Medicine, 2015; Jacoby, 2015).

Similar pro-transparency principles have also gained traction in government. The ethos of open government, advocated in recent years by the Obama administration (Ellman and Suh, 2013), posits that accessible datasets and transparent decision-making processes are necessary precursors to government accountability, responsible governance, public trust, and, ultimately, improved policy outcomes. Providing open access to information produced in federally funded research is said to be a core function of democracy, an effective means of accelerating job growth and innovation, and an essential strategy for promoting an engaged and informed public (Holdren, 2013; Seife and Thacker, 2015).

As social scientists, we count ourselves in support of the (often overlapping) agendas of the open science and open governance movements. In numerous cases, accessibility and replication have strengthened the integrity of data-driven decisions and increased the accountability of decision-makers. Indeed, it is difficult to imagine many principled arguments against transparency (except to the extent necessary to protect personal privacy or otherwise sensitive information)—especially when data are analyzed as part of public governance processes, or when public money has been used to produce them.

However, in this Commentary, we highlight a critical challenge to the growing movement toward increased data transparency in science and public policy. We note that legislative efforts that invoke the language of data transparency can sometimes function as “Trojan Horses” designed to advance goals that have little to do with good science or good governance. Framing these maneuvers in the language of transparency can be politically strategic, because approaches that emphasize open access carry tremendous appeal, particularly in current political, technological, and institutional contexts. We illustrate our argument through two examples of pro-transparency policy efforts, one historical and one current: industry-backed “sound science” lobbying in the 1990s, and contemporary legislative efforts to open environmental data to public inspection.

“Sound science,” data quality, and the institutionalization of uncertainty

In the 1990s, the tobacco company Philip Morris launched a “sound science” initiative aimed at casting doubt on the link between secondhand tobacco smoke and lung cancer by challenging prevailing interpretations of key studies (Baba et al., 2005). Among the campaign’s first objectives was to “legislate public access to epidemiological data used in support of federal laws and regulations” (SRIC Innovation, 1997). Recognizing that frustration in the oil and coal industries over proposed clean air regulations represented a “hook” that could provide an opportunity for political coalition-building, the sound science team presented data access legislation to Capitol Hill allies (Philip Morris, 1997). In 1998, Sen. Richard Shelby (R-Ala.) added a rider, the Data Access Act (DAA), to an appropriations bill; the rider mandated public access to all data produced by federally funded scientists employed by nonprofit institutions (Michaels, 2008). The law, commonly known as the Shelby Amendment and passed in September 1998, did not apply to privately funded studies. In 2000, the sound science team achieved an even bigger victory when a tobacco industry lobbyist convinced Rep. Jo Ann Emerson (R-Mo.) to slip a two-sentence rider into a 712-page appropriations bill requiring all federal agencies to issue guidelines “ensuring and maximizing the quality, objectivity, utility, and integrity of information (including statistical information) disseminated by the agency.” The Data Quality Act (DQA)—also known as the Information Quality Act—required agencies to establish a mechanism through which “affected persons” could “seek and obtain correction of information” promulgated by the government. No hearings were held on the DQA, and it is not clear that most members of Congress were aware it was being passed (Wagner, 2003; Weiss, 2004).
Many scientists have denounced the DAA and DQA as attempts to magnify and institutionalize the uncertainty inherent in the science policy enterprise (Michaels and Monforton, 2005; Rosenstock, 2006; Schick et al., 2007). David Michaels, current chief of the Occupational Safety and Health Administration, has referred to the DAA as an invitation to “dredge and manipulate” government data in an effort to muddy the scientific waters (Michaels, 2008: 177). Michaels has written that the DQA has “successfully slowed agency activities” by consuming scarce resources and staff (Michaels, 2008: 190). DQA corrections have become a favored tactic for delaying agency actions that run counter to industry interests. An analysis by the Washington Post found that DQA petitions had been filed predominantly by regulated industries, lobbyists, and trade organizations (32 of 39 petitions analyzed) (Weiss, 2004). Public health scholars have warned that the DQA may prompt agencies to “self-censor” important information that is likely to come under challenge (Rosenstock, 2006). Because these purportedly pro-transparency laws do not apply to industry-funded science and are invoked only in the face of agency action, they are like “a knife that cuts only one way”—against federal intervention (Houck, 2003).

“Secret science” in environmental regulation

Similar legislative efforts are unfolding today. The Secret Science Reform Act (SSRA) is currently pending in Congress. (It passed the House of Representatives in March 2015 and is currently awaiting Senate action.) The bill would prohibit the Environmental Protection Agency (EPA) from proposing or implementing regulations “based on science that is not transparent or reproducible,” and would require the public release of all data used in the EPA’s assessments in a manner that allows for independent analysis and reproduction of results. Advocacy for the SSRA has been couched firmly in the language of data transparency: as its sponsor, Lamar Smith (R-Tex.), put it, “[c]ostly regulations should not be created behind closed doors and out of public view” (US Congress press release, 2015). Notably, the notion of using the catchphrase “secret science” to advocate for data disclosure was discussed in private meetings of consultants to the tobacco industry as early as 1998 (Gianelli, 1998). The SSRA hits squarely at the intersection of open science and open government. For the science community, the SSRA appears to respond to calls for increased replicability, open access, and data sharing. On the governance side, many informed advocates on the left have pushed to open up government datasets to fight corruption and cast “sunlight” on policy processes. But upon closer inspection, the SSRA appears to use data transparency as a Trojan Horse through which to advance a different goal: namely, to hamstring the processes of the EPA. The EPA relies upon approximately 50,000 scientific studies annually to make environmental policy.
The Congressional Budget Office forecasts that the SSRA would severely restrict the EPA’s ability to enact new regulations, due to both the costs of making so much data publicly available, and the fact that certain classes of data (for example, industry-held or medical data) would be impossible to release (and thus to rely upon) according to the terms of the proposed law (Congressional Budget Office, 2014)—creating a “catch-22” for the agency (Jaffe, 2015; Rosenberg, 2014). These effects do not appear to have been unforeseen: Congressional votes on the SSRA have divided along party lines, and some of its chief advocates are active in the climate-change denial movement. On the other side, more than 50 scientific societies and universities have signed statements in opposition to the bill (American Association for the Advancement of Science, 2014).

Data dredging and the risks of “scientific cacophony”

Beyond these legislative actions, the open data movement has provoked a complex debate among scientists who wish to ensure that vested interests do not take advantage of open-access policies to advance their goals to the detriment of public health. For example, some clinical and public health researchers, informed by prior scientific battles with the tobacco industry and other powerful interests, have expressed concern that biased actors may “dredge” existing data sets to generate new analyses that contradict established scientific and public health positions (Christakis and Zimmerman, 2013; Kaiser, 2003; Sacks et al., 2003). Open access to research data is said to be a “double-edged sword” (Spertus, 2012). To prevent the “scientific cacophony” that might ensue from truly open access, some have proposed that data sharing may not be useful when those requesting data have strong vested interests (Christakis and Zimmerman, 2013). Scientific proponents of data sharing, in contrast, assert that de-identified raw data “should eventually be put into the public domain for unconditional, universal access” (Strom et al., 2014). For such advocates of unrestricted access, status quo arrangements in which data are tightly held and the original investigators’ interpretation prevails “may be just as or more harmful” than a situation in which diverse private actors are empowered to challenge the accepted wisdom with their own assessments of the evidence (Krumholz and Peterson, 2014). These competing perspectives invoke different assumptions about the institutional processes of science-based governance. Regulatory open-data advocates propose essentially a democratic, free-market approach to the evaluation of scientific findings: release the data, they suggest, and the “best” findings will rise to the top, while promoting accountability for decision-makers.
Other researchers fear that open-data policies might, paradoxically, increase industry “capture” of regulatory processes, as resource-rich special interests exploit scientific uncertainty to impose undue administrative delay.

The limits of transparency in science and governance

In an era of increased skepticism toward science, anything less than unqualified openness on the part of regulatory agencies may be taken as an indication that something is being hidden. In this Commentary, we emphasize that the principle of data transparency is subject to limits (Jasanoff, 2006). Data processes in science and governance face at least three sources of constraint, which calls for increased data transparency may strategically exploit. First, institutional resource limitations are unavoidable—open processes require substantial outlays of time and money, which can hamstring the workings of agencies to the extent that they cannot accomplish their aims. Second, countervailing interests, such as privacy protection, may make it impossible to fully release data; while techniques like anonymization can provide a middle ground, initiatives like the SSRA do not make allowances for them. In time, technological and methodological advancements may ameliorate these two concerns somewhat, as infrastructures for data sharing become standardized and institutional norms shift. But a final constraint remains: the fact that epistemological limitations constrain data-driven political decision-making. Agencies charged with protecting public health and the environment must make decisions in the face of scientific uncertainty, because science by its nature is incomplete and only rarely provides precise answers to the complex questions policymakers pose. Sarewitz (2000) compares the goals of science and politics: The goal of politics is the achievement … of an operational consensus that enables action. This is a very different goal from that of science, which seeks to expand insight and knowledge about nature through an ongoing process of questioning, hypothesizing, validation, and refutation … .
When a scientific problem is contentious and the object of a vibrant research effort, consensus is extremely difficult to achieve—the process of scientific investigation intrinsically militates against, is designed to inhibit, premature consensus. Data transparency, even with its many virtues, cannot alter this fundamental aspect of scientific inquiry. Cacophony and contention are core elements of the scientific enterprise. By invoking a “narrow, idealized portrayal of science” in which research reliably produces clear and reproducible facts, the special interests behind the DAA, DQA, and SSRA mischaracterize science’s “inevitably incomplete, uncertain, contested, and … often unreliable” nature in their efforts to stymie regulatory activities with which they disagree (Sarewitz, 2015). Technical experts at regulatory agencies frequently commit this same mistake by failing to delineate the extent to which their science policy decisions are inevitably informed by value judgments that go well beyond the available science (Wagner, 2003). Regulators and their opponents thus co-produce a false impression of the contribution of science to public policy development. Rules that invoke the specter of “secret science” or that exist mainly to impede policy processes weaponize the concept of data transparency. They also, ironically, may themselves violate principles of transparency: the DAA and DQA were covertly inserted into large appropriations bills—a strikingly opaque approach. The SSRA proposes that evidence that has not been released publicly should be excluded from EPA analyses even if it might improve agency decision-making. Transparency requires that relevant research not be dismissed from policy processes even if it is incomplete or less than definitive (Wagner and Steinzor, 2006).
There are other recent examples of legislative efforts that invoke the language of transparency and scientific quality control in the name of democratic values but that appear to be camouflaged efforts to improve the lot of special interests (Marcos, 2015; Urology Times, 2013). The political valence of data transparency is a critical reminder of the inherently sociopolitical nature of all technologies, including institutional data practices. Though transparency is often framed as an unalloyed good (provided that privacy interests can be adequately protected), in practice it provides a means through which diverse stakeholders attempt to achieve diverse political goals. Politically motivated proponents of transparency, in some cases, may exploit the epistemological and institutional realities that accompany the production of science and science-based policy. Policies that allow for open sharing of data may improve perceptions that science-based decisions are credible, but data access approaches must be carefully designed to ensure they make science and governance better, not worse. Just as Big Data itself creates phenomenological and epistemological challenges that must be critically assessed, its attendant processes also warrant careful analysis.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References