Digital epidemiology, also referred to as digital disease detection (DDD), is motivated by the same objectives as traditional epidemiology. However, DDD focuses on electronic data sources that emerged with the advent of information technology [1–3]. It draws on developments such as the widespread availability of Internet access, the explosive growth of mobile devices, and online sharing platforms, which constantly generate vast amounts of data containing health-related information, even though these data are not always collected with public health as an objective. This novel approach builds on the idea that information relevant to public health is now increasingly generated directly by the population through its use of online services, without people necessarily having engaged with the health care system [4, 5]. By utilizing global real-time data, DDD promises accelerated detection of disease outbreaks, and examples of this enhanced timeliness have already been reported in the literature. The most recent is the 2014 Ebola virus outbreak in West Africa [6], where reports of the emerging outbreak were picked up by digital surveillance channels in advance of official reports. Furthermore, information gleaned from these datasets can serve several epidemiological purposes beyond early detection of disease outbreaks [7, 8], such as the assessment of health behavior and attitudes [4] and pharmacovigilance [9].

This is a nascent field that is developing rapidly [10]. While changes in the ways in which epidemiologic information is obtained, analyzed, and disseminated are likely to yield great social benefits, it is important to recognize and anticipate potential risks and unintended consequences. In this article we identify some of the key ethical challenges associated with DDD activities and outline a framework for addressing them. We argue that it is important to engage with these questions while the field is at an early stage of its evolution, in order to make ethical awareness integral to its development.

DDD operates at the intersection of personal information, public health, and information technologies, and increasingly within the so-called big data environment. Big data lacks a widely accepted definition; the term has nevertheless acquired substantial rhetorical power. We use it here in the sense of very large, complex, and versatile sets of data that are constantly evolving in format and velocity [11]. This dynamic environment generates various ethical challenges that relate not only to the value of health for individuals and societies but also to individual rights and other moral requirements. To spell out these challenges and possible ways of meeting them, it is necessary to take into account the distinctive nature of DDD and the broader context in which it operates. Generally, these distinctive features are linked to the methods by which data are generated, the purposes for which they are collected and stored, the kind of information that is inferred from their analysis, and eventually how that information is translated into practice [12]. More specifically, some of the relevant features are outlined below: the steady growth of digital data, the multifaceted character of big data, and ethical oversight and governance.

The amount of data generated by activities facilitated by the Internet and mobile technologies is unprecedented. The global number of mobile-cellular subscriptions is close to the world's population figure, with a total penetration rate of 96%; the mobile-cellular penetration rate in developing countries is 89%, and about 40% of the world's population is connected to the Internet [13]. Some 82% of the world's online population uses social media and networks [14]. More than 40,000 health apps are available, and a new top-level Internet domain, ".health", is about to be released [15, 16]. Not surprisingly, personal data have recently been described as a new asset class with the potential to, among other things, transform health care and global public health [17].

Big data cannot be readily grouped into clearly demarcated functional categories. Depending on how it is queried and combined with other datasets, a given dataset can traverse categories in unpredictable ways. For example, health data can now be extracted from our purchases of everyday goods, our social media exchanges, and our web searches. New data analytics constantly change the kinds of outcomes that become possible, going beyond early identification of outbreaks and disease patterns to include predictions of an event's trajectory or likelihood of recurrence [18, 19]. These new possibilities make good data governance, which ensures the ethical use of data, all the more complex.

Public health surveillance and public health research are governed by national and international legislation and guidelines. However, many of these norms were developed in response to very different historical conditions, including technologies that have since been superseded [20]. Such mechanisms may not be appropriate or effective in addressing the new ethical challenges posed by DDD, nor the questions that will arise if DDD is fully integrated into standard public health systems. Health research utilizing social media data and other online datasets has already put pressure on existing research governance procedures [21].

Ethical Challenges

Against this background we have identified three clusters of ethical challenges facing DDD that require consideration (Table 1).

A. Context sensitivity At the crux of the debate on the ethics of big data lies a familiar but formidably complex question: how can big data be utilized for the common good while respecting individual rights and liberties, such as the right to privacy? What are the acceptable trade-offs between individual rights and the common good, and how do we determine the thresholds for such trade-offs? These ethical concerns, and the tensions between them, are not new to public health research and practice, but they must now be addressed in a new context, with the result that appropriate standards may vary according to the type of big data activity in question. The context of DDD clearly differs in significant ways from other types of big data activity concerned with health. DDD has a public health function, aiming ultimately to improve health at the population level. Public health is a common good from which all individuals benefit and one that is essential to human development and prosperity. There is a clear contrast here with forms of corporate activity that may use the very same data (e.g., social networking data) for other purposes, such as advertising. The former aims at fostering a public good (health); the latter at generating a corporate profit. Such differences have important ethical implications. A context-sensitive understanding of ethical obligations may reveal that some data uses that are unacceptable within corporate activity (e.g., user profiling and data sharing with third parties) are permissible for public health purposes. Furthermore, societal obligations to foster the common good of public health may impose duties on corporate data collectors to make data available for use in DDD. Pursuing this line of thought, it is arguable that privacy considerations that apply in standard public health practice will have to be creatively extended and adapted to the case of DDD.
This will result in new standards relating to data from a diverse range of sources, e.g., self-tracking, citizen science, social networks, volunteers, or other participatory contexts [22, 23]. Such new standards are urgently needed, especially as greater convergence of datasets becomes possible. One illustration of global activity on this front is the United Nations Global Pulse project [24], which explores the concept of data philanthropy, whereby public–private partnerships are formed to share data for the public good. Such so-called data commons, operating on the basis of clear rules about privacy and codes of conduct, could profoundly affect disease surveillance and public health research more generally. Another dimension of context relates to global justice. Historically, new health tools have predominantly been used to improve the health of inhabitants of the better-off parts of the world. DDD projects that access global data are often less costly than traditional public health approaches; they could thus offer a potential breakthrough in early disease detection that would benefit communities throughout the world [25, 26]. However, this potential brings moral obligations in its train. These obligations require not only efforts to detect diseases in poorer parts of the world but also measures to ensure that the ways in which data are collected and processed respect the rights and interests of people from these diverse regions and communities. This raises difficult questions of cultural relativity, such as whether standards of privacy may take different forms in different cultures or whether some minimal core of uniform standards is also justified.

B. Nexus of ethics and methodology Robust scientific methodology involves the validation of algorithms, an understanding of confounding, filtering systems for noisy data, the management of biases, the selection of appropriate data streams, and so on. Some have expressed skepticism about the role DDD can play in public health practice, given its early state of development [27]. In 2013, when Google Flu Trends overestimated flu prevalence levels in the US, further concerns were raised about the sensitivity of this methodology to the digital environments created by users' behavior: for example, uses of search terms different from those used to develop the initial algorithm [28], or the distorting influence of searches prompted by media coverage of the flu [29, 30]. Methodological robustness is an ethical, not just a scientific, requirement. This is not only because limited resources are wasted on producing defective results, or because trust in science is undermined by misleading or inaccurate findings; there is a further risk of harm to individuals, businesses, or communities if they are falsely identified as affected by an infectious disease. Such harm can take many forms: financial losses, as when a tourist region is falsely identified as the location of a disease outbreak; stigmatization of particular communities, which may adversely affect individual members; and even the infringement of individual freedoms, such as the freedom of movement of a person falsely identified as a carrier of a particular disease. The issue of data provenance also comes within the remit of ethically sound methodology. Currently published DDD studies and other initiatives have mostly used data that are in the public domain (e.g., Twitter) or that have been contributed by individuals with their explicit consent for use in disease surveillance (e.g., flunearyou.org).
While in principle data in the public domain are open to being used for public health purposes, what constitutes the public domain on the Internet is the subject of lively debate [31]. Especially in the context of data derived from social network interactions, it remains unclear whether users understand how their data may be used and who may access them [32]. Any DDD project will inevitably have to navigate this uncertain environment, and so must exercise due diligence regarding data provenance and be transparent about how the data are used.