Confusion over data anonymization and privacy can have serious consequences when sensitive medical data are being collected for research. Anonymity cannot be achieved merely by dispensing with direct identifiers (see N. Seeman Nature 573, 34; 2019).

People are identifiable in large data sets even in the absence of personal information (L. Sweeney J. Law Med. Ethics 25, 98–110; 1997). For example, 15 demographic attributes are enough to correctly re-identify 99.98% of Americans in any data set (L. Rocher et al. Nature Commun. 10, 3069; 2019). That is why recital 26 of the European Union’s General Data Protection Regulation and section 1798.140 (h) of the California Consumer Privacy Act consider data to be anonymous only when the subject cannot be re-identified.
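
As a rough illustration of why removing direct identifiers is not enough, the sketch below counts how many records in a table become unique once a few indirect attributes are combined. The records, the field names and the helper unique_fraction are hypothetical and chosen only for illustration. The function measures uniqueness within the sample itself, which is a simpler quantity than the population-level estimate reported by Rocher et al.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the fraction of records whose combination of
    quasi-identifier values appears exactly once in `records`."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)


# Hypothetical toy records: no names or ID numbers are present.
records = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": 1965, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
]

# With one attribute, nobody stands out.
print(unique_fraction(records, ["zip"]))  # 0.0

# With three attributes combined, half of the records are unique.
print(unique_fraction(records, ["zip", "birth_year", "sex"]))  # 0.5
```

In this toy table, each attribute taken alone singles out nobody, yet the combination of three isolates half of the records. Anyone who knows those three facts about a person could then read off that person's diagnosis.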

Health research needs access to patient data to determine the precise patterns of signs and symptoms that indicate the onset of disease, and to monitor how these change in response to treatment. Because the mere absence of obvious identifiers does not protect privacy, it is imperative that such data continue to be collected, accessed and processed with caution and with strict security measures in place.