The British Public & its Freedom to Tinker with Data

This is a guest blog by Alec Muffett, a security researcher and member of ORG’s Board of Directors.

Britain – after a somewhat shaky start – has recognised its proud tradition of leadership in cryptographic research and development.

From Alan Turing’s success at breaking the Enigma cipher at Bletchley Park, and Tommy Flowers’ “Colossus”, used there to break the Lorenz cipher, to early and secret research by Clifford Cocks into what later became known as “Public Key Encryption”, to GCHQ’s vast deployment of technology to enable mass-surveillance of undersea cable communications: whatever one’s opinion of the fruits of the work, Britain is recognised as a world leader in the field of cryptography.

And one of the great truths of cryptography is: cryptography only improves when people are trying to break it. From academics to crossword-puzzle fans, cryptography does not evolve unless people are permitted to attack its means, methods and mechanisms.

This brings us to the recently announced “Data Protection Bill”, in which you will find a well-intentioned paragraph: (our emphasis)

Create a new offence of intentionally or recklessly re-identifying individuals from anonymised or pseudonymised data. Offenders who knowingly handle or process such data will also be guilty of an offence. The maximum penalty would be an unlimited fine. https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/635900/2017-08-07_DP_Bill_-_Statement_of_Intent.pdf (page 10)

This speaks to the matter of “data anonymisation” (and the reverse processes of re-identification or de-anonymisation), where the intention is that some database, for instance a local hospital admissions list, could be stripped of patient names and yet still be usefully processed or shared to research the prevalence of disease in a population.

Done improperly, this can go wrong:

https://en.wikipedia.org/wiki/AOL_search_data_leak

Search queries, published for research, are widely deanonymised

https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf

Anonymised “movie ratings” are re-attributed to the Netflix viewer

https://www.newscientist.com/article/dn25088-nhs-plans-leave-anonymous-medical-data-vulnerable/

Critique of NHS proposed “anonymisation” by Prof Ross Anderson at Cambridge

…leading to failures where the “anonymity” can be defeated by combining several data sources, or by attacking the data set analytically, in order to return some semblance of the original data.
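To make the "combining several data sources" failure concrete, here is a minimal sketch of a linkage attack. Everything in it is an assumption made for illustration: the records are invented, and the field names (`postcode`, `birth_year`, `sex`) and the `link` helper are hypothetical rather than taken from any of the incidents above. The point is only that a data set with the names removed can still be matched against a second, named data set that shares a few ordinary attributes.

```python
# Illustrative only: all records and field names below are invented.

# A "de-identified" release: names removed, other attributes kept.
anonymised_admissions = [
    {"postcode": "CB1", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "CB1", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "OX2", "birth_year": 1970, "sex": "F", "diagnosis": "fracture"},
]

# A second, named source that an attacker might hold.
public_register = [
    {"name": "Alice Example", "postcode": "CB1", "birth_year": 1970, "sex": "F"},
    {"name": "Bob Example", "postcode": "CB1", "birth_year": 1985, "sex": "M"},
    {"name": "Carol Example", "postcode": "OX2", "birth_year": 1970, "sex": "F"},
    {"name": "Dana Example", "postcode": "OX2", "birth_year": 1970, "sex": "F"},
]

# Attributes present in both sources (hypothetical choice).
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Project a record onto the shared attributes."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymised, register):
    """Return (name, diagnosis) pairs where the shared attributes
    match exactly one person in the named register."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for row in anonymised:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # unique match: the row is re-identified
            matches.append((candidates[0], row["diagnosis"]))
    return matches


reidentified = link(anonymised_admissions, public_register)
print(reidentified)
```

In this toy data the two CB1 rows are re-identified, while the OX2 row is not, because two people in the register share its attributes. Whether a real release is safe depends on exactly this kind of uniqueness, which is hard to judge without people trying the attack.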

Ban it?

So it might sound like a good idea to ban re-identification, yes?

Well, no; the techniques of data anonymisation are mostly a form of “code book” cryptography, and (as above) if it’s not legal to prod, poke, and try to break the mechanisms of anonymisation, then anonymisation, like cryptography, will not improve.
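The "code book" comparison can be sketched as follows. This assumes one simple pseudonymisation scheme, replacing each name with a truncated, unsalted SHA-256 digest; that scheme is my assumption for the example, not something the bill or the cases above specify, and the names and the `pseudonym` helper are hypothetical. Anyone holding a list of plausible names can rebuild the code book and reverse the substitution.

```python
import hashlib


def pseudonym(name: str) -> str:
    """Hypothetical scheme: replace a name with a truncated unsalted hash."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]


# What the data holder publishes (invented records).
released = [
    (pseudonym("Alice Example"), "asthma"),
    (pseudonym("Bob Example"), "diabetes"),
]

# What an attacker might already have: a list of plausible names.
candidate_names = ["Alice Example", "Bob Example", "Carol Example"]

# Rebuild the code book by applying the same substitution to every candidate.
code_book = {pseudonym(name): name for name in candidate_names}

# Look each published pseudonym up in the rebuilt code book.
recovered = [(code_book.get(token), diagnosis) for token, diagnosis in released]
print(recovered)
```

Weaknesses like this are usually found by someone attempting the reversal, which is the activity the proposed offence would discourage.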

Therefore: banning re-identification will harm all of our individual security; it should be explicitly legal for anyone, professionals and crossword-puzzlers alike, to “have a go” at re-identification of data. Certainly it should be illegal for anyone to exploit or share the fruits of any successful re-identification, as is currently suggested, but the act of re-identification itself should be neither prevented nor chilled in any way.

To swap metaphors: if you drive a car in the UK then it will have been crash-tested by experts in order to determine how safe it is; but that is not sufficient. We do not rely upon experts to crash a model once, declare it safe, and then ban members of the public from crashing their cars. Instead, much of our learning and many of our standards in car safety come from analysing actual, real-world incidents.

Similarly: anonymisation is hard to do correctly, and the failures in how people and organisations have deployed it will only be evident if the many eyes of the general public are permitted to dig into the flaws that may have arisen from one example to the next. It will not be sufficient, as this bill announcement continues, for “…the important role of journalists and whistleblowers […to…] be protected by exemptions.”

Everyone has a stake in the collective security of our information, and we — the public — are the code-breakers who should be able to research, and hold to account, any instances of diverse and shoddy anonymisation that may be foisted upon us. Therefore this bill proposal must be amended and the freedom of the public to attempt re-identification must not be abridged.

— Alec Muffett, security researcher & member of the Board of Directors, ORG

Further reading

https://en.wikipedia.org/wiki/Bombe

https://en.wikipedia.org/wiki/Tommy_Flowers

https://en.wikipedia.org/wiki/Clifford_Cocks

https://en.wikipedia.org/wiki/Tempora

https://en.wikipedia.org/wiki/Cryptanalysis

https://en.wikipedia.org/wiki/De-anonymization

https://en.wikipedia.org/wiki/Data_Re-Identification

https://en.wikipedia.org/wiki/Euro_NCAP

http://www.theregister.co.uk/2017/08/07/data_protection_bill_draft/