The promise that your census data will remain private may be more difficult to keep than the Australian Bureau of Statistics (ABS) had realised, after a flaw was revealed in an algorithm designed to protect your privacy — meaning your census data could be exposed.

Key points:

Researchers discover a flaw in an algorithm used to protect the privacy of census data

No privacy data has been compromised, according to the Australian Bureau of Statistics

Data may still be at risk, according to researchers, and a "very different" approach should be taken to protecting such sensitive information

The flaw is so bad, according to Dali Kaafar, chief scientist at the Optus Macquarie University Cyber Security Hub, that he believes a sophisticated attacker could re-construct (and reveal) large parts of the census dataset.

But for a targeted attacker trying to reveal a specific person's details, a high level of sophistication may not be required, with Professor Kaafar describing the attack as "not only practical but very, very simple".

"[We've] been really surprised that the attack was actually that easy to implement," he told the ABC.

What's the risk?

The risk to individual privacy and the possibility that large parts of the census dataset could be reconstructed are so great, according to Professor Kaafar, that he believes the ABS should have strongly considered taking TableBuilder, a web-based tool for accessing census data, offline until better privacy protection techniques were implemented.

The technique developed by the researchers involves being able to calculate (and then remove) deliberate inaccuracies introduced by an ABS algorithm designed to prevent the possibility of identifying individual people in census data.

"You can automate that to actually run this whole vulnerability for the whole Australian population and then we can reconstruct completely the whole database," he said.

The researchers warned the ABS in 2017 that they believed there was a problem with the algorithm.

In mid-2018, Professor Kaafar communicated the results of further research, published last week, to the ABS, which appears to have prompted changes to the TableBuilder tool. However, the ABS declined to answer the ABC's specific questions about when it was first made aware of the vulnerability and when its changes were implemented.

The research shows, with mathematical certainty, that the technique the researchers developed makes the census data vulnerable due to the way TableBuilder works.

Daniel Angus, Associate Professor of Digital Communication at Queensland University of Technology, agreed that the vulnerability had the potential to "identify a singular unique individual in the data", and said that represented a serious privacy risk.

"From that, a few skip-tracing techniques could be used to connect this unique [individual record] to a real-world identity," he said.

What does the ABS say?

The ABS acknowledged the vulnerability in a statement, saying the attacks were "theoretically possible".

But it played down the significance of the issue and declined to answer more detailed questions put to it by the ABC.

"ABS regularly tests and updates the protections of ABS TableBuilder, to ensure the adequacy of the privacy protections," the statement read.

The ABS described the technique developed by the researchers as "a very elaborate and obscure attack scenario" and claimed that "[n]o-one's privacy has ever been compromised through the use of the ABS TableBuilder tool".

The bureau appears to have considered the issue serious enough to warrant changes, saying "[t]he ABS has since put in place measures to mitigate this potential vulnerability suggested by the researchers".

However, Professor Kaafar remains concerned about the adequacy of these new protections.

"It's definitely a step in the right direction," he said. "But it's actually very far from addressing the whole threat or the whole vulnerability."

He describes the mitigations as "completely useless" against more sophisticated attackers such as state-level actors.

How exactly does it work?

One way the ABS uses the census data is by making summaries of it available to researchers, journalists and the general public through a tool called TableBuilder.

TableBuilder lets users query the tool to get the number of people who fall into particular categories. It's how journalists, for example, calculate how the proportion of 18- to 34-year-olds living at home has changed over time for stories such as "How life has changed for people your age".

But to protect our privacy, the numbers provided in TableBuilder aren't 100 per cent accurate. The ABS adds "noise" to the data using an algorithm they developed.

The paper published by Professor Kaafar and Hassan Jameel Asghar shows how it is possible to remove that noise (and therefore, the privacy protection it provides) revealing the true numbers from TableBuilder results. This, in turn, means that with a little work it is possible to identify specific people in the data.
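
To see why added noise can sometimes be stripped away, consider a deliberately simplified sketch. It is not the ABS algorithm and not the researchers' actual method: it assumes a hypothetical system that adds a small, bounded random error to a count, and an attacker who can somehow obtain many answers about the same underlying count, each with independent noise. The function names, the noise range and the count of 7 are all made up for illustration.

```python
import random


def perturbed_count(true_count, rng, max_noise=2):
    """Return the true count plus bounded integer noise.

    Illustrative only: a stand-in for "noise added to protect privacy",
    not the algorithm used by TableBuilder.
    """
    return true_count + rng.randint(-max_noise, max_noise)


def estimate_true_count(noisy_answers):
    """Average many noisy answers and round to the nearest whole number."""
    return round(sum(noisy_answers) / len(noisy_answers))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_count = 7          # hypothetical number of people in some category

# Assumption: the attacker collects 2,000 independently perturbed answers.
answers = [perturbed_count(true_count, rng) for _ in range(2000)]
estimate = estimate_true_count(answers)

print("a single noisy answer:", answers[0])
print("estimate after averaging:", estimate)
```

Any single answer can be off by up to 2, but the errors cancel out on average, so the rounded mean lands on the true figure. Once true counts are known, a category containing exactly one person points to an individual, which is the kind of risk the researchers describe.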

Is a different approach possible?

Collecting and releasing data as sensitive as that found in a census comes with inherent risks to privacy. There is a trade-off between how useful the data is and how much risk is involved in its use.

This kind of issue is not unique to Australia, and algorithms similar to the ABS's are developed and used by other national statistics bureaus. However, some are beginning to take a different approach, a path Professor Kaafar believes the ABS should take with some urgency.

"This is something that we're really trying to communicate to the ABS — since we communicated this vulnerability to them — is the need to address the protection of such sensitive data in a very different way," he said.

Where algorithms like the one used by the ABS don't specifically quantify the risk involved, according to Professor Kaafar, there are other types of algorithms which can provide more mathematical certainty about the privacy protections they offer.

The United States Census Bureau, for example, has committed to using a technique known as "differential privacy" which can provide that mathematical certainty about the protections it provides.
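
As a rough illustration of what that means in practice, the sketch below shows the textbook "Laplace mechanism", one standard way of achieving differential privacy for a simple count. It is an assumption-laden toy, not the US Census Bureau's system or anything the ABS has proposed: the function names, the count of 120 and the epsilon values are hypothetical. The key idea is that the amount of noise is tied to a parameter, usually called epsilon, that puts a number on how much any one person's record can influence the published answer.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a count with Laplace noise of scale sensitivity / epsilon.

    For a counting query, adding or removing one person changes the
    answer by at most 1, so the sensitivity is 1.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(samples, truth):
    """Average distance between the noisy answers and the true value."""
    return sum(abs(s - truth) for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_count = 120        # hypothetical count

# Smaller epsilon means a stronger privacy guarantee and noisier answers.
strict = [dp_count(true_count, 0.1, rng) for _ in range(5000)]
loose = [dp_count(true_count, 1.0, rng) for _ in range(5000)]

print("typical error, epsilon = 0.1:", mean_abs_error(strict, true_count))
print("typical error, epsilon = 1.0:", mean_abs_error(loose, true_count))
```

The trade-off between usefulness and risk becomes an explicit dial: a smaller epsilon means stronger protection and less accurate numbers. Every answered query also uses up part of an overall privacy budget, which is how the total leakage can be bounded rather than left unquantified.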

In January, the ABS published a note on their methodology which shows they are considering differential privacy techniques.

However, they have not yet committed to implementing them, saying they also need to consider the "practical implementation, social licence and stakeholder acceptance, and the broader confidentiality protections / legislative requirements".

Editor's Note: A previous version of this story stated it may be possible to re-create the entire census dataset by exploiting the vulnerability described. The story has been amended to reflect that this is not the case but instead that large parts of the dataset are at risk.