Researchers say they can identify individuals in government healthcare data that was supposed to be anonymous, potentially exposing the private information of people receiving mental health treatment or HIV medication.

Published by the federal Department of Health in August 2016 as part of a move towards open data, the historical information included the medical and pharmaceutical bills of about 10 per cent of Australians.

It was pulled offline last year after experts from the University of Melbourne were able to decrypt or decode a number of doctor ID numbers.

In a new report released today, the Melbourne team describe how they were able to use information easily available on the internet to possibly identify seven famous Australians within the same dataset, without figuring out their patient ID numbers.

The report's co-author Dr Vanessa Teague said being able to expose this kind of patient data could be significant for those with stigmatising or chronic illness.

"The possibility of having that fact exposed might have very serious consequences," she said.

A Department of Health spokesperson said the matter had already been referred to the Privacy Commissioner, who is investigating, and the department had taken steps to improve its processes.

"The project was halted and remains halted, and the dataset was removed immediately," she said.

"The department has not been aware of anyone being identified."

How to find someone in anonymous data

Even though the Department of Health removed names and took other steps to ensure the dataset was anonymous, the researchers were able to identify individual patients using the type of information people might share on Facebook: gender, birth year, state and health events.

"Two or three boring, typical facts about a person, or data points about a person, will very often make them unique," Dr Teague explained.

To illustrate, Dr Teague looked for herself in the data.

More than 17,000 women in the dataset matched her year of birth, but when the years of birth of two of her children were added, only 59 possible matches remained, and only 23 of those were in her home state of Victoria.

Adding their specific days of birth brought the possible matches to zero.
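
The narrowing Dr Teague describes can be sketched in a few lines of Python. This is only an illustration and not the researchers' actual method: the records are invented, and the field names (gender, birth_year, state, child_birth_years) and the narrow_candidates helper are assumptions made for the example, not the structure of the real dataset.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical fields; the real dataset's layout is not described in this article.
    patient_id: str
    gender: str
    birth_year: int
    state: str
    child_birth_years: Tuple[int, ...]


# Entirely synthetic records, for illustration only.
records = [
    Record("p1", "F", 1970, "VIC", (2001, 2004)),
    Record("p2", "F", 1970, "NSW", (2001, 2004)),
    Record("p3", "F", 1970, "VIC", (1999,)),
    Record("p4", "M", 1970, "VIC", ()),
    Record("p5", "F", 1982, "VIC", (2001, 2004)),
    Record("p6", "F", 1970, "QLD", ()),
]


def narrow_candidates(
    rows: List[Record],
    facts: List[Tuple[str, Callable[[Record], bool]]],
) -> Tuple[List[Tuple[str, int]], List[Record]]:
    """Apply each known fact in turn and record how many candidates survive."""
    remaining = list(rows)
    steps = []
    for label, matches in facts:
        remaining = [r for r in remaining if matches(r)]
        steps.append((label, len(remaining)))
    return steps, remaining


# "Boring" facts of the kind someone might share publicly.
facts = [
    ("woman born in 1970", lambda r: r.gender == "F" and r.birth_year == 1970),
    ("children born in 2001 and 2004", lambda r: {2001, 2004} <= set(r.child_birth_years)),
    ("lives in Victoria", lambda r: r.state == "VIC"),
]

steps, remaining = narrow_candidates(records, facts)
for label, count in steps:
    print(f"{label}: {count} candidate(s) left")
```

With this toy data the candidate pool shrinks from four records to two to one as each fact is added, even though no name or patient ID is ever used in the search.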

To find the unique patient records that potentially matched the seven prominent Australians, the researchers simply examined online news reports and Wikipedia articles that described celebrity births, sporting injuries or surgeries.

Dr Trent Yarwood, a member of the technology-focused policy organisation Future Wise, said the researchers' results were not surprising.

"When you combine an [anonymised dataset] with other publicly accessible data like people's social media or other available demographic data, it means that you can use this stuff to individually identify people."

Open data versus privacy

The report demonstrates the difficult trade-off between protecting patient privacy and making data openly accessible for important public research.

"I think what we have learnt from this is posting a detailed individual record in a de-identified way just doesn't work, and it would be a lot better to separate out the data that's intended for researchers from public statistics," Dr Teague said.

Dr Teague pointed to the Productivity Commission's proposal that would give sensitive data to "trusted users," like a medical research body, with the strong expectation that they keep it secure.

This is different from open datasets about the function of government — say, government expenditure, donations and gifts.

"People's medical records are not really government data," Dr Teague suggested. "They're information about individuals that is in the care of the Government."

In 2016, the Government proposed making the re-identification of government data an offence, but Dr Teague believes that is misguided and could affect legitimate research.

"I think that's confusing the symptom with the disease," she said. "In fact, if the Government has posted an easily re-identifiable dataset, they should find out."

"This paper's really important at highlighting that even what on paper looks like 'reasonable' de-identification steps can actually cause problems," Dr Yarwood said.

"If computer security researchers are doing this, then you can bet that advertising companies ... are going to be doing exactly the same sort of stuff."

The Office of the Australian Information Commissioner was unable to comment as its investigation is ongoing.