The Australian Census for 2016, which is being held on 9 August, is a little different to previous censuses. This year, the Australian Bureau of Statistics (ABS) has decided that name and address information will be retained longer than ever before. Instead of being destroyed after the roughly 18 months it takes to process all the forms and data, name and address data will now be kept for up to four years.

A bunch of people aren’t overly happy with this plan, and the issue has been getting plenty of coverage in recent days.

There’s an awful lot of misinformation and scaremongering going on, and the ABS isn’t helping matters with its “Just trust us, okay?” approach to public relations. Here are my thoughts and links to information I’ve found in my own attempt to understand the situation.

Background

On 11 November 2015, the ABS published on its website a Statement of Intent to conduct a Privacy Impact Assessment on the retention of names and addresses from responses to the 2016 Census. This method of announcing the proposal is worth noting for later. The general tone of the Statement is also worth noting.

Almost no one noticed the Statement or the media alert the ABS sent out. There were two media mentions (as listed in the Privacy Impact Assessment itself): one in PSNews (Independent News for the Australian Public Service), and one in iTNews. Both are pretty vague on the privacy implications, and the piece in iTNews made no mention of the fact that the ABS was seeking input from the public, or that the deadline for responses was a mere three weeks away. (Disclosure: I write pieces for iTNews from time to time, and for its sister publication CRN, and am a member of the iTNews advisory board.)

The Privacy Impact Assessment itself provides a bit more detail on why the ABS wants to retain names and addresses:

The ABS is now considering the retention of names and addresses from the 2016 Census as a key enabler to meet the growing stakeholder demand to provide a richer and more dynamic statistical picture of Australia through the integration of Census data with other survey and administrative data, the geospatial enablement of that data, and improvements to our household surveys. The retention of names and addresses would also reduce the cost to taxpayers and the burden on Australian households through more efficient ABS survey operations.

The ABS enjoys a generally excellent reputation for both data collection and safeguarding, and also the production of high-quality statistical products. Australians have historically had a high degree of trust in the institution, and that trust has, so far at least, been well placed. I have personally used the ABS website and statistics from it from time to time, and it’s great stuff (if you’re into this sort of thing). The ABS even has data on what people think of them (of course they do!) and you can look at it here. I particularly recommend the summary tables here.

The law safeguarding Australians’ information collected in the Census is pretty robust, with the main statutory protections being in the Census and Statistics Act 1905, the Australian Bureau of Statistics Act 1975, and the Privacy Act 1988. The trend in legislation over the years has been to generally improve individuals’ right to privacy.

The ABS received a pretty high rating in the Australian National Audit Office’s 2014 audit of various agencies, Cyber Attacks: Securing Agencies’ ICT Systems (full report here). It’s a couple of years old, but it’s reasonable to expect that the situation would have improved, rather than gotten worse, given the level of focus information security has received in recent years and the value of the statistical dataset that the ABS looks after.

We can see that the ABS was planning to improve its IT general controls, if not its implementation of the Top Four ISM strategies and related controls required for entering the Cyber Secure Zone.

Note that the ABS public statement on privacy here says (retrieved on 1 Aug 2016):

“The ABS took part in an Australian National Audit Office cross-agency audit in 2014 on information technology system security against cyber-attacks. The ABS was rated as being in a Cyber Secure Zone (having high-level protection from external attacks and internal breaches and disclosure of information).”

This is incorrect. As we saw in the figure included above, the ABS is in the Internally Secure zone, the requirement for which is defined as:

Reasonable level of protection from breaches and disclosures of information from internal sources—but vulnerabilities remain to attacks from external sources. (Source: ANAO Audit Report No. 50)

UPDATE

I have checked the ABS statement with ANAO, via email through an intermediary (I won’t name them unless/until they say it’s ok) and here is ANAO’s response:

Thank you for your clarification question referencing the ANAO Audit Report No.50 2013–14 Cyber Attacks: Securing Entities’ ICT Systems. In answer to your questions:

1. The ABS was not compliant with the PSPF and ISM—and was graded as being located in the Internally Secure Zone.

2. The ANAO has not conducted a follow up audit on the ABS since 2014; therefore I cannot validate your statement in your ZDNet article that “… the bureau now claims that it’s rated in the “Cyber Secure Zone”.

3. The audit methodology, grading scheme and ranking is created by the ANAO, and is shared with ASD prior to tabling the report in Parliament. The work conducted by ASD on behalf of government entities is not assessed by the ANAO. Our evidence is based on audit fieldwork and privileged access to artefacts and enterprise systems under the Auditor-General’s Act 1997.

By way of background, the ANAO assessed seven entities in 2013-14: ABS, Customs, AFSA, ATO, DFAT, DHS and IP Australia. In summary, each of the auditees was located in the Internally Secure Zone. The entities had security controls in place to provide a reasonable level of protection from breaches and disclosures of information from internal sources. The preferred state is for entities to be located in the Cyber Secure Zone. The selected auditees had not achieved compliance with the Protective Security Policy Framework (PSPF) and the Australian Government Information Security Manual (ISM). A copy of this report is available at: https://www.anao.gov.au/work/performance-audit/cyber-attacks-securing-agencies-ict-systems

The ANAO recently published another cyber report titled, ANAO Audit Report No.37 2015-16 Cyber Resilience. Two of the four selected entities achieved compliance—AUSTRAC and the Department of Agriculture and Water Resources. Two entities did not achieve compliance—AFP and Department of Industry, Innovation and Science. A copy of this report is available at: https://www.anao.gov.au/work/performance-audit/cyber-resilience

The Value of Linking

It is true that having a way of directly linking data from the Census to other datasets, via name and address to determine whether the information relates to the same person or not, would be very useful statistically. Without this information, it’s harder to know whether changes in a given population are statistically valid. A simple example, given in introductory stats classes, is a paired difference test, where you survey the same group of people twice. It’s a special case of blocking.

Without name and address data, how can you tell that a given set of census data in 2011 matches the data from 2016? If a person’s income went from $45,000 to $149,000 in five years, but that result was because you linked two different people, then you get bad statistics if this is repeated across a large group. If you don’t do linkages, you can’t say as easily if there really was much of a change in incomes of two groups of people between one census and the next.

That’s a vast over-simplification, but you get the idea.
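To make the linkage point a little more concrete, here is a toy Python sketch. The five incomes are made up, and the “mislinking” is deliberately crude (each 2011 record is matched to the next person’s 2016 record). The helper names `paired_differences` and `summarise` and the $50,000 “big jump” threshold are hypothetical, chosen purely for illustration.

```python
"""Illustrative only: how mislinking records distorts individual-level change.

All incomes below are made-up numbers, not census data.
"""
from statistics import mean, pstdev

# Hypothetical incomes for the same five people in two censuses.
# Same position in both lists = the same person (i.e. a correct link).
incomes_2011 = [45_000, 52_000, 38_000, 149_000, 61_000]
incomes_2016 = [48_000, 55_000, 41_000, 152_000, 64_000]


def paired_differences(before, after, links):
    """Return after[links[i]] - before[i] for each person i.

    links[i] is the index of the 2016 record we *believe* belongs to person i.
    """
    return [after[j] - before[i] for i, j in enumerate(links)]


def summarise(diffs, big_jump=50_000):
    """Mean change, spread of changes, and how many people appear to move by more than big_jump."""
    return {
        "mean": mean(diffs),
        "spread": pstdev(diffs),
        "big_jumps": sum(abs(d) > big_jump for d in diffs),
    }


n = len(incomes_2011)
correct_links = list(range(n))                 # person i -> person i
wrong_links = [(i + 1) % n for i in range(n)]  # every record linked to someone else

correct = summarise(paired_differences(incomes_2011, incomes_2016, correct_links))
wrong = summarise(paired_differences(incomes_2011, incomes_2016, wrong_links))

print("correct:", correct)
print("mislinked:", wrong)
```

Both pairings report the same mean change of $3,000, because an average difference doesn’t care who is paired with whom. What the bad links wreck is the individual-level picture: with correct links everyone’s income rose by exactly $3,000, while the mislinked version shows changes all over the place, including one person apparently going from $38,000 to $152,000. That person-level detail is exactly what longitudinal linkage is for.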

Knowing trend information like this is important for a host of public policy reasons. These range from knowing whether changes in health policy actually work, to knowing whether the situation of those in vulnerable populations (the very poor, indigenous people, the elderly with chronic disease, etc.) is getting better or worse.

You can read some examples of what the ABS is able to do with this sort of data in Australians’ journeys through life: Stories from the Australian Census Longitudinal Dataset, 2006-2011.

In short, there are plenty of perfectly good and reasonable reasons for linking data from one census to another.

What’s the Big Deal?

This is where things get messy.

The creation of the Australian Census Longitudinal Dataset (ACLD) was first proposed back in 2005, and you can read the discussion paper about the issue: Discussion Paper: Enhancing the Population Census: Developing a Longitudinal View, 2006. Importantly, in the very beginning of this paper, the retention of names and addresses was specifically called out as not required.

Under this proposal the ABS would create a SLCD through combining census data over time. The proposal does not require the retention of names and addresses from the census. Names and addresses will continue to be destroyed following processing of the 2006 Census.

Name and address data would be used “during the period of census processing”, but the proposal was quite specific about the purposes for which names and addresses would be used. There is also some detailed discussion of how individuals within the ABS would get access to data for linking, and the steps the ABS would take to guard against individuals being identified.

One way to bring the data together over time would be to use personal identifiers, such as name and address information, to bring together records for the same person. However, this would require the retention of name and address information from one census to the next. This raises significant privacy issues. The ABS is not proposing to retain name and address past the time needed to process each census and would not be using name and address information to bring the census data together over time. The ABS will continue its practice of destroying census forms containing names and addresses following census processing, as it has done in previous censuses. (Emphasis mine)

This is a detail that appears to have been missed in recent commentary about the decision to retain name and address information in the 2016 census. The people who designed the ACLD in the first place were very well aware of the privacy and security implications of retaining names and addresses, and deliberately excluded them apart from very specific cases.

The Privacy Impact Assessment conducted on the 2005 proposal to create the ACLD went further, recommending that even this limited name and address matching during census processing (which was estimated to take a maximum of 15 months, by the way) should be dropped.

Consideration should be given to abandonment of the Census Data Enhancement proposals involving name matching and of reverting to previous ABS practice of confining the use of names during Census processing periods to ABS quality studies only.

What Changed?

Rather than argue specifically about the merits of keeping names versus not, I’d like to look at what changed between 2005 and now. In 2005, the proposal was not to retain names and addresses any longer than normal processing, and the linkages were statistical aggregations for the most part, with some very specific name/address based linkages.

Now, however, the proposal is to simultaneously expand the retention of names and addresses from 18 months (not the 15 originally estimated) to 48 months, and also to use them for a much larger range of linkages than was proposed in 2005.

Why?

Unfortunately, the ABS has been rather vague and evasive about this issue. Unlike the 2005 discussion paper, which was detailed in its review of both the benefits and risks of the creation of the ACLD, the recent names and addresses proposal was, well, brief. My interpretation of the reasons for retaining names boils down to two major things.

Firstly, the ABS is under a lot of pressure to provide better quality data. Linking with more of the datasets that the ABS already has will allow it to provide more, and higher-value, statistical products. Name and address data will make this much easier to do, and in some cases it might not be possible without name and/or address data.

Secondly, the ABS is simultaneously under a lot of pressure to cut costs. Easier equals cheaper.

Given these two pressures, I can understand why the ABS may want to look at keeping name and address data.

But in 2005, the ABS was well aware of the risks of retaining name and address data. Again, what has changed since then?

In that time, the information security landscape has changed substantially. Both the number and sophistication of malicious actors have increased, and while information security practices have improved, we are also far more aware of just how under-prepared many organisations are. However, the ABS is well practised at keeping data safe, and has a long and proud history of doing so. It has, in short, been successful. So far.

However, what isn’t clear is whether the ABS is better at safeguarding data than it was in 2005. If the risk has increased (which it has, as we just discussed), then the ABS’ ability to mitigate the risks must have increased substantially more. The ABS must overcome the risk inflation we’ve established is real, and must also absorb the additional risk created by retaining name and address data. This data is attractive all by itself, but even more so because of its ability to link datasets, which is the very reason for keeping it. What the ABS might find useful would also be useful to whoever might steal the data.

There’s another change that has been hinted at by various commenters.

When the 2005 proposal was made, Facebook was barely a year old. Now, in 2016, Australians have had more than a decade of getting used to the idea of sharing private information with strangers through the Internet. The argument goes that people have become used to giving up their privacy by carrying mobile phones everywhere (which function very nicely as tracking devices) and posting selfies on Instagram.

Attitudes to privacy may well have shifted, and that appears to be at least part of the reason for the ABS taking the approach it has. Alas, that’s a set of data I don’t have at my fingertips (happy to update this blog if someone tips me off to some high quality survey data).

Hubris Born of Success

The recent opinion piece in The Sydney Morning Herald by ABS Chief Statistician David Kalisch (Give us your name on census night, it’ll be safe) strikes a somewhat odd note to me. It feels a tad pre-emptively defensive, and there are some weasel words in it.

Australians have no cause for concern about any aspect of this census, and can have ongoing trust and confidence in the ABS.

This statement comes quite early in the piece. Why state this? If I didn’t have any cause for concern before, I certainly do now. This is a bit “Pay no attention to that man behind the curtain” for me.

In 2016, I’ve decided to keep names and addresses for longer. This is for statistical purposes only, and will increase the value of census data.

That’s nice. But at what cost?

Note that Kalisch uses the phrase “I’ve decided”, rather than “the ABS has decided.” That choice of phrase bothers me, as it suggests that a great deal of power rests in the hands of a single person, who can simply decide things. On the plus side, since Mr Kalisch has so clearly nailed himself to this particular plank, should this decision prove to be a poor one, there is no one else to blame.

My decision followed community consultation, direct engagement with the Australian Information Commissioner and each State and Territory Privacy Commissioner, and a Privacy Impact Assessment (PIA). The ABS has transparently communicated its process and decisions every step of the way. We advertised our PIA process in the national media in November 2015 and received few responses.

This is technically true. However, the Privacy Impact Assessment doesn’t appear to be independent in the way the PIA from 2005 was. The author listed in the PDF is Zoe Winston-Gregson, who appears to be a Graduate Development Program hire according to this document [PDF]. It would appear that the ABS performed the PIA itself. While the Office of the Australian Information Commissioner’s (OAIC) Guide to Undertaking Privacy Impact Assessments does not require a PIA to be conducted by an external party, I would be interested to know why the ABS chose not to have an external third party conduct this assessment when it saw value in an externally conducted PIA in 2005.

The 2016 PIA states:

The Privacy Impact Assessment identified a small number of potential risks to personal privacy associated with the retention of names and addresses from responses to the 2016 Census, but concluded that in each case the likelihood of these risks eventuating was ‘very low’. The Privacy Impact Assessment determined that these risks can and would be effectively mitigated by implementation of an internationally accepted practice known as functional separation and by existing ABS governance and security arrangements. Nevertheless, a small number of recommendations have been made in relation to implementation of the proposal.

Throughout the document, the emphasis is on risk likelihood rather than risk impact, and both are required for a proper understanding of risk; a low likelihood of losing $2 is very different from a low likelihood of losing your life. All of the risks are assessed as being of Very Low likelihood, but there is no corresponding scale for the impact of the risk, should it occur.
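A common way to combine the two is a likelihood × impact matrix. The Python sketch below assumes a hypothetical five-point scale for each and made-up rating bands; it is not the scheme the ABS or the OAIC uses, just an illustration of why likelihood alone isn’t enough.

```python
"""Toy likelihood x impact risk rating. Scales and bands are hypothetical."""
from enum import IntEnum


class Likelihood(IntEnum):
    VERY_LOW = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    VERY_HIGH = 5


class Impact(IntEnum):
    NEGLIGIBLE = 1
    MINOR = 2
    MODERATE = 3
    MAJOR = 4
    SEVERE = 5


def risk_score(likelihood: Likelihood, impact: Impact) -> int:
    """Score a risk as likelihood multiplied by impact (range 1 to 25)."""
    return int(likelihood) * int(impact)


def rating(score: int) -> str:
    """Map a score to a band; these thresholds are illustrative, not a standard."""
    if score >= 15:
        return "extreme"
    if score >= 8:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


# The two examples from the text: both have the same (very low) likelihood.
lose_two_dollars = risk_score(Likelihood.VERY_LOW, Impact.NEGLIGIBLE)
lose_your_life = risk_score(Likelihood.VERY_LOW, Impact.SEVERE)

for name, score in [("lose $2", lose_two_dollars), ("lose your life", lose_your_life)]:
    print(f"{name}: score {score}, rating {rating(score)}")
```

Rated on likelihood alone, losing $2 and losing your life look identical (both Very Low). Once impact is scored, they land in different bands, and impact is the dimension the 2016 PIA leaves out.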

In section 4, Privacy Risk and Mitigation, where the risks are enumerated (all five of them), there is a statement describing the consequence of each risk eventuating, but there appears to be no assessment of the magnitude of its impact. The emphasis is on the ABS’ ability to mitigate the likelihood of the risk occurring, in which it appears to be supremely confident.

Risk 4.3 “Accidental release of name and/or address data in ABS outputs or through loss of work related IT equipment and IT documentation” has a consequence listed as “Name and/or address information is publically [sic] released.” Well yes, but is that it?

What about the impact on the people whose name and address information is released? What about the impact on the reputation of the ABS, which has now spent a lot of time and energy hyping itself as a superior custodian of this name and address information?

And this is listed as a separate risk from risk 4.2 “Unauthorised non-ABS access to data stored in the ABS environment” where the consequence is “The consequences of breach of privacy depend on whether names, anonymised names, or linked data is accessed.” There’s no discussion of the consequence of more than one of these things happening at the same time.

In my view, this PIA is nowhere near as thorough as the one performed in 2005 by Pacific Privacy Consulting, i.e. Mr. Nigel Waters. It is rather light-on for an issue that has been addressed multiple times in the past with great caution and care. I do not see the same level of care and diligence here.

My Conclusion

In deciding to retain name and address data, the ABS (or David Kalisch all by himself, who knows?) believes that it is trusted by the public, and that people aren’t all that concerned about providing this information to the ABS. It is under pressure from the government to cut costs and to provide more, and higher-value, products and services. Sound familiar?

I think the ABS, or whoever was involved in driving the retention of names and addresses, has become arrogant. They are convinced of the benefits of retention, and are dismissive of the risks. They are concerned about people’s perception of the issue, but only superficially. They are not interested in carefully explaining their reasoning and all the safeguards that have been put in place, and demonstrating very clearly to all who will listen why they should be trusted. The ABS has assumed it is trusted and worked from there.

The way the issue has been handled looks to me like it was carefully stage-managed so as not to spook the horses. A series of process boxes have been checked. Focus groups were run to figure out how concerned people really were, and the answer came back as “well, not all that much”. That’s just a qualitative step that should have been followed by a quantitative study of some kind, of which I have yet to see evidence. The ABS published a short Statement of Intent with vague, hand-wavey statements about how the ABS would take care of everything, because they’ve always managed really well before, and put it out quietly as people were leaning into the Christmas period. People were given a mere three weeks to respond, if they somehow stumbled across the idea that they could, and the lack of responses was used as more evidence that no one was bothered by the proposal.

Absence of evidence is not evidence of absence, as the statisticians at the ABS well know, and as the architects of this situation are now finding out, no doubt to their chagrin.

Now we have anodyne corporatised PR management of the issue which boils down to “Trust us, and don’t worry your pretty little head about any pesky details.”

I believe that the ABS is probably better equipped than most government departments to safeguard this information, but it’s a lot easier to keep a secret if you don’t know it in the first place. I remain unconvinced that the benefits of retaining name and address data outweigh the risks. I haven’t seen a good enough explanation from the ABS on this issue, and they’ve had ample time to provide one, but seem incapable of engaging with the public in a non-condescending way.

This needs to stop immediately.

Postscript

Risk 4.4 in the 2016 PIA is

RISK: Reduction in participation levels in ABS collections due to loss of public trust

Consequence: The proposal to retain names and addresses from responses to the Census may cause public concern which results in a reduction of participation levels in ABS collections, and/or a public backlash.

Management if risk eventuates: Depending on the circumstances, the ABS will:

Respond to concern from the media, stakeholders and the public;

Conduct further consultations;

Reconsider the privacy design for the proposal, if required.

Let’s keep making noise.