The National Security Agency has announced a startling failure in the implementation of the USA Freedom Act of 2015. According to a public statement released by NSA on June 28, the call detail records that NSA has been receiving from telephone companies under the Act are infected with errors, NSA cannot isolate and correct those errors, and so it has decided to purge from its data repositories all of the CDRs ever received under the Act. As the public statement explains, “on May 23, 2018, NSA began deleting all call detail records (CDRs) acquired since 2015 under Title V of the Foreign Intelligence Surveillance Act (FISA) ... because several months ago NSA analysts noted technical irregularities in some data received from telecommunications service providers. These irregularities also resulted in the production to NSA of some CDRs that NSA was not authorized to receive. Because it was infeasible to identify and isolate properly produced data, NSA concluded that it should not use any of the CDRs.”

This post (1) reviews the program of bulk collection of telephony metadata that existed prior to the USA Freedom Act, (2) describes the processes required under the Freedom Act, and (3) considers lessons learned from the failure in the Act's implementation.

I. Bulk Telephony Metadata Collection, 2006–15

From 2006 until the USA Freedom Act of 2015, NSA engaged in bulk collection of telephony metadata under the supervision of the Foreign Intelligence Surveillance Court (FISC). As I have explained here, NSA would collect vast quantities of telephony metadata, or records about telephone calls (sometimes referred to as “call detail records” or “CDRs”), from certain telephone companies. These records included information about the telephone numbers involved in a call and the date and time of the call, but they did not include the words spoken in the call. The information was similar to what used to appear in an old-fashioned paper telephone bill: it might show, for example, that (123) 456-7890 called (234) 567-8901 on Monday, Jan. 1, at noon, for 30 minutes. The program operated on a truly enormous scale, with NSA collecting this sort of information from telephone companies and storing it in huge data repositories.

From time to time and under rules approved by the FISC, NSA would query the stored data. A query would begin with a “seed” telephone number, later referred to as a “specific selection term,” as to which NSA had reasonable, articulable suspicion that it was being used in connection with international terrorism. The query would return information on all of the calls in the database that were made to or from that seed number. The set of telephone numbers involved in those calls was referred to as being “one hop” away from the seed number (because they were in direct contact with the seed number). The query would then also return information on all of the calls in the database that were made to or from the one-hop numbers. The set of telephone numbers involved in those calls would be referred to as being “two hops” away from the seed number. The query would then sometimes return the same information on the two-hop numbers, finding connections up to three hops away from the seed.
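The hop-by-hop query described above amounts to a breadth-first search over a single graph of calls. The Python sketch below is a simplified illustration, not NSA's actual system: it assumes each CDR is reduced to a (caller, callee) pair, that all records sit in one central repository (as they did before 2015), and it uses hypothetical function names (`build_index`, `contact_chain`) and invented telephone numbers.

```python
from collections import defaultdict

# Simplified CDR: (calling number, called number). Real CDRs also carry the
# date, time, and duration of the call; those fields are omitted here.
CDR = tuple[str, str]

def build_index(cdrs: list[CDR]) -> dict[str, set[str]]:
    """Map every number to the set of numbers it called or was called by."""
    contacts: dict[str, set[str]] = defaultdict(set)
    for caller, callee in cdrs:
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return contacts

def contact_chain(cdrs: list[CDR], seed: str, max_hops: int) -> dict[int, set[str]]:
    """Breadth-first contact chaining over one central repository of CDRs.

    Returns {hop: numbers first reached at that hop}; the seed itself is excluded.
    """
    contacts = build_index(cdrs)
    seen = {seed}
    frontier = {seed}
    result: dict[int, set[str]] = {}
    for hop in range(1, max_hops + 1):
        reached: set[str] = set()
        for number in frontier:
            reached |= contacts.get(number, set())
        reached -= seen  # report each number only at its nearest hop
        result[hop] = reached
        seen |= reached
        frontier = reached
    return result

# Hypothetical records forming a simple chain of four numbers.
records = [
    ("123-456-7890", "234-567-8901"),
    ("234-567-8901", "345-678-9012"),
    ("345-678-9012", "456-789-0123"),
]
print(contact_chain(records, "123-456-7890", max_hops=3))
# {1: {'234-567-8901'}, 2: {'345-678-9012'}, 3: {'456-789-0123'}}
```

Because every record lives in one index, the entire chain can be computed in one place by one party, which is exactly the centralized feature the Freedom Act later removed.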

The key features of this system, as compared to what came later, were that NSA (1) ingested and retained all of the call-detail records (telephony metadata) from the participating telephone companies; and (2) conducted the analysis showing connections between numbers—referred to as “contact chaining”—by itself.

II. The USA Freedom Act, 2015–Present

The USA Freedom Act ended the bulk collection of telephony metadata and replaced it with a new procedure under which NSA sent queries to the telephone companies and received from them the responsive information. Details aside, the Act therefore changed both of the distinguishing features of the prior program. First, NSA would no longer ingest and store all of the CDRs, but only the responsive one-hop and two-hop records it received from the telephone companies in response to queries. Second, as part of this approach, the contact chaining necessary to determine the one-hop and two-hop numbers for a query would be done by the telephone companies, not by NSA. Contact chaining had to be done by the telephone companies, of course, because NSA no longer had the full set of CDRs. This was the key privacy-enhancing feature of the USA Freedom Act—it radically reduced the raw amount of metadata held by the government.

Indeed, under the Freedom Act no single entity possessed all of the records, as each telephone company retained only its own CDRs, mainly concerning its own subscribers. This required a more complex, iterative querying process to capture cases in which one company’s subscriber called another’s. It also likely cost millions of dollars more, in reimbursements to the telephone companies, than NSA would have spent doing the work itself.

Here is the House Judiciary Committee’s description of how the process was intended to work under the USA Freedom Act:

The government may require the production of up to two “hops”—i.e., the call detail records associated with the initial seed telephone number and the call detail records (CDRs) associated with the CDRs identified in an initial “hop.” [The law] provides that the government can obtain the first set of CDRs using the specific selection term approved by the FISC. In addition, the government can use the FISC-approved specific selection term to identify CDRs from metadata it already lawfully possesses. Together, the CDRs produced by the phone companies and those identified independently by the government constitute the first “hop.” Under [the law], the government can then present session identifying information or calling card numbers (which are components of a CDR . . .) identified in the first “hop” CDRs to phone companies to serve as the basis for companies to return the second “hop” of CDRs.

Here is an example drawn from NSA’s published description of how the USA Freedom Act works in practice:

To illustrate the process, assume an NSA intelligence analyst identifies or learns that phone number (202) 555-1234 is being used by a suspected international terrorist. This is the “specific selection term” or “selector” [seed] that will be submitted to the FISC (or the Attorney General in an emergency) for approval using the RAS [reasonable articulable suspicion] standard. Also assume that, through NSA’s examination of metadata produced by the provider(s) or in NSA’s possession as a result of the Agency’s otherwise lawfully permitted signals intelligence activities (e.g., activities conducted pursuant to Section 1.7(c)(1) of Executive Order 12333, as amended), NSA determines that the suspected terrorist has used a 202 area code phone number to call (301) 555-4321. The phone number with the 301 area code is a “first-hop” result. In turn, assume that further analysis or production from the provider(s) reveals (301) 555-4321 was used to call (410) 555-5678. The number with the 410 area code is a “second-hop” result.

Once the one-hop results are retrieved from the NSA’s internal holdings, the list of FISC-approved specific selection terms, along with NSA’s internal one-hop results, are submitted to the provider(s). The provider(s) respond to the request based on the data within their holdings with CDRs that contain FISC-approved specific selection terms or the one-hop selection term. One-hop returns from providers are placed in NSA’s holdings and become part of subsequent query requests, which are executed on a periodic basis.

The NSA’s description includes this image depicting the high-level architecture of how CDRs are collected under the Act:

Some important complexity arises inside the yellow “PROVIDER(S)” cylinder at the lower left corner of the center box, reflecting the change from a unified to a federated model, with each telephone company representing a separate, external repository. To illustrate the complexity, assume a CDR program in which three providers are participating—P1, P2 and P3. By and large, each provider maintains CDRs on its own customers (subscribers), but not the customers of the other two providers. Accordingly, to obtain the one-hop results—numbers in direct contact with the seed number—NSA must send identical queries to P1, P2 and P3 as well as to its own, internal databases containing information obtained from other surveillance programs. Each provider will return certain information that NSA must then aggregate with its own information to generate the list of one-hop numbers. This information then becomes a query sent back to each of P1, P2 and P3 as well as NSA’s own databases. The information returned by the three providers and NSA’s internal databases, once aggregated and normalized, is the two-hop information. If the seed number is a P1 customer (subscriber) who called several P1, P2 and P3 customers, who in turn called several P1, P2 and P3 customers, as well as customers of other telephone companies, the required processes may be complex.
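The federated process can be sketched the same way. In this hypothetical Python sketch, each `Repository` stands in for one provider (P1, P2, P3) or for NSA's internal holdings and answers queries only from its own records; the NSA-side code sends the identical query to every repository, aggregates and normalizes what comes back, and then re-queries with the one-hop numbers. The class and function names, the digits-only normalization rule, and the NSA internal record are assumptions made for illustration; the 202, 301, and 410 numbers come from NSA's published example quoted above.

```python
import re
from dataclasses import dataclass, field

# Simplified CDR: (calling number, called number). Real CDRs carry more fields.
CDR = tuple[str, str]

def normalize(number: str) -> str:
    """Keep digits only, so '(202) 555-1234' and '202.555.1234' compare equal."""
    return re.sub(r"\D", "", number)

@dataclass
class Repository:
    """One separate holder of CDRs: a provider or NSA's internal holdings."""
    name: str
    cdrs: list[CDR] = field(default_factory=list)

    def query(self, selectors: set[str]) -> list[CDR]:
        """Return only this repository's CDRs in which either party matches a selector."""
        return [cdr for cdr in self.cdrs
                if normalize(cdr[0]) in selectors or normalize(cdr[1]) in selectors]

def numbers_in(cdrs: list[CDR]) -> set[str]:
    """Aggregate CDRs returned by all repositories into a set of normalized numbers."""
    return {normalize(number) for cdr in cdrs for number in cdr}

def federated_two_hop(repos: list[Repository], seed: str) -> tuple[set[str], set[str]]:
    seed = normalize(seed)
    # First hop: the identical query goes to every repository; results are aggregated.
    first = [cdr for repo in repos for cdr in repo.query({seed})]
    one_hop = numbers_in(first) - {seed}
    # Second hop: the aggregated one-hop numbers become a new query to every repository.
    second = [cdr for repo in repos for cdr in repo.query(one_hop)]
    two_hop = numbers_in(second) - one_hop - {seed}
    return one_hop, two_hop

# Hypothetical holdings; each provider formats numbers its own way.
p1 = Repository("P1", [("(202) 555-1234", "(301) 555-4321")])
p2 = Repository("P2", [("301.555.4321", "410.555.5678")])
p3 = Repository("P3", [])
nsa = Repository("NSA internal", [("703-555-0100", "202-555-1234")])  # invented record

one_hop, two_hop = federated_two_hop([p1, p2, p3, nsa], "(202) 555-1234")
print(sorted(one_hop))  # ['3015554321', '7035550100']
print(sorted(two_hop))  # ['4105555678']
```

Even in this toy version, the two-hop result is only as good as each repository's answers and the aggregation step: a repository that returned extra or mismatched records would silently change the output, and nothing downstream could tell which records were the bad ones.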

Somewhere in there, we now know, something went wrong. NSA has concluded that none of the data obtained under the Act can be used, and it is destroying all of them. There is some problem that apparently infects at least some of the data (presumably in the form of inaccurate connections between telephone numbers), as well as some overproduction of data, and NSA cannot distinguish the good data from the bad. We don’t know exactly how this happened, but given that NSA cannot repair the damage, its explanation that the telephone companies provided bad data is very plausible. And, because the companies do not always retain raw data for very long, there are presumably problems reconstructing queries from the past. NSA has apparently discovered the “root cause” of this infection and believes it can be fixed going forward. Even if this is the case, however, we have had three years of darkness, and we have to wonder whether another problem may arise.

III. Lessons for the Future

What are the lessons here? The obvious one is probably that Murphy’s Law remains in force. And that law is particularly powerful as applied to large, complex systems. Sometimes, these systems generate mistakes that threaten privacy. Sometimes they generate mistakes that threaten security. The more complex the system—legally or technologically—the more likely that it will yield errors of both types.

The USA Freedom Act created a more complex legal system requiring a more complex technological system governing collection of telephony metadata. This system failed. The failure has been discovered and apparently remediated. But I am left wondering whether another error could arise, whether the system is too complex to be sustainable, and therefore whether the juice is worth the squeeze, particularly after the Supreme Court’s decision in Carpenter v. United States. We should know the answers to those questions soon: under Section 705 of the USA Freedom Act, the CDR process is scheduled to sunset, unless renewed, at the end of 2019, and it will be very interesting to see whether the executive branch even seeks renewal.