According to the Washington Post, on a single day last year, the NSA's Special Source Operations branch collected 444,743 email address books from Yahoo!, 105,068 from Hotmail, 82,857 from Facebook, 33,697 from Gmail and 22,881 from unspecified other providers. The figures are contained in an internal top-secret NSA PowerPoint presentation provided by former NSA contractor Edward Snowden. The "typical daily intake" corresponds to a rate of more than 250 million address books per year. Each day, the presentation said, the NSA also collects contacts from an estimated 500,000 buddy lists on live-chat services, as well as from the "inbox" displays of web-based email accounts.

The newspaper claims that Australia's NSA counterpart, the Defence Signals Directorate (now the Australian Signals Directorate), collected 311,113 address books as part of the program on a single day, identifying the agency as the one designated "DS" in the leaked file. Another code, AUC, also attributed to Australia, appears in a separate document. Washington Post reporter Barton Gellman, who helped write the Post report, said he was able to identify "DS" as an Australian intelligence agency using "internal evidence plus one source". Canberra has yet to confirm Australia's involvement, and the author of the report has himself left open the possibility that the DS prefix belongs to another country. However, Australia has already been revealed to have close ties with US and other international intelligence agencies. In late August, for example, Fairfax Media revealed that the Australian Signals Directorate was in a partnership with British, American and Singaporean intelligence agencies to tap undersea fibre-optic telecommunications cables that link Asia, the Middle East and Europe and carry much of Australia's international phone and internet traffic.

The collection depends on secret arrangements with foreign telecommunications companies or allied intelligence services in control of facilities that direct traffic along the internet's main data routes. Although the collection takes place overseas, two senior US intelligence officials acknowledged that it sweeps in the contacts of many Americans. They declined to offer an estimate but did not dispute that the number is likely to be in the millions or tens of millions. A spokesman for the US Office of the Director of National Intelligence, which oversees the NSA, said the agency "is focused on discovering and developing intelligence about valid foreign intelligence targets like terrorists, human traffickers and drug smugglers. We are not interested in personal information about ordinary Americans". The spokesman, Shawn Turner, added that rules approved by the US attorney general require the NSA to "minimise the acquisition, use, and dissemination" of information that identifies a US citizen or permanent resident. The NSA's collection of nearly all US call records, under a separate program, has generated significant controversy since it was revealed in June. The NSA's director, General Keith Alexander, has defended "bulk" collection as an essential counterterrorism and foreign intelligence tool, saying "you need the haystack to find the needle".

Contact lists stored online provide the NSA with far richer sources of data than call records alone. Address books commonly include not only names and email addresses but also telephone numbers, street addresses, and business and family information. Inbox listings of email accounts stored in the "cloud" sometimes contain content such as the first few lines of a message. Taken together, the data would enable the NSA, if permitted, to draw detailed maps of a person's life, as told by their personal, professional, political and religious connections. The picture can also be misleading, creating false "associations" with ex-spouses or with people the account holder has not contacted in many years.

The NSA has not been authorised by the US Congress or by the special intelligence court that oversees foreign surveillance to collect contact lists in bulk, and senior intelligence officials said it would be illegal to do so from facilities in the United States. The agency avoids the restrictions of the US Foreign Intelligence Surveillance Act by intercepting contact lists from access points "all over the world", one official said, speaking on condition of anonymity to discuss a classified program. "None of those are on US territory." Because of the method employed, the agency is neither legally required nor technically able to restrict its intake to contact lists belonging to specified foreign intelligence targets, he said. When information passes through "the overseas collection apparatus", the official added, "the assumption is you're not a US person."

In practice, Americans' data is collected in large volumes, in part because many Americans live and work overseas, but also because data crosses international boundaries even when its American owners stay at home. Large technology companies, including Google and Facebook, maintain data centres around the world to balance the load on their servers and to work around outages.

A senior US intelligence official said that the privacy of Americans is protected, despite mass collection, because "we have checks and balances built into our tools". NSA analysts, he said, may not search or distribute information from the contacts database unless they can "make the case that something in there is a valid foreign intelligence target in and of itself". In this program, however, the NSA is obliged to make that case only to itself or to others in the executive branch. With few exceptions, intelligence operations overseas fall solely within the US president's legal purview. The Foreign Intelligence Surveillance Act, enacted in 1978, imposes restrictions only on electronic surveillance that targets Americans or takes place on US territory. By contrast, the NSA draws on authority in the Patriot Act for its bulk collection of domestic phone records, and it gathers online records from US internet companies, in a program known as PRISM, under powers granted by Congress in the FISA Amendments Act. Those operations are overseen by the Foreign Intelligence Surveillance Court.

Senator Dianne Feinstein, chairman of the Senate Intelligence Committee, said in August that the committee has less information about, and conducts less oversight of, intelligence-gathering that relies solely on presidential authority. She said she planned to ask for more briefings on those programs. "In general, the committee is far less aware of operations conducted under 12333," said a senior committee staff member, referring to Executive Order 12333, which defines the basic powers and responsibilities of the intelligence agencies. "I believe the NSA would answer questions if we asked them, and if we knew to ask them, but it would not routinely report these things, and in general they would not fall within the focus of the committee."

Because the agency captures contact lists "on the fly" as they cross major internet switches, rather than "at rest" on computer servers, the NSA has no need to notify the US companies that host the information or to ask for their help. "We have neither knowledge nor participation in any mass collection of web mail addresses or chat lists by the government," said Google spokesperson Niki Fenwick. At Microsoft, spokesperson Nicole Miller said the company "does not provide any government with direct or unfettered access to our customers' data", adding that "we would have significant concerns if these allegations about government actions are true". Facebook spokesperson Jodi Seth said "we did not know and did not assist" in the NSA's interception of contact lists.

It is unclear why the NSA collects more than twice as many address books from Yahoo! as from the other big services combined. One possibility is that Yahoo!, unlike other service providers, has left connections to its users unencrypted by default. Suzanne Philion, a Yahoo! spokesperson, said on Monday, in response to an inquiry from The Washington Post, that Yahoo! would begin encrypting all its email connections in January. Google was the first to secure all its email connections, turning on "SSL encryption" globally in 2010. People with inside knowledge said the move was intended in part to thwart large-scale collection of its users' information by the NSA and other intelligence agencies.

The volume of the NSA's contacts collection is so high that it has occasionally threatened to overwhelm storage repositories, forcing the agency to halt its intake with "emergency detasking" orders. Three NSA documents describe short-term efforts to build an "across-the-board technology throttle for truly heinous data" and longer-term efforts to filter out information that the NSA does not need.

Spam has proven to be a significant problem for the NSA, clogging databases with data that holds no foreign intelligence value. The majority of all emails, one NSA document says, "are SPAM from 'fake' addresses and never 'delivered' to targets." In late 2011, according to an NSA presentation, the Yahoo! account of an Iranian target was "hacked by an unknown actor", who used it to send spam. The Iranian had "a number of Yahoo! groups in his/her contact list, some with many hundreds or thousands of members". The cascading effects of repeated spam messages, compounded by the automatic addition of the Iranian's contacts to other people's address books, led to a massive spike in the volume of traffic collected by the Australian intelligence service on the NSA's behalf. After nine days of data-bombing, the Iranian's contact book and the contact books of several people within it were "emergency detasked". In a briefing from the NSA's Large Access Exploitation working group, that example was used to illustrate the need to narrow the criteria for intercepting data. The briefing called for a "shifting collection philosophy": "Memorialise what you need" versus "Order one of everything off the menu and eat what you want."

With Barton Gellman, Ashkan Soltani and Julie Tate, Washington Post. Ashkan Soltani is an independent security researcher and consultant.