The Department of Homeland Security is poised to ditch all records from a controversial network monitoring system called Einstein that are at least three years old, but not for security reasons.

DHS reasons the files -- which include data about traffic to government websites, agency network intrusions and general vulnerabilities -- have no research significance.

But some security experts say, to the contrary, DHS would be deleting a treasure chest of historical threat data. And privacy experts, who wish the metadata wasn’t collected at all, say destroying it could eliminate evidence that the governmentwide surveillance system does not perform as intended.

The National Archives and Records Administration has tentatively approved the disposal plan, pending a public comment period.

According to Homeland Security’s rationale, there is "quickly diminishing value for most of the data collected pursuant to intrusion detection, prevention and analysis." A three-year retention period for reference purposes is sufficient, and "the records have no value beyond that point" but can be kept longer, if needed, appraisers said.

Incident reports, which include records on catastrophic cyber events, must be kept permanently.

Typically, the main driver of data retention policies is the cost of storing information indefinitely.

The nonprofit SANS Internet Storm Center, which monitors malicious activity on the public Web, retains observation data for 12 years.

Older intrusion-detection records provide insight into the evolution of threats, said Johannes Ullrich, dean of research at the SANS Technology Institute. Analysts there sometimes need even older data to answer today's research questions.

"When we first started, our data was dominated by bots" -- networks of compromised computers -- "attacking common Windows services," he said. Then, a wider array of services started to come under attack, "and more recently, we do have data about the attack of devices -- Internet of Things -- as well as most recently attacks against big data systems."

Ideally, Homeland Security’s intrusion records would be made available to the public in some form, Ullrich said.

"The Einstein data would likely be a goldmine for researchers, as it documents attacks against very specific networks in a consistent way over a large extent of time," he said.

The records might show, for instance, attackers trying to guess host names, such as “admin.healthcare.gov,” that would give them total control over the Obamacare website, Ullrich said.

Storage costs in a commercial cloud likely would be reasonable, he added, ballparking the figure at $50 a month per terabyte of data.

Is This Another One of Those Coverups?

Some civil liberties advocates back Homeland Security’s move to expunge records that might contain individuals' metadata and communications as soon as possible.

"Einstein is a network monitoring system and a lot of the data likely concerns user activity," said Ginger McCall, director of the Open Government Program at the Electronic Privacy Information Center. "We would typically not want agencies to retain that data."

Yet, scrubbing the data presents an accountability challenge for department auditors.

“As a general matter, getting rid of data about people's activities is a pro-privacy, pro-security step,” said Lee Tien, senior staff attorney with the Electronic Frontier Foundation. But “if the data relates to something they're trying to hide, that's bad.”

It is possible the records could reveal the monitoring tools make mistakes when attempting to spot threats.

“Some of them are very smart and in fact, some of them try to learn and try to make guesses about things,” Tien said. By throwing out three-year-old records, “would you be getting rid of the very data that would allow [the Government Accountability Office] to say, 'Yes, it works fine,' or, 'No, it didn't work, but got better?'”

The root problem is a lack of transparency surrounding Einstein, he said, likening the situation to criticism of the National Security Agency’s secrecy around its signals intelligence sweeps.

“You're setting up this data collection system that tracks people when they are using government websites,” Tien said. “And you don't necessarily have to have that repository. When the government is capturing that information and holds it in its records, there is always a privacy issue. We want to be able to have evaluated it.”

Rep. Elijah Cummings, D-Md., the ranking Democrat on the House Oversight and Government Reform Committee, intends to review the types of records set to be discarded. It is important for DHS to keep any Einstein records related to a breach, but if the records truly hold no worth, Democratic members do not see a problem with disposing of them, a minority committee staffer said.

DHS officials on Friday declined to comment beyond what was stated in the written rationale.

The public has until Dec. 19 to request a copy of the records retention plan. Comments are due within 30 days of receipt.

Categories of Records Headed to the Trash Folder

Core Infrastructure -- Email, contact and other personal information of federal workers and public citizens who communicate concerns about potential cyber threats to DHS; "Suspicious files, spam and other potential cyber threats via an email network" exclusively used within DHS' Mission Operating Environment system.

Intrusion Detection -- Network traffic data and alerts from government servers; this information includes the IP address, port address, timestamp and some red flags identified in network traffic; telltale signs, or signatures, of known malicious behavior; oddities in captured traffic, such as "an unusual number of hits," or sometimes, "known actors floating through multiple dot-gov" websites; interactions with domain name system servers that translate website names like “USDA.gov” into numeric IP addresses.

Intrusion Prevention -- Indicators of known and unknown malicious activity agencies should be on the lookout for.

Analysis -- Forensic imagery and files from the U.S. Computer Emergency Readiness Team containing malicious data for studying purposes; metadata from traffic "packet capture" analysis might contain email addresses and IP addresses; a database for supporting commercially available tools that allow US-CERT personnel to visualize relevant relationships "by presenting drilldown views of data with patterns, trends, series and associations to analyze seemingly unrelated data"; a segregated, closed computer network system for inspecting digital devices and their storage mediums; information about security vulnerabilities and threats in the form of actual malicious code submitted to US-CERT.

Information Sharing -- Technical Web records, including operations and maintenance; content might include research, white papers, advertising for conferences and other published information for feds and the public; "CyberScope" reports on an agency's security posture required to comply with the 2002 Federal Information Security Management Act; the US-CERT.gov website and data exchange portal; a repository for threat sightings and indicators.