Fears rise of Trump-era 'memory hole' in federal data

Given the dramatically different views of the incoming Trump administration and the outgoing Obama one, one archivist worries "about pages being shut down."

Fears have spiked that Donald Trump’s incoming administration won’t settle for simply rolling back President Barack Obama’s policies, but may hobble federal agencies by removing reams of data from government websites — or even destroying it.

The worry has escalated to a near-panic among some environmental and transparency advocates following the disclosure last week that Trump transition officials at the Energy Department issued a questionnaire demanding the names of employees who took part in climate negotiations, as well as in controversial efforts to estimate the social costs of carbon emissions. The Trump aides also sought details of how the agency calculates the costs of alternative energy and asked for a list of all websites where scientists at national laboratories post information "during work hours."


Tech-savvy liberals have responded by feverishly combing through government data, making copies and downloading what they can. And while concerns about a data purge may be overblown, many of the employees don't want to take any chances that valuable or unique data could be lost, particularly if it's targeted for elimination under the Trump administration.

"There are concerns that websites will vanish, data will be destroyed, records will vanish," said Gary Bass, a longtime transparency advocate with the Bauman Foundation. "Some of it seems potentially real. Some seems fantasy or hyperbole. I have a hard time figuring out which is which."

"I've been worried because Trump is obviously anti-transparency," said Russ Kick, founder of "The Memory Hole," an early website devoted to documents that have disappeared from the internet. "I wouldn't be surprised if a lot of websites become shells of what they are now."

The Trump transition team did not respond to a request for comment on the Trump-inspired rush to grab and preserve government data.

Since 2008, a series of large institutions have attempted to capture a snapshot of online federal government data every four years through a project known as the End of Term Archive. However, worries about the incoming administration have triggered a series of hastily organized, ad hoc efforts to make sure that federal agencies' web offerings don't simply vanish on Inauguration Day or soon thereafter.

Fueled in part by distress over the outcome of last month's election, small bands of activists with computer skills have sponsored hackathons to download government data or flag it for collection by others. Even some outside the U.S. are getting in on the act. On Saturday, climate researchers at the University of Toronto are promoting what they've dubbed a daylong "guerrilla archiving" event to protect climate data from any tampering by Trump appointees.

Some individual researchers said in interviews that they're also taking it on themselves to hoard data. They believe that many agencies posted their data online under pressure from President Barack Obama and will retreat under Trump.

"The intelligence agency websites would love to become single-page websites," Kick said. "I wouldn't be surprised if they pull down their [Freedom of Information Act] reading rooms....I've been manually downloading a lot. I went through and grabbed pretty much every PDF file on the [Office of Director of National Intelligence] website."

In a cavernous ballroom at the Capital Hilton in Washington Tuesday, organizers of the End of Term Archive briefed university librarians and digital archiving experts about the sudden outpouring of interest in their work and how to integrate the more spontaneous efforts into the broad collection of government sites and pages.

"This year, we've seen a lot of these activities just sprout up," said Abbie Grotke of the Library of Congress. "We are losing control a little bit."

Jefferson Bailey of the Internet Archive, a key player in the archiving project, described some concerns about potential disappearance of data as "hysteria" and said most disappearance of data from the government web is due to "benign neglect," equipment changes or similar issues.

However, given the dramatically different views of the incoming Trump administration and the outgoing Obama one, Bailey sees more potential for what he called—in archivist-speak—"politically-driven ephemerality." He added, in blunter terms: "I worry about pages being shut down."

Some of the challenges in archiving government websites are more technical than political. As websites become more sophisticated, traditional web pages with static content are becoming less common, while content specifically generated for an individual user is now more frequent, although still far from prevalent on government sites.

"There's a lot more dynamic content on the web than there was four or eight years ago. Some of that is challenging to capture," Bailey said. "There are sometimes FTP servers or other directories that a crawler might not discover because they're hidden....Subdirectories are very hard to find."

Another problem: some websites allow users to query large government databases for specific information, but don't allow downloading of the data in bulk. That makes it more difficult to obtain an entire data set.

"Databases are a problem," Grotke told the group. "There's been a lot of interest in that this year and in data sets."

One worry that broke out on a transparency-focused listserv last week: some government websites embed a special code telling automated data-grabbing crawlers to skip over certain pages. This causes many web services and search engines to ignore that data, although the government has no real authority to limit copying or indexing of those pages.

However, Mark Phillips of the University of North Texas says the crawlers being used for the current data sweep will pull in those pages regardless. "It's all crawled....For the End of Term Archive, we're going through, capturing data as broadly as we can," he said.
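For readers curious how such skip instructions work in practice, here is a minimal Python sketch. It assumes the "special code" takes the form of a robots.txt file, one common mechanism for this; the article's sources did not specify which one the sites use. The robots.txt contents, the `agency.example` URLs, the `example-crawler` name and the `should_capture` helper are all hypothetical, and this is not a description of the End of Term Archive's actual crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: every crawler is asked to skip /reports/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reports/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def should_capture(robots_txt: str, user_agent: str, url: str,
                   honor_robots: bool = True) -> bool:
    """Decide whether a crawler captures a page.

    A search engine would typically keep honor_robots=True. An archival
    crawl that collects "as broadly as we can" is modeled here by passing
    honor_robots=False, which ignores the skip instructions (an assumption
    made for illustration only).
    """
    if not honor_robots:
        return True
    return is_allowed(robots_txt, user_agent, url)


if __name__ == "__main__":
    page = "https://agency.example/reports/climate.pdf"
    print(should_capture(ROBOTS_TXT, "example-crawler", page))
    print(should_capture(ROBOTS_TXT, "example-crawler", page, honor_robots=False))
```

The point of the sketch is that these instructions are a request, not a lock: compliance is entirely up to the crawler, which is why an archiving effort can choose to collect the pages anyway.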

Organizers of the archiving effort have gotten some direct assistance from the Obama White House, including a roster of subdomains on the White House website and a list of all web pages posted on the site.

Some of Obama's policies have also been a boon to the archiving project, including a 2013 executive order to make government data more readily accessible and a drive to collect many federal government data sets on a single site: data.gov.

"This is a huge undertaking," said Hudson Hollister of the Data Coalition, an industry group that lobbies for government data to be posted online in standardized, accessible formats. "The easy stuff has been transformed....The hard stuff requires extra policymaking."

Hollister said it's unclear whether Trump will maintain Obama's order. "It was an executive order and we know Trump has said he's going to review all executive orders and cancel some out. Honestly, I don't know where he'll come down on this one," Hollister said.

On Saturday, the Senate unanimously passed a bill aimed at codifying and strengthening Obama's directive to make federal data more available. However, the legislation died for the year because the House had already gone home for the holidays. The bill is expected to be reintroduced early next year.

While agencies still have considerable discretion about what to post online and could have more under Trump, there are some legal limits. The Freedom of Information Act requires agencies to post frequently requested records on the web. And the Federal Records Act makes it illegal to dispose of official records without proper authorization.

Records retention policies at federal agencies vary widely and can often be surprisingly limited. During the recent controversy over Hillary Clinton's use of a private email server, it emerged that until 2013 the State Department had no policy requiring bulk archiving of official emails, even for those in top positions. Ambiguities about agencies' obligations to keep records are also stoking fears that Trump appointees might decide to get rid of what they have or to stop collecting it in the first place.

The top official at the National Archives and Records Administration, Archivist David Ferriero, sent a memo to all federal agencies last month reminding them of their record-keeping obligations and the need to make sure departing and incoming officials understand their duties.

However, spokeswoman Miriam Kleiman said her agency primarily focuses on whether agencies are preserving records, not whether they are making them available to the public.

"NARA's records management guidance mainly focuses on records creation, retention, and eventual deletion or transfer to NARA for permanent preservation. NARA has not issued specific guidance about large data sets being taken down from publicly-available websites," Kleiman said.

Transparency advocates say they're not willing to take a chance on the rules being followed by the incoming administration, nor are they interested in waiting months or years to use the Freedom of Information Act to obtain data that was at one point available to the public online. So, the massive effort to capture the sprawling federal government presence on the web is pressing forward with a sense of urgency.

"The date January 20th comes to mind as a deadline for trying to get this solved to some degree," Kick said. "We're really running against the clock here."