Big Data

Data-crunching spies of the future

The U.S. intelligence community is attempting to transform the way it uses and handles digital information. Unlike much of its work, this is an effort agency officials want you to know all about.

The Office of the Director of National Intelligence has been sending officials to conferences and gatherings in recent months to promote a years-long effort to modernize the IT infrastructure used by intelligence agencies. One of the primary goals of that effort is to better open up, analyze and share the massive amounts of sensitive U.S. government data between component agencies.

Intelligence agencies already make use of data, from signals and imagery captured through orbital satellites to the phone call and internet traffic vacuumed up by the National Security Agency's controversial post-9/11 digital surveillance apparatus. The intelligence community already has "world class computational capability," said ODNI CIO John Sherman in an interview. "Now we need to make sure that data is fully unlocked."

The concept, described by Sherman and other ODNI officials, is about shifting away from a business process in which digital data was one component of intelligence gathering alongside more traditional, analog tools, to a posture where capturing, cataloguing, analyzing and seamlessly sharing that data across common systems and platforms is one of the primary means of fulfilling the mission.

Like many federal agencies, ODNI and its components have disclosed plans to accomplish this through a mix of automation, machine learning-fueled analysis and shared services. A strategy published in 2017 emphasizes the need to treat data as a precious asset that must be free to move between different organizations at both speed and scale. The explosion of data in recent years has created opportunity for U.S. intelligence agencies, as well as for foreign governments and potential adversaries, to "exponentially expand the potential to influence people and events, both domestically and globally," making enhanced capabilities a requirement for keeping pace.

"To do this, we will 'free the data' by removing its current dependencies in element applications, systems and databases, thus allowing it to be catalogued, self-described and discoverable by automated means," the strategy states. "A majority of IC data is also tightly coupled with the data management and mission-analytic capabilities of IC element-specific systems. Ultimately, achieving data centricity will require separation of the data from these applications."

Meanwhile, the latest National Intelligence Strategy, released in April, and individual agency procurements issued this year put data analysis and digital transformation front and center. Agencies like the National Geospatial-Intelligence Agency say they must alter or end older operations that were designed in a world of data scarcity and move to ones positioned to quickly process and share information in a world of "data abundance."

To that end, ODNI brought in Nancy Morgan as its new Assistant Director of National Intelligence for Information and Data this year, specifically charging her with the job of marrying the intelligence community's data resources with its larger policy goals. One of those goals was convincing a bunch of spy agencies to open up their data and systems to other components. Paraphrasing the 2017 strategy, Morgan said the goal is about putting the right data in the hands of the right people quickly and securely.

"It used to be you built your data empire and hoarded it," Morgan said about the cultural mindset when she started at the position, which fulfills many of the functions carried out by the chief data officer at other agencies.

No more. Not surprisingly, Sherman said this mandate to share has led to "a lot of hard conversations" among different component agencies. Getting a room full of spooks to let go of their preferred systems in favor of a more common platform designed to give their secrets away to others "takes a tremendous amount of trust building."

While ODNI's vision includes grand plans down the road for unleashing AI and machine learning tools on a trove of interoperable data, the first step in that journey has been far simpler and more mundane: inventorying, cataloguing and tagging the information that's already there.

For decades, intelligence agencies have been collecting and storing information related to every issue and mission, some of it long forgotten but still useful. While taking inventory, Sherman and Morgan's teams have found relics, like old missile-tracking satellite imagery, that can later be repurposed to track the effects of climate change. Other times they discover inefficiencies and duplication, such as paying twice to get the same dataset from a third-party provider.

ODNI has invested a lot of time making the data it does find consistent and interoperable. Each component has different authorities, classification methods and tagging protocols. Little things, like divergent uses of bureaucratic acronyms or spelling, can wreak havoc on system interoperability.

"The spelling of a word can really trip up certain systems," said Morgan.

Finding and harmonizing that data in conjunction with the intelligence community's broader cloud efforts "sets the table" for the use of artificial intelligence and other tools down the road, Sherman said. An early example of this strategy in action can be found in the CIA's $600 million private cloud developed by Amazon Web Services, as well as its multibillion-dollar successor, the Commercial Cloud Enterprise. Both are designed to handle and process massive amounts of data and can be used by all 17 member organizations in the U.S. intelligence community.

Sherman and Morgan said they would find old shoeboxes stored in cabinets, full of documents and information that had been lost and forgotten for years.

"Once we completed it we found datasets that even the mission owners didn't know were there," said Sherman.