The FBI is facing a big data problem that is only going to get bigger.

In 2014, the investigation of the Boston Marathon attack produced about 50 terabytes of data. The Las Vegas shooting in 2017 produced 1 petabyte of data, a twentyfold increase in three years.

Soon, a typical investigation will produce 50 terabytes of data or more.


Gordon Bitko, the chief information officer of the FBI, said that in both of those cases, the bureau depended on more people and more resources to comb through the data and find key information for the investigation.

“We can’t keep telling the story that way when a big event happens we will surge resources to solve the problems,” Bitko said during a panel discussion at the recent AFCEA Northern Virginia Intelligence Community IT day. “The big focus for us really is how do we solve that problem? How do we think about data in better, more modern ways? How do we think about triaging and using automation to help us identify and get through all that information in ways that can really help us do analysis, do assessments better and faster and smarter, and help the agents and analysts focus on the unique parts of it where we need them to apply their expertise.”

The second leg of this effort is to increase the “data literacy” of the FBI’s workforce. Bitko said the FBI knows data scientists and others with data skills are in short supply, which is why training the bureau’s current workforce and recruiting new employees with foundational skills is so important.

“There is not an investigative area and that includes everything from gangs to organized crime to counter intelligence where the amount of data being collected isn’t growing exponentially,” Bitko said during the panel, which also was a part of Ask the CIO. “Social media, online transactions, dark web transactions and Bitcoin transactions are all touching on every investigation. We are focused heavily to develop programs to drive the overall digital literacy of the workforce, to empower people across all 56 field offices to think about things, understand them and know what they can use technology to help them.”

The FBI hired a chief data officer, Maria Voreh, in March 2017 to help get a handle on the big data challenge. Bitko said the bureau has been developing an overall data strategy to reduce the current “data burden” on the workforce, which stems from a combination of not knowing what information or databases exist and being unable to access the systems even when employees do know where the data stores live.

“The strategy is really designed around the transparency of what all the data assets are, the transparency of who should be able to use them and under what circumstances, and then allow the tools and automation help people with discovery and analysis of that information,” he said. “We are building tools, buying tools, we are trying to beg, borrow and steal. There are always new capabilities out there in the world. We have a number of programs in place where we are trying to do an assessment of those and understand what fits into our portfolio best, smartest and quickest. We are not trying to build or reinvent when there are already solutions. At the same time in the data world, there are lots of great open source tools and standards out there as well and a lot of the advanced data scientists in our organization and more broadly in the community want to use those tools. There are a number of programs ongoing to try to make those available.”

At the same time, the FBI is conducting some pilots to use aptitude testing across the workforce to get a better sense of employees who could bring some of those data analysis skills to bear.

Bitko said the FBI also may look to take advantage of non-traditional training around data analysis skills.

“One thing we’ve just started talking about doing, and this will be a challenge in government, can we do things like mid-career rotations, where we send people outside of government to gain some experience and then bring them back,” he said. “We are exploring that now because there are some challenges to doing that, but it’s certainly something we are interested in.”

Underpinning this effort to improve the bureau’s management and use of data is the cloud. Bitko said he wants to put more tools and capabilities into the cloud so agents and analysts can conduct investigations more quickly and seamlessly.

One example of the FBI’s move to the cloud comes from the Justice Department’s new IT strategy, which says the bureau plans to put its Next Generation Identification (NGI) biometrics system in the cloud to ensure agents and analysts have on-demand access to the data.

Bitko said the FBI also is taking advantage of cloud shared services both from the intelligence community and from Justice partners. He said the FBI is using the C2S cloud in the IC for application development.