Recent revelations about the National Security Agency's (NSA) data mining capabilities have made "big data" a new subject of interest and concern for many people.

So what better time than now to launch a data analytics tool based on the very technology that the NSA uses to perform its real-time analysis of massive amounts of data being pulled in from sources like the PRISM program?

The timing of the launch may not have been ideal for startup Sqrrl Data, which announced version 1.1 of its flagship product Sqrrl Enterprise today. Sqrrl is a commercially extended version of Apache Accumulo, the big data analysis platform originally developed by the NSA for real-time data mining, with built-in protections designed to hide certain kinds of information from people without the clearance to view it. (Last week, Ars took an in-depth look at Accumulo and other tools that the NSA uses to tap into the firehose of data that it has access to.)

The news tie-in has certainly put a crimp in Sqrrl's ability to get customer testimonials. While the company has a handful of existing customers in government, finance, healthcare, and other industries, "Because of everything that's going on with NSA, our early customers are not particularly excited about talking," said Ely Kahn, Sqrrl's vice president of business development, in an interview with Ars.

From tiny acorns

The NSA created Accumulo in 2008 as a tool to handle the massive amounts of data it collects through its surveillance operations. Sqrrl Enterprise takes Accumulo and the Hadoop Distributed File System (HDFS) as a starting point. "We're essentially creating a premium grade version of Accumulo with additional analysis, data ingest, and security features, and taking a lot of the lessons learned from working in large environments," said Kahn.

The company has plenty of direct experience with large environments: it's mostly made up of members of the team who developed Accumulo at the NSA. Of the seven members of the founding team, six came from the NSA; Kahn, the only outlier, is the former White House Director of Cybersecurity.

Last year, the team moved from Washington, DC to Cambridge, MA to be closer to its investors. Usually for these sorts of projects, a Washington, DC-based venture capital company called In-Q-Tel is a primary investor, since it is funded by the CIA and the US Intelligence Community. But despite the company's roots at the NSA, In-Q-Tel is not among Sqrrl Data's investors at the moment. "We talk to In-Q-Tel a lot," Kahn said. "They're good for companies trying to break into government, but not so much for companies trying to break out of government."

Sqrrl adds a few things atop Accumulo's inherent features to make the platform fit more easily into organizations without the inside IT muscle of the NSA. Accumulo's inherent features include "iterators," software components that constantly query data being pushed into Accumulo's distributed data store. Like the MapReduce technology used by Google, iterators can be launched en masse against data to ferret out acorns of information. Like the squirrels in Willy Wonka's factory, the iterators keep at it, constantly sorting the nuts.
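To make the iterator idea concrete, here is a rough sketch in Python of small functions stacked on top of a sorted stream of key-value entries, one filtering and one summing. It illustrates the general pattern only and is not Accumulo's or Sqrrl's actual API; the function names, the (host, event) key layout, and the sample log entries are all invented for this example.

```python
from itertools import groupby


def source(entries):
    """Yield (key, value) pairs in sorted key order, as a sorted store would."""
    yield from sorted(entries)


def filter_iterator(stream, predicate):
    """Pass along only the entries the predicate accepts."""
    for key, value in stream:
        if predicate(key, value):
            yield key, value


def summing_iterator(stream):
    """Collapse adjacent entries that share a key into a single summed entry."""
    for key, group in groupby(stream, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


def failed_logins_by_host(entries):
    """Stack the two iterators: keep failed logins, then count them per key."""
    failed = filter_iterator(source(entries), lambda k, v: k[1] == "login_failed")
    return dict(summing_iterator(failed))


# Hypothetical log entries: key is (host, event), value is a count of 1.
SAMPLE = [
    (("web1", "login_failed"), 1),
    (("web1", "login_ok"), 1),
    (("web1", "login_failed"), 1),
    (("db1", "login_failed"), 1),
]

if __name__ == "__main__":
    print(failed_logins_by_host(SAMPLE))
```

Because each stage only consumes the stage beneath it, stages can be stacked in any order and run wherever the data lives, which is the property that lets this style of processing be launched en masse.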

Sqrrl includes a set of pre-built custom iterators "for real-time analytics," said Kahn, "and for what people refer to as a 'multi-analytic environment.'" They allow for things like real-time full text searches, graph analysis for identifying "entities" within data and the relationships between them, SQL-like queries, and statistical analysis.

One of the natural ways to use this sort of product is with what's referred to as Security Information and Event Management (SIEM): the real-time monitoring of log data, alarms, and other information generated by sensors of all types to look for the signature of a security threat. SIEM systems can be hooked to network monitoring systems such as those used by the NSA, but also to data streams from nearly anything that is networkable, including point-of-sale systems, electronically keyed doors, and anything else that falls under the umbrella of the "Internet of Things."
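As a toy illustration of what "looking for a signature" in a stream of events can mean, the Python sketch below flags any source that reports too many events inside a sliding time window. The rule, the threshold, the source names, and the timestamps are all assumptions made up for this example; real SIEM products use far richer rules.

```python
from collections import defaultdict, deque


class ThresholdRule:
    """Flag a source once it reports `limit` events within `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.recent = defaultdict(deque)  # source -> timestamps still in window

    def observe(self, timestamp, source):
        """Record one event; return True if the source is now over the limit."""
        times = self.recent[source]
        times.append(timestamp)
        # Drop timestamps that have aged out of the sliding window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.limit


def alerts(events, limit=3, window_seconds=60):
    """Return the (timestamp, source) events that tripped the rule."""
    rule = ThresholdRule(limit, window_seconds)
    return [(ts, src) for ts, src in events if rule.observe(ts, src)]


# Hypothetical badge-reader and point-of-sale events: (seconds, source).
EVENTS = [(0, "door-7"), (10, "door-7"), (20, "pos-2"), (25, "door-7"), (200, "door-7")]

if __name__ == "__main__":
    print(alerts(EVENTS))
```

The events must arrive in time order for the window to behave as described, which is a fair assumption for a live feed but not for logs merged after the fact.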

Put it in a lockbox

The main characteristic that differentiates Accumulo (and thus Sqrrl Enterprise) from other "big data" platforms is its security. The platform allows for compartmentalization of segments of big data storage through an approach called cell-level security. The security level of each cell within an Accumulo table can be set independently, hiding it from users who don't have a need to know. Whole sections of data tables can be hidden from view in such a way that users (and applications) without clearance would never know they were there. Sqrrl builds atop that security, providing ways for enterprises to plug in their existing user authentication and directory services systems to govern who can get access to data.
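The idea behind cell-level security can be sketched in a few lines of Python. This is a simplified model rather than Accumulo's real API: each cell here carries a plain set of labels that must all be held by the reader, whereas Accumulo's actual visibility labels are richer boolean expressions. The table contents, label names, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    visibility: frozenset  # labels a reader must ALL hold; empty means public


def scan(table, authorizations):
    """Return only the cells the caller may see; hidden cells leave no trace."""
    auths = frozenset(authorizations)
    return [cell for cell in table if cell.visibility <= auths]


# Hypothetical health record split into cells with different sensitivity.
TABLE = [
    Cell("patient-001", "name", "J. Doe", frozenset({"clinical"})),
    Cell("patient-001", "diagnosis", "asthma", frozenset({"clinical", "phi"})),
    Cell("patient-001", "zip", "02139", frozenset()),
]


def visible_columns(authorizations):
    """Convenience helper: which columns does this set of labels unlock?"""
    return [cell.column for cell in scan(TABLE, authorizations)]


if __name__ == "__main__":
    print(visible_columns([]))
    print(visible_columns(["clinical"]))
    print(visible_columns(["clinical", "phi"]))
```

Note that a scan without the right labels returns a shorter list rather than an error or a redacted placeholder, so the caller cannot tell that anything was withheld.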

That level of security is also the basis of the NSA's contention that the data it collects is contained in a "lock-box"—at least in theory, the system could be configured to only allow certain elements of collected data to be viewed based on filter-based rules, preventing unauthorized fishing expeditions with wide-ranging queries. "We've learned recently that a number of other governments are using Accumulo—other governments in North America and Europe," Kahn said. "And that's largely because of the security controls."

In addition to the "half-dozen or so" paying customers Sqrrl has already landed, "there are a dozen additional unpaid proof of concepts out there with systems integrators and other big data providers who are testing out our software," Kahn said. With any luck, they could result in systems that help secure health data and financial data better while allowing big data analysis to help improve care and spot fraud.