
When it comes to large organizations working on artificial intelligence systems for understanding language, there’s Google, Microsoft, Yahoo and … the Defense Advanced Research Projects Agency. The agency, better known as DARPA, is working on a project it calls Deep Exploration and Filtering of Text, or DEFT, in order to analyze textual data at a scale beyond what humans could do by themselves.

The project launched in mid-2012 and appears to have a duration of about 4.5 years. Here is the aim of DEFT, according to its website:

Automated, deep natural-language processing (NLP) technology may hold a solution for more efficiently processing text information and enabling understanding connections in text that might not be readily apparent to humans. DARPA created the Deep Exploration and Filtering of Text (DEFT) program to harness the power of NLP. Sophisticated artificial intelligence of this nature has the potential to enable defense analysts to efficiently investigate orders of magnitude more documents, which would enable discovery of implicitly expressed, actionable information within those documents.

Essentially, DARPA hopes the technology could help intelligence analysts scan more text documents and audio files to determine what’s being talked about in them — who, what, where, timing, etc. — and decipher ambiguous statements or indirect references to people and things. As the system analyzes new sources of data, analysts would receive alerts that describe the suspect communication (content, people, places, etc.) as well as a set of related documents.
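To make that extraction-and-alerting idea concrete, here is a minimal sketch of the kind of pipeline the article describes, built on the open-source spaCy library. DEFT's actual components are not public, so the watchlist and alert logic below are purely hypothetical illustrations.

```python
# Illustrative sketch only: extract people, places and dates from incoming
# text and flag documents that mention entities on a watchlist. spaCy stands
# in for whatever NLP stack DEFT actually uses; the watchlist is hypothetical.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with built-in NER

WATCHLIST = {"Acme Corp", "Baghdad"}  # hypothetical entities of interest

def analyze(document_text):
    """Extract entities and flag any watchlist hits for an analyst."""
    doc = nlp(document_text)
    entities = [(ent.text, ent.label_) for ent in doc.ents
                if ent.label_ in {"PERSON", "ORG", "GPE", "DATE"}]
    hits = [text for text, _ in entities if text in WATCHLIST]
    if hits:
        # A real system would notify an analyst with the content, people,
        # places and related documents, as described above.
        print(f"ALERT: watchlist entities {hits} found")
    return entities

print(analyze("Representatives of Acme Corp met officials in Baghdad on Tuesday."))
```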

This isn’t DARPA’s first foray into NLP, but the agency hopes new deep learning approaches will help it better understand documents at a semantic level, thus enabling the new capabilities. Beyond just analyzing content as it comes in and sending alerts, the agency also hopes to create a database of sorts connecting the issues and entities contained in all that text:

“DEFT also aims to enable the capability to integrate individual facts into large domain models as information is processed to support assessment, planning, prediction and the initial stages of report writing. If successful, DEFT will allow analysts to move from limited, linear processing of huge sets of data to a nuanced, strategic exploration of available information.”
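As a rough illustration of what "integrating individual facts into large domain models" might look like, the sketch below accumulates extracted facts into a growing graph of entities and relationships using the networkx library. The fact tuples and relation names are assumptions made for the example, not anything DARPA has published.

```python
# Illustrative sketch only: fold extracted facts (subject, relation, object)
# into a domain graph that analysts could later explore. The facts and
# relation names below are hypothetical examples.
import networkx as nx

domain_model = nx.MultiDiGraph()

def add_fact(subject, relation, obj, source_doc):
    """Integrate one extracted fact, keeping a pointer back to its source."""
    domain_model.add_edge(subject, obj, relation=relation, source=source_doc)

# Facts as they might arrive from the NLP pipeline, one document at a time.
add_fact("Person A", "met_with", "Person B", source_doc="doc-001")
add_fact("Person B", "located_in", "City X", source_doc="doc-002")

# A strategic question an analyst might ask: who is connected to City X?
for node in nx.ancestors(domain_model, "City X"):
    print(node, "is directly or indirectly linked to City X")
```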

Defense agencies could presumably analyze all sorts of information sources using the DEFT technologies, ranging from YouTube videos and web forum posts to communications intercepted via spy programs. Although none of the DEFT literature references this capability directly, DARPA appears to be looking at software and research from numerous DARPA-funded universities in order to advance DEFT’s goals, including Stanford (which is well known for its NLP research), Columbia and Carnegie Mellon.

Via its $25 million XDATA program, first announced as part of the president’s big data push in 2012, the agency is also investigating a number of other machine learning approaches to organizing and analyzing the huge amounts of data that defense agencies (not just the National Security Agency) are gathering, including graph analysis, time-series correlations and advanced visualization techniques.