Imagine if computers could read and interpret documents. Humans could then focus their efforts on understanding what analysis results mean and on making better decisions. Our interest, at Dynamic Risk, is to improve the safety and reliability of energy pipeline networks by taking full advantage of the vast amounts of data locked in cumbersome formats, handwritten documents, drawings, photographs, and paper archives.

We want to fundamentally change how we ask questions and receive answers. Today, we ask questions based on the data we have available in structured databases. In the future, we want to ask questions without worrying about whether the data is readily available in a structured format.

To move toward this grand vision, we are sponsoring a challenge to solve the first part of this puzzle. We want a solution that can not only learn how and what data to extract from sources like spreadsheets, word processor files, and computer-generated PDF files, but also learn how to map the data found in these documents to specified target fields in a database.



The Problem

The user will be presented with a document to be "read". They will manually identify the locations in the document that contain the required data and teach the system how that data maps to the target fields in the database, including any necessary manipulations of the data (e.g., unit conversions). The software will need to learn from a series of documents that present the same target content in varying formats. For example, the learning process will need to handle reports that contain the same target information but come from different service providers, so their formats can be quite different.
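As a rough illustration of what a taught mapping might look like once the user has identified the source locations, the sketch below pairs a label found in a report with a target database field and an optional conversion. Everything here is hypothetical: the `FieldMapping` class, the label and field names, and the choice of inches-to-millimetres as the example conversion are assumptions for illustration, not part of the challenge specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def identity(value: float) -> float:
    """Default manipulation: leave the value unchanged."""
    return value


def inches_to_mm(value: float) -> float:
    """Example unit conversion (1 inch = 25.4 mm)."""
    return value * 25.4


@dataclass
class FieldMapping:
    source_label: str                      # label as it appears in one report (hypothetical)
    target_field: str                      # field name in the target database (hypothetical)
    convert: Callable[[float], float] = identity


def apply_mappings(extracted: Dict[str, str],
                   mappings: List[FieldMapping]) -> Dict[str, float]:
    """Build a database record from the raw text values found in a document."""
    record: Dict[str, float] = {}
    for mapping in mappings:
        if mapping.source_label in extracted:
            raw = float(extracted[mapping.source_label])
            record[mapping.target_field] = mapping.convert(raw)
    return record


# Mappings a user might teach for one service provider's report layout.
mappings = [
    FieldMapping("Wall Thickness (in)", "wall_thickness_mm", inches_to_mm),
    FieldMapping("Depth (%)", "depth_percent"),
]

# Raw values as they might be read from the document.
extracted = {"Wall Thickness (in)": "0.25", "Depth (%)": "32"}
record = apply_mappings(extracted, mappings)
```

A report from a different provider would need its own set of source labels pointing at the same target fields, which is the part the software is expected to learn rather than have hand-written.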

Ideally, the solution should be able to empirically rate its level of confidence in its results and identify what it cannot process.
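One simple way to make a confidence rating empirical is to measure, per target field, how often the extracted value matches a manually labelled value on held-out documents, and to count separately the cases where nothing could be extracted. The sketch below assumes this accuracy-based definition; the function name `rate_fields`, the field names, and the sample values are all hypothetical, and a real solution may well define confidence differently.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def rate_fields(predicted: List[Record],
                labelled: List[Record]) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Return per-field accuracy on labelled documents, plus a per-field
    count of documents where no value could be extracted."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    unprocessed: Dict[str, int] = {}
    for pred, truth in zip(predicted, labelled):
        for field, expected in truth.items():
            total[field] = total.get(field, 0) + 1
            value = pred.get(field)
            if value is None:
                # The system could not process this field in this document.
                unprocessed[field] = unprocessed.get(field, 0) + 1
            elif value == expected:
                correct[field] = correct.get(field, 0) + 1
    confidence = {field: correct.get(field, 0) / n for field, n in total.items()}
    return confidence, unprocessed


# Two held-out documents: system output versus manual labels (made-up values).
predicted = [
    {"pipe_diameter_mm": "508", "inspection_date": "2015-03-01"},
    {"pipe_diameter_mm": "508", "inspection_date": None},
]
labelled = [
    {"pipe_diameter_mm": "508", "inspection_date": "2015-03-01"},
    {"pipe_diameter_mm": "610", "inspection_date": "2015-04-12"},
]

confidence, unprocessed = rate_fields(predicted, labelled)
```

Keeping the "could not process" count apart from the accuracy figure lets a reviewer distinguish a field the system gets wrong from a field it knows it cannot read.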



The Challenge Breakthrough

If computers could read and analyse vast amounts of information, we could focus our efforts on understanding what the results mean and make better decisions. The Cognitive Computing Challenge will break the barriers that keep us from accessing the majority of the world's information, which is locked in written documents that require humans to interpret them and extract the useful content. Our interest is to change how energy pipeline networks are managed, improve their safety, and ultimately save lives. However, this technology has broad application across multiple industries.

These types of tasks are currently done manually. A solution to this challenge will automate the repetitive tasks that complex analysis requires, eliminating most of the time they take and minimizing errors. Ideally, there should be as little human intervention as possible.