Enterprise Content Management (ECM) and eDiscovery are not new, but implementing them is becoming increasingly important as business continues its transition into the digital age. Businesses, especially large corporations, hold massive amounts of electronically stored information (ESI) that they would be wise to retain in case of future litigation, so the stakes are high.

Thankfully, developing comprehensive solutions and workflows for eDiscovery is no longer a pipe dream, especially with LEADTOOLS. eDiscovery requires a broad combination of document imaging technologies such as scanning, OCR, PDF, and annotations. LEADTOOLS covers those popular bases and also includes many other technologies that set it apart as an SDK choice for an industry that thrives on accuracy and thoroughness.

The following white paper gives a brief overview of eDiscovery and highlights all of the major LEADTOOLS features for creating document imaging solutions that stand above the competition.

Introduction

When it comes to reasons for change, the desire for efficiency is surely at or near the top of the list. Some processes and industries are harder to change, especially those that have been around for a long time. Court systems in many countries are among the oldest and most well-established institutions, built to ensure all-around fairness even at the expense of expediency. Thankfully, the legal industry has taken major strides toward adapting to the digital age with the evolution of eDiscovery and document imaging.

One major reason electronic discovery has become such a significant part of the legal industry is that so much data is natively generated and stored digitally. This electronically stored information (ESI) comes from a wide variety of communications (e.g., email, text messaging) and file formats. In addition to being a practical necessity, eDiscovery offers benefits that are quickly making it the preferred method of discovery. Rather than juggling two methods of discovery for different information sources, many legal teams convert their traditional paper documents into ESI so all case information can follow the same process.

EDRM – Electronic Discovery Reference Model

Speaking of process, the Electronic Discovery Reference Model (EDRM) is a descriptive paradigm for how eDiscovery generally works in each case. Each stage in the EDRM is fluid and can be repeated or refined as the case evolves. The overarching goal and outcome is to take a huge amount of ESI and cull it down to what will actually be used in court.

Document imaging technology, and in particular the technology offered in LEADTOOLS, has a role in nearly every stage of the EDRM. Some roles are obvious, such as scanning paper documents into digital formats and using optical character recognition (OCR) to make the documents searchable. However, in a highly competitive industry where exhausted time and funds can quickly turn into a lost case, having every available tool at one’s disposal is vital to success in the courtroom. Before diving into the specific imaging technologies, the following overview of the EDRM clarifies how and where each technology fits in.

Figure 1: EDRM diagram (adapted from edrm.net)

Information Governance/Management

Information Governance, or Information Management, ensures that the proper information is saved and that it is stored and organized well.

Identification

This second phase in the EDRM is the first active step in the legal process. It casts a wide net to gather every bit of information that has any potential relevance to the case.

Preservation and Collection

After documents are identified, they must be preserved. Preservation applies a legal hold on ESI that requires the documents to not undergo any changes throughout the remainder of the case. Collection is the physical gathering of all ESI by a client into a transferable medium for sharing with their legal counsel.

Processing, Review, and Analysis

Removing duplicates and normalizing documents into the agreed-upon format(s) are the major tasks accomplished during Processing. Review takes a top-level look at the relevance of the ESI, and Analysis then takes a deep dive into the ESI to determine precisely what is relevant.
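
As an illustration of the duplicate-removal task during Processing, the following is a minimal sketch that assumes exact, byte-for-byte duplicates are identified by comparing SHA-256 digests. It uses only standard .NET classes and is not part of the LEADTOOLS API. The class and method names (DuplicateFilter, ComputeDigest, KeepUnique) are hypothetical, and near-duplicates, such as two scans of the same page, would need a different technique.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical helper for illustration only; not a LEADTOOLS class.
public static class DuplicateFilter
{
    // Returns the SHA-256 digest of the content as an uppercase hex string.
    public static string ComputeDigest(byte[] content)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(content)).Replace("-", "");
        }
    }

    // Keeps the name of the first document seen for each distinct digest,
    // preserving the input order. Later byte-identical copies are skipped.
    public static List<string> KeepUnique(
        IEnumerable<KeyValuePair<string, byte[]>> documents)
    {
        var seenDigests = new HashSet<string>();
        var uniqueNames = new List<string>();
        foreach (KeyValuePair<string, byte[]> doc in documents)
        {
            // HashSet.Add returns false when the digest is already present.
            if (seenDigests.Add(ComputeDigest(doc.Value)))
            {
                uniqueNames.Add(doc.Key);
            }
        }
        return uniqueNames;
    }
}
```

Hashing the content rather than comparing file names means that renamed copies of the same file are still caught, which is a common situation when the same attachment appears in several mailboxes.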

Production

Before any ESI can appear in court, opposing sides must share their ESI with one another and decide which of it is usable in the case.

Presentation

As implied by the name, the ESI is finally presented in court.

LEADTOOLS at Work in eDiscovery Applications

LEADTOOLS Document Imaging SDKs offer a wide gamut of imaging technologies well suited to any eDiscovery application. Developers of the simplest single-service tools, end-to-end commercial ECMs, and everything in between will find what they need to add world-class imaging technology to their applications.

Much the same way that the EDRM works as a general guide and process with steps that can be skipped or revisited, the imaging technology outlined below is not a mandatory set of features. For the most part, the order of these technologies follows the typical flow of use within an enterprise-level ECM, but it can be modified and reorganized to match the goals and creativity of any development team.

Easily one of the most crucial elements of an ECM and eDiscovery application is the ability to digitize paper documents, and scanning is the most efficient way to get high-quality digital replicas of them. Even if this is all that a company does to prepare for court, scanning provides massive savings in time and money due to the simplified transportation and sharing of electronic documents. One USB stick can replace hundreds, if not thousands, of pounds of paper shipped and carried from one office to another and then to the courtroom.

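LEADTOOLS includes high-level classes that make it very easy to acquire images from any scanner with a TWAIN driver or SANE backend. Consider the following snippet, which prompts the user to select a TWAIN source, then loads the acquired image into the viewer:

    // _twainSession and imageViewer are fields assumed to be initialized elsewhere.
    private void GetImageFromTwainSource()
    {
        // An empty source name prompts the user to select a TWAIN source.
        _twainSession.SelectSource(string.Empty);

        // Subscribe to the event raised for each acquired page.
        _twainSession.AcquirePage +=
            new EventHandler<TwainAcquirePageEventArgs>(twainSession_AcquirePage);

        // Start the acquisition and show the scanner driver's user interface.
        _twainSession.Acquire(TwainUserInterfaceFlags.Show);
    }

    private void twainSession_AcquirePage(object sender, TwainAcquirePageEventArgs e)
    {
        // Display the acquired page in the viewer.
        imageViewer.Image = e.Image;
    }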

Another hugely important feature to consider in an imaging SDK is its ability to clean up scanned images. There are two primary benefits to cleaning images, and each has a large trickle-down impact on the entire eDiscovery process.

First, and probably the most obvious, is that the document itself is more readable. This is great for the human eye, but even better for the computer. Only a few pixels separate a lowercase l, an uppercase L, and the number 1. The human eye can still read text with a strikethrough or a line caused by a crease in the paper, but even the best OCR engines can return gibberish for the same text.

Second is storage space. Many compression algorithms accomplish their work by comparing neighboring pixels. This is especially true for the black and white images that make up the majority of scanned documents. Performing image cleanup functions that remove dust speckles, hole punches, lines, borders, and the like has a profound impact on the length of runs and the size of blocks composed of a single color. This allows the cleaned image to compress much further, with savings upwards of 92% relative to the dirty image’s compressed size.
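
The effect of speckles on run lengths can be shown with a small sketch. The code below is a simplified, one-dimensional illustration using hypothetical names (RunLengthDemo, CountRuns, RemoveIsolatedPixels); it is not the LEADTOOLS despeckle implementation, which works on full two-dimensional images. It counts the runs of identical pixels in a single black and white row before and after isolated pixels are removed.

```csharp
using System;

// Hypothetical helper for illustration only; not a LEADTOOLS class.
public static class RunLengthDemo
{
    // Counts runs of identical consecutive pixels in one bilevel row
    // (true = black, false = white). Fewer runs means better compression
    // for run-length based schemes.
    public static int CountRuns(bool[] row)
    {
        if (row.Length == 0) return 0;
        int runs = 1;
        for (int i = 1; i < row.Length; i++)
        {
            if (row[i] != row[i - 1]) runs++;
        }
        return runs;
    }

    // Naive 1-D despeckle: any single pixel that differs from both of its
    // neighbors is replaced with the neighbors' color. Reads from the
    // original row so earlier fixes do not influence later decisions.
    public static bool[] RemoveIsolatedPixels(bool[] row)
    {
        bool[] cleaned = (bool[])row.Clone();
        for (int i = 1; i < row.Length - 1; i++)
        {
            if (row[i] != row[i - 1] && row[i] != row[i + 1])
            {
                cleaned[i] = row[i - 1];
            }
        }
        return cleaned;
    }
}
```

For a white row of 20 pixels with black speckles at positions 5 and 12, CountRuns returns 5 for the dirty row and 1 after RemoveIsolatedPixels, because two stray pixels split one long white run into five short ones.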