Author: Jonathan Chung, Applied Scientist Intern at Amazon.

Even in the 21st century, a large portion of everyday documents is still handwritten. Many school notes, doctors' notes, and historical documents exist only in handwritten form.

Figure 1: Examples of handwritten texts

Archiving handwritten documents is essential, but it is usually limited to storing high-resolution images of the documents. Because the textual information in the documents is difficult to recognise automatically, people resort to storing manually transcribed versions of the handwriting [1].

There are considerable ongoing efforts to develop automated methods for transcribing handwritten text. Usually, the data flow consists of performing page segmentation to identify regions of text within a document, then running handwritten text recognition on those regions [1].

The main focus of this blog post is page segmentation methods. Future posts will describe handwritten text recognition using recurrent neural networks (RNNs), similar to [2, 3].

Page segmentation

Figure 2: Example of a document from the IAM dataset with a bounding box around the handwritten text

Page segmentation has been widely studied for historical documents, where a document is segmented into decoration, background, text block, and periphery regions. Traditionally, handcrafted features were used for segmentation, but recently Chen et al. [1] demonstrated that convolutional neural networks (CNNs) in an encoder-decoder architecture can automatically learn high-level feature representations of historical documents. The feature representations are then fed into an SVM classifier to learn the segmentation of the document.
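To illustrate the shape of that features-into-SVM pipeline, here is a minimal sketch. The feature vectors below are random stand-ins for the CNN encoder-decoder representations (which are not reproduced here), and the class labels, dimensions, and means are all hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for per-patch feature vectors. In the pipeline
# described above, these would come from the encoder-decoder CNN rather
# than from random draws with class-dependent means.
text_feats = rng.normal(loc=2.0, size=(50, 16))   # patches from text blocks
bg_feats = rng.normal(loc=-2.0, size=(50, 16))    # patches from background

X = np.vstack([text_feats, bg_feats])
y = np.array([1] * 50 + [0] * 50)  # 1 = text block, 0 = background

# An SVM classifier learns to label each patch from its feature vector.
clf = SVC(kernel="linear").fit(X, y)
train_acc = clf.score(X, y)
```

In the real system each image patch would be classified this way and the per-patch labels assembled into a segmentation map.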

Here we focus on segmenting the handwritten text in documents from the IAM dataset [4]. Each document from the IAM dataset contains a printed portion and a handwritten portion. The goal of the algorithm is to fit a bounding box around the handwritten portion of the document (an example is shown in Figure 2).

Methods

Two methods of obtaining the bounding box were explored: a handcrafted-feature approach based on the Maximally Stable Extremal Regions (MSER) algorithm, and a deep CNN approach.

MSER algorithm approach

The MSER algorithm was used to detect “blobs” in the image that correspond to text. The detected regions are post-processed to identify continuous regions of text with the following algorithm:

For each of a fixed number of iterations:

1. Expand every bounding box in all directions by a fraction, denoted Δ.
2. Merge all bounding boxes that overlap by more than a percentage, denoted I.
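The expand-and-merge loop above can be sketched in pure Python. The initial boxes are assumed to come from an MSER detector (e.g. OpenCV's `cv2.MSER_create`), and interpreting the overlap percentage as intersection area over the smaller box's area is an assumption, since the post does not define it precisely:

```python
def expand(box, delta):
    """Grow an (x1, y1, x2, y2) box by a fraction `delta` on every side."""
    x1, y1, x2, y2 = box
    dx, dy = delta * (x2 - x1), delta * (y2 - y1)
    return (x1 - dx, y1 - dy, x2 + dx, y2 + dy)

def overlap_fraction(a, b):
    """Intersection area over the smaller box's area (an assumed
    reading of 'overlap by more than a percentage')."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return (ix * iy) / min(area(a), area(b))

def union_box(a, b):
    """Smallest box covering both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def merge_boxes(boxes, iterations=3, delta=0.1, threshold=0.5):
    """Repeatedly expand boxes by `delta` and merge pairs whose
    overlap fraction exceeds `threshold` (all values hypothetical)."""
    for _ in range(iterations):
        boxes = [expand(b, delta) for b in boxes]
        merged = []
        for box in boxes:
            for i, m in enumerate(merged):
                if overlap_fraction(box, m) > threshold:
                    merged[i] = union_box(box, m)
                    break
            else:
                merged.append(box)
        boxes = merged
    return boxes
```

With enough iterations, boxes covering nearby words grow into each other and collapse into larger text regions, while isolated boxes stay separate.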

Here are some of the results.