Preprocessing and cleaning: a major bottleneck in medical image analysis

“These limitations severely impact the practicality of real-time medical image analysis — critical in many medical scenarios — and dramatically slow down the entire medical treatment pipeline.”

Although deep learning has achieved significant results in image analysis and classification, its application to medical images has only recently started gaining momentum. This is because medical images are intrinsically noisier and more prone to artifacts. Despite these challenges, deep learning techniques have been shown to provide more accurate diagnoses than human doctors in certain scenarios.

A major hurdle that has to be overcome in the effective classification of medical images for diagnosis is preprocessing and cleaning. This task is a major bottleneck, as it requires a significant amount of time to prepare images for training, is computationally very demanding, and requires domain expertise. These limitations severely impact the practicality of real-time medical image analysis — critical in many medical scenarios — and dramatically slow down the entire medical treatment pipeline.

My goal: Faster, automated segmentation of medical brain images

“I wanted to apply my domain expertise to solve an important problem in medical imaging and computer vision. I developed an automated algorithm that can segment raw brain images into grey matter, white matter and cerebrospinal fluid using Stacked Denoising Autoencoders.”

During my Ph.D. I gained experience in multi-dimensional spectral analysis, and as a postdoctoral fellow worked on developing algorithms for super-resolution imaging. My previous experience has taught me this much:

Traditional image analysis is slow. Any image analysis pipeline usually involves several steps — image normalization, filtering, thresholding, transformations, etc. Many of these steps are computationally expensive, especially when they have to be performed on a large dataset or implemented in real time.

Traditional image analysis is subjective. For example, the level of thresholding or the extent of smoothing applied to an image can depend on the user, making these decisions difficult to generalize.

The application of artificial intelligence to image analysis and classification has advanced significantly over the past few years. This is largely owed to recent advancements in novel neural network architectures and approaches, and to the better availability of computational resources (GPUs).
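The subjectivity point can be illustrated with a toy sketch (not from the project itself): two analysts picking different threshold levels on the same image can produce very different segmentations. The image and threshold values below are made up purely for illustration.

```python
import numpy as np

# Toy illustration of threshold subjectivity: the same synthetic
# "image" thresholded at two user-chosen levels yields segmentations
# of very different sizes, which is hard to generalize across users.
rng = np.random.default_rng(0)
image = rng.normal(loc=0.5, scale=0.2, size=(64, 64)).clip(0, 1)

mask_a = image > 0.4   # one analyst's choice of threshold
mask_b = image > 0.6   # another analyst's choice

frac_a = mask_a.mean()  # fraction of pixels segmented by analyst A
frac_b = mask_b.mean()  # fraction segmented by analyst B (much smaller)
```

Neither choice is wrong in an absolute sense, which is exactly why such hand-tuned decisions are a poor fit for an automated, reproducible pipeline.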

An important task in medical image analysis that depends largely on image processing is image segmentation, a crucial primary step in clinical applications. In the analysis of medical MRIs, image segmentation provides assessment of the shapes and sizes of various anatomical regions of the tissue, and how they change during disease progression.

For my Insight Health Data project, I wanted to apply my domain expertise to solve an important problem in medical imaging and computer vision. I developed an automated algorithm that can segment raw brain images into grey matter, white matter and cerebrospinal fluid using Stacked Denoising Autoencoders.

Training dataset: Raw brain MRIs and their segmented counterparts

I obtained my data from the Open Access Series of Imaging Studies (OASIS), a project aimed at making MRI data sets of the brain freely available to the scientific community. The dataset consists of a cross-sectional collection of more than 400 subjects between 18 and 96 years of age, both male and female, with varying brain sizes and shapes (Figure 1). Some of the subjects had been clinically diagnosed with very mild to moderate Alzheimer’s disease. The high level of diversity in the dataset added significant complexity to the problem.

Figure 1. Distribution of the estimated total intracranial volume, normalized whole brain volume and age of the subject in the OASIS dataset.

Apart from the raw images, the dataset also consisted of the brain MRI images processed and segmented into gray matter, white matter and cerebrospinal fluid using the conventional image processing pipeline. My aim was to use the segmented images provided in the dataset as the target, and develop an automated algorithm that can obtain these images directly from the raw brain slices without any pre-processing and in a shorter amount of time. This would allow for real-time diagnosis (within seconds!) compared to the conventional pipeline (which takes tens of minutes).

Figure 2. Left image — middle slice from the brain MRI volume. Right images — segmented images obtained by conventional image processing pipeline.

For the development of the algorithm, all slices from the 3D volume of the brain except the top and bottom ten — as they contain very little brain tissue and mostly skull and other tissue — were used as the input, and the corresponding slices of the segmented image were used as the targets. 80% of the dataset was used for training and the remaining 20% for the evaluation of the developed algorithms. Additionally, the images were augmented by horizontal and vertical translation, and rotation.
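The translation-and-rotation augmentation step can be sketched as follows. This is a minimal, dependency-free version of the idea, not the project's actual code: the shift amounts are arbitrary, and `np.rot90` uses 90-degree steps where the real pipeline may well have used small-angle rotations.

```python
import numpy as np

def augment(slice_2d, shift=(5, 3), k=1):
    """Hypothetical augmentation sketch: translate a 2D slice
    vertically/horizontally, then rotate it.
    shift: (rows, cols) translation; k: number of 90-degree rotations."""
    shifted = np.roll(slice_2d, shift=shift, axis=(0, 1))  # translation
    rotated = np.rot90(shifted, k=k)                       # rotation
    return rotated

# A tiny stand-in for one flattened brain slice.
slice_2d = np.arange(16, dtype=float).reshape(4, 4)
aug = augment(slice_2d)
# Shape is preserved; pixel values are merely rearranged,
# so the augmented slice can reuse the same segmentation target
# transformed identically.
```

The key design constraint is that the identical transform must be applied to the raw slice and its segmented target, so input-target pairs stay aligned.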

Stacked Denoising Autoencoders for reconstructing brain segments

“In denoising autoencoders, the input is stochastically corrupted by adding noise or removing part of the image and then trained against the original image. The goal is to predict the missing part of the image or predict the correct image from a noisy input.”

My initial goal was to develop a single algorithm to segment the brain into white matter, gray matter and cerebrospinal fluid. However, I realized that all three segmented images had different characteristics. For instance, the white matter consisted of large and broad white patches, the cerebrospinal fluid was thinner and wiry, and the gray matter combined both of these characteristics. Hence, I decided to develop one algorithm for each. Another objective of my project was to develop an algorithm that is more lightweight (uses fewer parameters) than currently popular neural network architectures such as U-Net.

For these reasons I chose to use Stacked Denoising Autoencoders (SDAE). All the algorithms were developed and fine-tuned on an Amazon EC2 p2.xlarge instance.

An autoencoder is a neural network which is often used for dimensionality reduction, as well as feature extraction and selection. Typically, the hidden (latent) layer has fewer units than the input layer, so the autoencoder essentially learns a representation of the input in a lower-dimensional latent space. This latent space representation can be used to reconstruct the original image; however, this reconstruction may not be perfect. One can think of embedding the original input image from a higher-dimensional space into a lower-dimensional latent space as being similar to lossy image compression (as in JPEG). Denoising autoencoders benefit from this lossy nature of the encoder.
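The encode-bottleneck-decode structure can be sketched in a few lines. The weights below are random stand-ins (a real autoencoder learns them by minimizing reconstruction error), and the layer sizes, tied weights and tanh activation are illustrative assumptions, not the project's actual architecture; the point is only the dimensionality bottleneck.

```python
import numpy as np

# Untrained sketch of the autoencoder bottleneck: a 256-dimensional
# input is squeezed through a 32-dimensional latent code and then
# decoded back, giving a (lossy) reconstruction.
rng = np.random.default_rng(42)

input_dim, latent_dim = 256, 32                 # latent_dim < input_dim
W_enc = rng.normal(size=(input_dim, latent_dim)) / np.sqrt(input_dim)
W_dec = W_enc.T                                 # tied weights, a common choice

x = rng.normal(size=(input_dim,))               # a flattened image "slice"
z = np.tanh(x @ W_enc)                          # latent space representation
x_hat = z @ W_dec                               # lossy reconstruction of x
```

Because 32 numbers cannot encode 256 arbitrary values exactly, `x_hat` generally differs from `x`; that irreversible compression is the "lossy" property the denoising variant exploits.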

Figure 3. Basic concept implementing the Stacked Denoising Autoencoders. The ‘Noisy Input’ image is first encoded into the latent space, which has lower dimensionality. This is similar to compression that leads to loss of information (in our case we should lose information about the noise). The latent space representation is then decoded (the decompression part), which gives us the ‘Denoised’ reconstructed image.

In denoising autoencoders, the input is stochastically corrupted by adding noise or removing part of the image and then trained against the original image. The goal is to predict the missing part of the image or predict the correct image from a noisy input. I implemented this strategy for reconstructing the segmented images of the brain.

The raw brain MRI images were treated as the noisy/corrupted images, and the aim was to train the denoising autoencoder to predict the denoised/segmented brain image. Two layers of denoising autoencoders were stacked on top of each other. This was very helpful, as each denoising autoencoder took less time to train, and the stack was more effective at reconstructing the segmented images.
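The stacked arrangement can be sketched as two encode-decode layers applied in sequence, with the raw slice playing the role of the corrupted input. The weights here are random placeholders and the layer sizes and ReLU activation are assumptions; in the project each layer would be trained (and the stack fine-tuned) against the segmented target.

```python
import numpy as np

rng = np.random.default_rng(0)

def dae_layer(x, w_in, w_out):
    """One denoising-autoencoder layer: encode to a smaller code,
    then decode back to the full slice dimension."""
    h = np.maximum(0.0, x @ w_in)   # encoder (ReLU activation assumed)
    return h @ w_out                # decoder

d, h1, h2 = 128, 64, 32             # slice dim and two latent sizes
w1_in = rng.normal(size=(d, h1)) * 0.1
w1_out = rng.normal(size=(h1, d)) * 0.1
w2_in = rng.normal(size=(d, h2)) * 0.1
w2_out = rng.normal(size=(h2, d)) * 0.1

raw_slice = rng.normal(size=(d,))            # raw MRI slice = "noisy" input
out1 = dae_layer(raw_slice, w1_in, w1_out)   # first DAE's reconstruction
out2 = dae_layer(out1, w2_in, w2_out)        # second DAE stacked on the first
```

Training each layer separately on a smaller sub-problem is what keeps the per-layer training time down, compared to optimizing one deep network end to end from scratch.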

SDAE segments brain images with high fidelity, but much faster

“Compared to the conventional pipeline, which takes on the order of tens of minutes, the SDAE is able to generate all three segmented images in less than one second.”

The algorithms trained to reconstruct the white matter, gray matter and cerebrospinal fluid were bench-marked on the evaluation dataset. Several different metrics were used to compare the segmented images reconstructed by the SDAE to that generated by the conventional pipeline (considered as the gold standard in this project). Figure 4 below shows the comparison between the images reconstructed by SDAE and the traditional image processing pipeline for one of the images in the evaluation dataset.

Figure 4. Comparison of the segmented images generated by the conventional pipeline (cyan) to those reconstructed by SDAE (magenta). The rightmost panel shows the overlay of the two, with white indicating a perfect match.

I used three different metrics to compare the images reconstructed by the SDAE to those generated by the conventional pipeline: the area under the receiver operating characteristic curve (AUC), the structural similarity index (SSIM), and the Jaccard index.
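Of the three metrics, the Jaccard index is the simplest to show concretely: intersection over union of the two binary masks. The sketch below uses tiny made-up masks; in practice AUC and SSIM would typically come from libraries such as scikit-learn (`roc_auc_score`) and scikit-image (`structural_similarity`).

```python
import numpy as np

def jaccard(pred, target):
    """Jaccard index between two binary masks:
    |intersection| / |union|."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

# Toy masks: SDAE output vs. the conventional-pipeline "gold standard".
gold = np.array([[1, 1, 0],
                 [0, 1, 0]])
pred = np.array([[1, 1, 0],
                 [0, 0, 0]])

score = jaccard(pred, gold)  # 2 overlapping pixels / 3 in the union
```

A score of 1.0 means the two segmentations agree exactly; values near 0 mean they barely overlap.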

Figure 5. Left: ROC curve for one of the images in the evaluation dataset. Right: Average values of the three metrics used for comparison over the entire evaluation dataset.

As shown in Figure 5, the images reconstructed by SDAE are overall very similar to those generated using the conventional pipeline. The major differences in the reconstructed images are in the cerebrospinal fluid, especially near the periphery of the brain. This is likely because the raw images contain the skull, and the algorithm is unable to differentiate between the skull and the cerebrospinal fluid. However, the match in the interior of the brain (which would be the region of interest in most cases) is still very robust.

The most significant advantage of the SDAE, however, is that all three segmented images can be generated extremely fast. Compared to the conventional pipeline, which takes on the order of tens of minutes, the SDAE is able to generate all three segmented images in less than one second. In other words, this is roughly 400 times faster than the conventional pipeline. This speed enhancement would not only allow for real-time diagnosis, but also help speed up the entire healthcare pipeline.

Parting thoughts

While this project was very exciting, and I was able to achieve very good results in terms of image accuracy and speed enhancement, there is still room for improvement. For instance, for the reconstruction of the cerebrospinal fluid image, it may be beneficial to start with a brain MRI image that has been skull-stripped.

Another point to consider is that while the reconstruction of the image is very quick, there is still some difference between the images obtained from the SDAE and those generated by the conventional algorithm. Are these differences acceptable, and how much do they affect a healthcare professional's ability to make a diagnosis?

In any case, the project showcases that Stacked Denoising Autoencoders can be a very powerful tool for processing medical images, saving significant time and reducing the bias that is typically involved in this process.