Google Cloud AutoML Vision for Medical Image Classification

Pneumonia Detection using Chest X-Ray Images

The normal chest X-ray (left panel) shows clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia (middle) typically exhibits a focal lobar consolidation, in this case in the right upper lobe (white arrows), whereas viral pneumonia (right) manifests with a more diffuse “interstitial” pattern in both lungs. (Source: Kermany, D. S., Goldbaum M., et al. 2018. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell. https://www.cell.com/cell/fulltext/S0092-8674(18)30154-5)

Google Cloud AutoML Vision simplifies the creation of custom vision models for image recognition use cases. Under the hood, it applies neural architecture search and transfer learning to find the network architecture and hyperparameter configuration that minimize the model's loss function. This article uses Google Cloud AutoML Vision to develop an end-to-end medical image classification model for pneumonia detection using chest X-ray images.


About the Dataset

The dataset contains:

5,232 chest X-ray images from children.

3,883 of those images are samples of bacterial (2,538) and viral (1,345) pneumonia.

1,349 samples are healthy lung X-ray images.

The dataset is hosted on Kaggle and can be accessed at Chest X-Ray Images (Pneumonia).
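As a quick sanity check, the class counts above can be tallied in a few lines; the figures below are copied directly from the dataset description:

```python
# Class counts as stated in the dataset description above.
counts = {"bacterial": 2538, "viral": 1345, "normal": 1349}

pneumonia = counts["bacterial"] + counts["viral"]  # 3,883 pneumonia samples
total = pneumonia + counts["normal"]               # 5,232 images in total

# Roughly a 3:1 pneumonia-to-normal imbalance, worth keeping in mind
# when reading precision/recall later.
imbalance = round(pneumonia / counts["normal"], 2)
print(pneumonia, total, imbalance)
```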

Part 1: Enable AutoML Cloud Vision on GCP

(1). Go to the cloud console: https://cloud.google.com/

Google Cloud Homepage

(2). Open Cloud AutoML Vision by clicking the navigation menu (the three-line icon) at the top-left corner of the GCP dashboard. Select Vision under the Artificial Intelligence product section.

Open AutoML Vision

(3). Select Image Classification under AutoML Vision.

Image Classification under AutoML Vision

(4). Set up the project APIs, permissions, and the Cloud Storage bucket that will store the image files for modeling and other assets.

Setup Project APIs and Permissions

(5). Select your GCP billing project from the drop-down when prompted. We are now ready to create a Dataset for building the custom classification model on AutoML. We will return here after downloading the raw dataset from Kaggle to Cloud Storage and preparing the data for modeling with AutoML.

In this case, the automatically created bucket is called: gs://ekabasandbox-vcm.

Part 2: Download the Dataset to Google Cloud Storage

(1). Activate Cloud Shell (circled in red) to launch an ephemeral VM instance. We will use it to stage the dataset: download it from Kaggle, unzip it, and upload it to the storage bucket.

Activate Cloud Shell

(2). Install the Kaggle command-line interface, which we will use to download datasets from Kaggle. Run the following command:

sudo pip install kaggle

Note, however, that the Cloud Shell instance is ephemeral and does not persist system-wide changes when the session ends. If a dataset is particularly large, other options exist, such as spinning up a Compute Engine VM, downloading the dataset there, unzipping it, and uploading it to Cloud Storage. More advanced data pipelines can also be designed to move data into GCP for analytics and machine learning.

(3). Download a Kaggle API token key so the Kaggle CLI can authenticate against Kaggle and download the desired datasets.

Login to your Kaggle account.

Go to: https://www.kaggle.com/[KAGGLE_USER_NAME]/account

Click on: Create New API Token.

Create API Token

Download the token to your local machine and upload it to the cloud shell.

Create the .kaggle directory if it does not already exist, move the uploaded .json key into it, and restrict the file's permissions (the Kaggle CLI warns when the key is readable by other users). Use the commands below:

mkdir -p .kaggle

mv kaggle.json .kaggle/kaggle.json

chmod 600 .kaggle/kaggle.json

(4). Download the dataset from Kaggle to the Cloud Shell instance.

kaggle datasets download paultimothymooney/chest-xray-pneumonia

(5). Unzip the downloaded dataset. The archive contains a nested chest_xray.zip, so unzip twice:

unzip chest-xray-pneumonia.zip

unzip chest_xray.zip

(6). Move the dataset from the ephemeral Cloud Shell instance to the Cloud Storage bucket. Replace the bucket name below with your own.

gsutil -m cp -r chest_xray gs://ekabasandbox-vcm/chest_xray/

Part 3: Preparing the Dataset for Modeling

(1). Launch Jupyter Notebooks on the Google Cloud AI Platform.

Notebooks of GCP AI Platform

(2). Create a new Notebook Instance.

Start a new JupyterLab instance

(3). Select an instance name and create.

Choose an instance name and create

(4). Open JupyterLab

Open JupyterLab

(5). Before building a custom image recognition model with AutoML Cloud Vision, the dataset must be prepared in a particular format:

For training, the JPEG, PNG, WEBP, GIF, BMP, TIFF, and ICO image formats are supported, with a maximum size of 30 MB per image. For inference, the JPEG, PNG, and GIF formats are supported, with a maximum size of 1.5 MB per image. It is best to place each image category in its own sub-folder within an image directory. For example:

(image-directory) > (image-class-1-sub-dir) … (image-class-n-sub-dir)

Next, create a CSV file that points to the path of each image and its corresponding label. AutoML uses this CSV file to locate the training images and their labels. The CSV file is placed in the same Cloud Storage bucket that contains the image files. Use the bucket automatically created when AutoML Vision was configured; in our case, this bucket is named gs://ekabasandbox-vcm.
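The core of this CSV-building step can be sketched in a few lines of Python. The function name build_automl_csv and the local directory layout are illustrative, not part of AutoML itself; the repository's preprocessing notebook does the equivalent work:

```python
import csv
import os

def build_automl_csv(image_root, gcs_prefix, out_path):
    """Walk image_root, where each sub-folder name is a class label,
    and write AutoML Vision rows of the form: gcs_image_path,label."""
    rows = []
    for label in sorted(os.listdir(image_root)):
        class_dir = os.path.join(image_root, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith((".jpeg", ".jpg", ".png")):
                rows.append([f"{gcs_prefix}/{label}/{fname}", label])
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

The resulting CSV is then uploaded to the same bucket (e.g. with gsutil cp) so AutoML can read it alongside the images.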

(6). Clone the preprocessing script from GitHub. Click on the icon circled in red and labeled (1), and enter the GitHub URL https://github.com/dvdbisong/automl-medical-image-classification to clone the repository with the preprocessing code.

Clone preprocessing script

(7). Run all the cells in the notebook preprocessing.ipynb to create the CSV file containing the paths and labels of the images, and upload this file to Cloud Storage. Be sure to change the bucket_name parameter.

Run notebook cells

Part 4: Modeling with Cloud AutoML Vision

(1). Click on “New Dataset” from the AutoML Vision Dashboard.

Create New Dataset

(2). Fill in the dataset name and select the CSV file from the Cloud Storage bucket created by AutoML.

Create Dataset

(3). If you see an error message reporting duplicate files, you may dismiss it for now; judging by the file names, this does not appear to be the case.

Cloud AutoML Processed Images

(4). Click on Train as shown in red in the image above to initiate model building with Cloud AutoML.

Start AutoML Training

(5). Select how the model will be hosted and set the training budget.

Select training parameters

(6). After model training is complete, click on Evaluate to view the model's performance metrics.

Evaluate model performance

(7). Assess the performance metrics (precision, recall, and the confusion matrix).
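To make the evaluation page easier to read, here is a short sketch of how per-class precision and recall fall out of a confusion matrix. The two-class matrix values below are illustrative only, not taken from this model:

```python
def metrics_from_confusion(cm, labels):
    """Per-class precision and recall from a confusion matrix where
    cm[i][j] = number of class-i examples predicted as class j."""
    n = len(labels)
    stats = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # other classes predicted as i
        fn = sum(cm[i]) - tp                       # class i predicted as others
        stats[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return stats

# Illustrative numbers: 100 normal and 100 pneumonia test images.
cm = [[90, 10],   # true NORMAL:    90 correct, 10 misclassified
      [5, 95]]    # true PNEUMONIA:  5 misclassified, 95 correct
print(metrics_from_confusion(cm, ["NORMAL", "PNEUMONIA"]))
```

A high recall on the pneumonia class matters most here, since a missed pneumonia case (a false negative) is typically costlier than a false alarm.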