Training a Plant Disease Classifier

The Dataset

The data used in this article comes from the PlantVillage Disease Classification Challenge organized by crowdAI. The goal of the challenge was to develop algorithms that can accurately diagnose a disease from an image of a plant. PlantVillage is a not-for-profit project by Penn State University in the US and EPFL (École polytechnique fédérale de Lausanne) in Switzerland.

The project helps smallholder farmers increase their yields by using artificial intelligence to provide offline, expert-level knowledge and extension advice. PlantVillage has already collected, and continues to collect, tens of thousands of images of diseased and healthy crops.

The same dataset of diseased plant leaf images and corresponding labels comprising 38 classes of crop disease can also be found in spMohanty’s GitHub account.

Editor’s Note: You can also check out our community spotlight on how PlantVillage uses on-device machine learning to detect plant disease in remote parts of East Africa.

Training the Model

We use the vision module of the fastai library to train an image classification model that can recognize plant diseases with state-of-the-art accuracy. While the model can be trained locally on a laptop, we use Google Colab, which gives us more compute power, access to a GPU, and an easy-to-use Jupyter notebook environment for building machine learning and deep learning models.

We begin by placing the following three lines at the start of the notebook so that any edits made to libraries are reloaded automatically and any charts or images are displayed inside the notebook. These lines are not Python code but special directives for the Jupyter notebook itself. The % prefix marks a magic command, a feature of the IPython kernel behind Jupyter that adds functionality beyond the core language.

%reload_ext autoreload

%autoreload 2

%matplotlib inline

The next step is to import the required libraries. The fastai library, like any other package, can be installed with the pip command.

!pip install fastai  # installs the fastai library

import numpy as np

from fastai import *

from fastai.vision import *

from pathlib import Path
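The imports above (fastai.vision and, later, ImageDataBunch) belong to the fastai v1 API, while a plain pip install may pull in a newer major version with a different API. As a minimal sketch, the helper below reports which major version is installed. The function name installed_major_version is a hypothetical helper introduced for illustration; it is not part of fastai.

```python
from importlib import metadata


def installed_major_version(package="fastai"):
    """Return the major version of an installed package, or None if absent.

    Hypothetical helper for illustration; assumes the version string
    starts with an integer, as fastai's does (e.g. "1.0.61").
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])
```

If the helper returns 2 or higher, the imports in this article will likely need adjusting, or an older release can be pinned instead (for example with pip install "fastai<2", which is an assumption to check against your own environment).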

Loading and Looking at the Data

We download the colored (original RGB) images using the following command:

Whenever we approach a problem, the first step is to look at the data, so that we understand both the problem and the data before deciding how to solve it. Looking at the data means understanding how the data directories are structured, what the labels are, and what some sample images look like.
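One simple way to inspect the directory structure is to count the images in each class folder. The sketch below uses only the standard library; the function name count_images_per_class and the set of file suffixes are assumptions made for illustration, not something prescribed by the dataset.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to match the files actually downloaded.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Count image files in each immediate subfolder of root.

    Each subfolder name is treated as a class label.
    """
    counts = Counter()
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling count_images_per_class on the folder that holds the class subfolders and printing counts.most_common() gives a quick view of how many classes there are and how balanced they are.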

In this dataset, the name of each folder is the class label of all the images inside it, so we need to extract the label names from the folder names automatically. Fortunately, the fastai library provides the ImageDataBunch.from_folder function, which does exactly that. In addition, the ImageDataBunch class makes it easy to create the training and validation sets with images and labels. Once the data is loaded, we can also normalize it with the ImageNet statistics by calling .normalize(imagenet_stats).
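The loading step described above might look like the following sketch, which assumes fastai v1 is installed. The function name load_plant_data and the values chosen for valid_pct, size, bs, and seed are illustrative assumptions, not the article's original settings. The small normalize_pixel helper only illustrates the arithmetic that normalizing to ImageNet parameters performs on each channel.

```python
# ImageNet per-channel (RGB) mean and standard deviation,
# the values fastai v1 exposes as imagenet_stats.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Normalize one RGB pixel (values in [0, 1]) channel by channel."""
    return tuple(
        (value - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def load_plant_data(path, valid_pct=0.2, size=224, bs=64, seed=42):
    """Build a normalized ImageDataBunch from class-named folders.

    Requires fastai v1; the imports are local so that this module can
    still be loaded where fastai is not installed. All default values
    are assumptions for illustration.
    """
    import numpy as np
    from fastai.vision import ImageDataBunch, get_transforms, imagenet_stats

    np.random.seed(seed)  # make the random validation split reproducible
    data = ImageDataBunch.from_folder(
        path,
        train=".",               # class folders sit directly under path
        valid_pct=valid_pct,     # hold out this fraction for validation
        ds_tfms=get_transforms(),
        size=size,
        bs=bs,
    )
    return data.normalize(imagenet_stats)
```

After loading, data.classes lists the labels extracted from the folder names and data.c gives the number of classes, which should be 38 for this dataset.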

The .show_batch() function of the ImageDataBunch class can be used to view a random sample of images from the given data.
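As a minimal sketch, the call can be wrapped as shown here. It assumes data is an ImageDataBunch built with fastai v1; the wrapper name preview_batch and the default grid and figure sizes are illustrative choices.

```python
def preview_batch(data, rows=3, figsize=(8, 8)):
    """Show a rows x rows grid of training images with their labels.

    `data` is expected to be a fastai v1 ImageDataBunch (or any object
    with a compatible show_batch method).
    """
    data.show_batch(rows=rows, figsize=figsize)
```

In a notebook, preview_batch(data) draws the grid inline thanks to the %matplotlib inline directive set at the top.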

Sample plant images

You’ll notice that the images appear to have been zoomed and cropped in a reasonably nice way. Fastai provides a rich image transformation library, whose main purpose is data augmentation when training computer vision models. The library can, however, be used for other general transformation tasks such as default center cropping.
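To make the idea of center cropping concrete, the sketch below computes the box of a centered square crop. It illustrates the geometry only and is not fastai's implementation; the function name center_crop_box is hypothetical.

```python
def center_crop_box(width, height, size):
    """Return (left, top, right, bottom) of a centered size x size crop.

    Coordinates are in pixels, with the origin at the top-left corner.
    """
    if size > min(width, height):
        raise ValueError("crop size must fit inside the image")
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size
```

For a 256 x 256 image and a target size of 224, the crop trims 16 pixels from every side, so the box is (16, 16, 240, 240).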