In this post, I'll cover how to use a pre-trained DeepLabv3 semantic segmentation model in PyTorch for the task of road crack detection via transfer learning. The same procedure can be applied to fine-tune the network on your custom data-set. The code is available at the following repository: https://github.com/msminhas93/DeepLabv3FineTuning

Let us start with a brief introduction to image segmentation. The primary goal of a segmentation task is to output a pixel-level mask in which regions belonging to the same category are assigned the same distinct pixel value. If you colour-code these segmentation masks by assigning a different colour to every category, you'll get something like an image from a kids' colouring book. An example is shown below.

Source: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Segmentation has existed for a very long time in the domains of computer vision and image processing. Classical techniques include simple thresholding, clustering-based methods such as k-means clustering, region-growing methods, etc.

With recent advancements in deep learning and the success of convolutional neural networks in image-related tasks over the traditional methods, these techniques have also been applied to the task of image segmentation.

One of these models is the DeepLabv3 model by Google. Explaining how the model works is beyond the scope of this post. Instead, we shall focus on how to use a pre-trained DeepLabv3 network on our own data-sets. Transfer learning involves taking a network pre-trained on a source domain and task and adapting it to your intended target domain and task. It can be represented by the following diagram.

We change the segmentation sub-network as per our own requirements and then train either a part of the network or the entire network. The learning rate chosen is lower than in the case of normal training, because the network already has good weights for the source task and we don't want to change them too much too fast. Also, the initial layers can sometimes be kept frozen, since it is argued that these layers extract general features which can potentially be used without any changes.
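As an illustrative sketch of this idea (using a toy two-layer model as a stand-in for DeepLabv3, not the actual network), freezing the early layers and optimizing only the remaining parameters with a small learning rate might look like this:

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained network: a "backbone" followed by a "head"
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),  # backbone: assumed to extract general features
    nn.Conv2d(8, 1, 1),             # head: task-specific layer we want to train
)

# Freeze the backbone so its pre-trained weights are not updated
for param in model[0].parameters():
    param.requires_grad = False

# Optimize only the trainable parameters, with a reduced learning rate
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

The same pattern applies to the real model: iterate over the parameters of the layers you want frozen and set `requires_grad = False` before constructing the optimizer.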

Let’s dive into the coding part of the tutorial.

Data Pipeline

Let us begin by constructing a data pipeline for our model. For the task of segmentation, instead of a label in the form of a number or a one-hot encoded vector, we have a ground-truth mask image. As an example, for a batch size of 4, the image and mask batches would be 4-dimensional tensors of shape [batch size, channels, height, width].

We will be defining our segmentation data-set class for creating the PyTorch dataloaders. The class definition is given below.

import glob
import os

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset


class SegDataset(Dataset):
    """Segmentation Dataset"""

    def __init__(self, root_dir, imageFolder, maskFolder, transform=None,
                 seed=None, fraction=None, subset=None,
                 imagecolormode='rgb', maskcolormode='grayscale'):
        """
        Args:
            root_dir (string): Directory with all the images; should have
                the following structure.
                root
                --Images
                -----Img 1
                -----Img N
                --Mask
                -----Mask 1
                -----Mask N
            imageFolder (string) = 'Images': Name of the folder which contains the Images.
            maskFolder (string) = 'Masks': Name of the folder which contains the Masks.
            transform (callable, optional): Optional transform to be applied on a sample.
            seed: Specify a seed for the train and test split.
            fraction: A float value from 0 to 1 which specifies the validation split fraction.
            subset: 'Train' or 'Test' to select the appropriate set.
            imagecolormode: 'rgb' or 'grayscale'
            maskcolormode: 'rgb' or 'grayscale'
        """
        self.color_dict = {'rgb': 1, 'grayscale': 0}
        assert imagecolormode in ['rgb', 'grayscale']
        assert maskcolormode in ['rgb', 'grayscale']
        self.imagecolorflag = self.color_dict[imagecolormode]
        self.maskcolorflag = self.color_dict[maskcolormode]
        self.root_dir = root_dir
        self.transform = transform
        if not fraction:
            self.image_names = sorted(
                glob.glob(os.path.join(self.root_dir, imageFolder, '*')))
            self.mask_names = sorted(
                glob.glob(os.path.join(self.root_dir, maskFolder, '*')))
        else:
            assert subset in ['Train', 'Test']
            self.fraction = fraction
            self.image_list = np.array(
                sorted(glob.glob(os.path.join(self.root_dir, imageFolder, '*'))))
            self.mask_list = np.array(
                sorted(glob.glob(os.path.join(self.root_dir, maskFolder, '*'))))
            if seed:
                np.random.seed(seed)
            indices = np.arange(len(self.image_list))
            np.random.shuffle(indices)
            self.image_list = self.image_list[indices]
            self.mask_list = self.mask_list[indices]
            if subset == 'Train':
                self.image_names = self.image_list[:int(
                    np.ceil(len(self.image_list) * (1 - self.fraction)))]
                self.mask_names = self.mask_list[:int(
                    np.ceil(len(self.mask_list) * (1 - self.fraction)))]
            else:
                self.image_names = self.image_list[int(
                    np.ceil(len(self.image_list) * (1 - self.fraction))):]
                self.mask_names = self.mask_list[int(
                    np.ceil(len(self.mask_list) * (1 - self.fraction))):]

    def __len__(self):
        return len(self.image_names)

    def __getitem__(self, idx):
        img_name = self.image_names[idx]
        if self.imagecolorflag:
            image = cv2.imread(
                img_name, self.imagecolorflag).transpose(2, 0, 1)
        else:
            image = cv2.imread(img_name, self.imagecolorflag)
        msk_name = self.mask_names[idx]
        if self.maskcolorflag:
            mask = cv2.imread(msk_name, self.maskcolorflag).transpose(2, 0, 1)
        else:
            mask = cv2.imread(msk_name, self.maskcolorflag)
        sample = {'image': image, 'mask': mask}
        if self.transform:
            sample = self.transform(sample)
        return sample

The class has three methods. The first is the initialization method, which takes in root_dir, the directory where the data-set is stored. This is followed by the imageFolder and maskFolder arguments, which specify the names of the image and mask folders inside the data-set directory. If you have just a single directory of images and masks, you can use the fraction and subset arguments to split the images into train and validation sets. The fraction argument is the validation set size, so a value of 0.2 means a train/val split of 0.8 and 0.2. The arguments imagecolormode and maskcolormode specify the colour mode of the images and masks respectively; each can be either rgb or grayscale. The second method returns the total number of samples in the data-set. The third method is the core of the class: given an index value, it returns a dictionary containing the image and mask arrays for that sample.
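The fraction/seed split logic can be sketched in isolation. With 10 files and fraction=0.2, a seeded shuffle followed by the ceil-based cut yields an 8/2 split (the file names below are made up for illustration):

```python
import numpy as np

files = np.array([f'img_{i}.png' for i in range(10)])  # hypothetical file list
fraction = 0.2  # validation split fraction

np.random.seed(100)            # a fixed seed keeps Train/Test splits consistent
indices = np.arange(len(files))
np.random.shuffle(indices)
files = files[indices]

cut = int(np.ceil(len(files) * (1 - fraction)))
train_names = files[:cut]      # what subset == 'Train' would return
test_names = files[cut:]       # what subset == 'Test' would return

print(len(train_names), len(test_names))  # 8 2
```

Because the same seed is used for both subsets, constructing the Train and Test data-sets separately still produces disjoint, complementary file lists.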

Next, we need to define a few basic transforms which shall be useful for training a network for the segmentation task. We can't use the transforms available in torchvision directly, since our data-set class returns a dictionary, which will not work with the standard transforms.

class Resize(object):
    """Resize image and/or masks."""

    def __init__(self, imageresize, maskresize):
        self.imageresize = imageresize
        self.maskresize = maskresize

    def __call__(self, sample):
        image, mask = sample['image'], sample['mask']
        if len(image.shape) == 3:
            image = image.transpose(1, 2, 0)
        if len(mask.shape) == 3:
            mask = mask.transpose(1, 2, 0)
        mask = cv2.resize(mask, self.maskresize,
                          interpolation=cv2.INTER_AREA)
        image = cv2.resize(image, self.imageresize,
                           interpolation=cv2.INTER_AREA)
        if len(image.shape) == 3:
            image = image.transpose(2, 0, 1)
        if len(mask.shape) == 3:
            mask = mask.transpose(2, 0, 1)
        return {'image': image, 'mask': mask}


class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, mask = sample['image'], sample['mask']
        if len(mask.shape) == 2:
            mask = mask.reshape((1,) + mask.shape)
        if len(image.shape) == 2:
            image = image.reshape((1,) + image.shape)
        return {'image': torch.from_numpy(image),
                'mask': torch.from_numpy(mask)}


class Normalize(object):
    """Normalize image."""

    def __call__(self, sample):
        image, mask = sample['image'], sample['mask']
        return {'image': image.type(torch.FloatTensor) / 255,
                'mask': mask.type(torch.FloatTensor) / 255}

The three transforms are briefly explained below.

Resize: This is used to resize the image and mask to any size.

ToTensor: This converts the NumPy arrays into PyTorch tensors which can be used for training the network.

Normalize: This simply divides the pixel values by 255 so that they fall in the range 0 to 1.
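As a quick sketch of what the ToTensor and Normalize steps do, here they are re-implemented inline on a dummy sample (random arrays standing in for a real image and mask):

```python
import numpy as np
import torch

# Dummy sample: an RGB image in (C, H, W) layout and a grayscale mask (H, W),
# both with pixel values in 0-255
sample = {'image': np.random.randint(0, 256, (3, 4, 4), dtype=np.uint8),
          'mask': np.random.randint(0, 256, (4, 4), dtype=np.uint8)}

# ToTensor step: add a channel axis to 2-D arrays, then convert to tensors
image, mask = sample['image'], sample['mask']
if mask.ndim == 2:
    mask = mask.reshape((1,) + mask.shape)
image, mask = torch.from_numpy(image), torch.from_numpy(mask)

# Normalize step: scale pixel values into [0, 1]
image = image.type(torch.FloatTensor) / 255
mask = mask.type(torch.FloatTensor) / 255

print(image.shape, mask.shape)  # torch.Size([3, 4, 4]) torch.Size([1, 4, 4])
```

After these two steps the sample is exactly what the model and loss function expect: float tensors with an explicit channel dimension and values between 0 and 1.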

I’ve written two helper functions that give you dataloaders depending on your data directory structure.

1. get_dataloader_sep_folder(data_dir, imageFolder='Images', maskFolder='Masks', batch_size=4)

Create Train and Test dataloaders from two separate Train and Test folders. The directory structure should be as follows.

data_dir
--Train
------Image
---------Image1
---------ImageN
------Mask
---------Mask1
---------MaskN
--Test
------Image
---------Image1
---------ImageN
------Mask
---------Mask1
---------MaskN

2. get_dataloader_single_folder(data_dir, imageFolder='Images', maskFolder='Masks', fraction=0.2, batch_size=4)

Create Train and Test dataloaders from a single folder, split using the fraction argument. The directory structure should be as follows.

data_dir
------Image
---------Image1
---------ImageN
------Mask
---------Mask1
---------MaskN

These give you training and validation dataloaders which shall be used in the training process.

For this tutorial, I’ll be using the CrackForest data-set for the task of road crack detection using segmentation. It consists of 118 images of urban roads with cracks. Pixel level annotations for the cracks in the form of binary masks are available.

Next, we discuss how to load the pre-trained model and change the segmentation head according to our data-set requirements.

DeepLabv3 Model

Torchvision has pre-trained models available and we shall be using one of those models. I’ve written the following function which gives you a model which has a custom number of output channels.

""" DeepLabv3 Model download and change the head for your prediction"""
from torchvision.models.segmentation.deeplabv3 import DeepLabHead
from torchvision import models


def createDeepLabv3(outputchannels=1):
    model = models.segmentation.deeplabv3_resnet101(
        pretrained=True, progress=True)
    # Replace the classifier module with a new DeepLabHead that has
    # the desired number of output channels
    model.classifier = DeepLabHead(2048, outputchannels)
    # Set the model in training mode
    model.train()
    return model

First, we get the pre-trained model from torchvision. Then we perform the major step of changing the segmentation head: we replace the classifier module of the model with a new DeepLabHead with the desired number of output channels. Finally, the model is set to training mode.
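The head-swap idea can be sketched with a toy model (a small stand-in for DeepLabv3, so we avoid downloading pre-trained weights here; in the real model, DeepLabHead takes 2048 input channels because that is what the ResNet-101 backbone produces):

```python
import torch
import torch.nn as nn

# Toy stand-in: a "backbone" producing 16-channel features and a 21-class head
# (21 is the number of classes the torchvision model is pre-trained on)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),  # backbone
    nn.Conv2d(16, 21, 1),            # original segmentation head
)

# Replace the head so the model predicts a single output channel (crack mask)
outputchannels = 1
model[1] = nn.Conv2d(16, outputchannels, 1)
model.train()

out = model(torch.zeros(4, 3, 8, 8))
print(out.shape)  # torch.Size([4, 1, 8, 8])
```

The key point is that only the final module changes: the backbone's feature-extraction weights are kept, and the new head maps those features to our own number of classes.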

So far we’ve covered how to create the dataloaders, defined our custom transforms and created the DeepLabv3 model with the modified head.

The next step is to train the model. I've defined the following train_model function, which trains the model and saves the training and validation loss and metric values (if specified) into a csv log file for easy access.

import copy
import csv
import os
import time

import numpy as np
import torch
from tqdm import tqdm


def train_model(model, criterion, dataloaders, optimizer, metrics, bpath,
                num_epochs=3):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = 1e10
    # Use gpu if available
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    # Initialize the log file for training and testing loss and metrics
    fieldnames = ['epoch', 'Train_loss', 'Test_loss'] + \
        [f'Train_{m}' for m in metrics.keys()] + \
        [f'Test_{m}' for m in metrics.keys()]
    with open(os.path.join(bpath, 'log.csv'), 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
    for epoch in range(1, num_epochs + 1):
        print('Epoch {}/{}'.format(epoch, num_epochs))
        print('-' * 10)
        # Each epoch has a training and validation phase
        # Initialize batch summary
        batchsummary = {a: [0] for a in fieldnames}
        for phase in ['Train', 'Test']:
            if phase == 'Train':
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluate mode
            # Iterate over data.
            for sample in tqdm(iter(dataloaders[phase])):
                inputs = sample['image'].to(device)
                masks = sample['mask'].to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # track history only if in train phase
                with torch.set_grad_enabled(phase == 'Train'):
                    outputs = model(inputs)
                    loss = criterion(outputs['out'], masks)
                    y_pred = outputs['out'].data.cpu().numpy().ravel()
                    y_true = masks.data.cpu().numpy().ravel()
                    for name, metric in metrics.items():
                        if name == 'f1_score':
                            # Use a classification threshold of 0.1
                            batchsummary[f'{phase}_{name}'].append(
                                metric(y_true > 0, y_pred > 0.1))
                        else:
                            batchsummary[f'{phase}_{name}'].append(
                                metric(y_true.astype('uint8'), y_pred))
                    # backward + optimize only if in training phase
                    if phase == 'Train':
                        loss.backward()
                        optimizer.step()
            batchsummary['epoch'] = epoch
            epoch_loss = loss
            batchsummary[f'{phase}_loss'] = epoch_loss.item()
            print('{} Loss: {:.4f}'.format(phase, loss))
        for field in fieldnames[3:]:
            batchsummary[field] = np.mean(batchsummary[field])
        print(batchsummary)
        with open(os.path.join(bpath, 'log.csv'), 'a', newline='') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writerow(batchsummary)
        # deep copy the model if it has the lowest loss so far
        if phase == 'Test' and loss < best_loss:
            best_loss = loss
            best_model_wts = copy.deepcopy(model.state_dict())
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Lowest Loss: {:4f}'.format(best_loss))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

The best model is decided by the lowest validation loss. Inside the batch loop, we get the sample dictionary from the dataloader and split it into the inputs and masks, which are moved to the device. The loss for each batch is then computed by the criterion. I've used the mean squared error (MSE) loss function: for N pixels, MSE = (1/N) Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the ground-truth pixel value and ŷᵢ is the predicted value.
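A minimal sketch of this pixel-level MSE on toy tensors (made-up shapes and values, not real model outputs), together with the optimizer setup used later in the post:

```python
import torch
import torch.nn as nn

# Toy prediction and ground-truth mask of shape (N, channels, H, W):
# every predicted pixel is 0.5, every ground-truth pixel is 0
pred = torch.full((2, 1, 4, 4), 0.5)
target = torch.zeros(2, 1, 4, 4)

criterion = nn.MSELoss()          # averages (pred - target)^2 over all pixels
loss = criterion(pred, target)
print(loss.item())                # 0.25, i.e. mean of (0.5 - 0)^2

# Adadelta with the reduced fine-tuning learning rate
optimizer = torch.optim.Adadelta(nn.Conv2d(3, 1, 1).parameters(), lr=1e-4)
```

In the actual training loop, `pred` would be the `outputs['out']` tensor from the model and `target` the mask batch from the dataloader.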

The loss is calculated at the pixel level in our case. I've used the Adadelta optimizer with a learning rate of 0.0001 (1e-4). The fine-tuning was done for 25 epochs. The trained model achieves a testing AUROC value of 0.84. A sample segmentation output of the model is shown below.

F1 score and AUROC metric values were used for evaluation.

The F1 score values are for a threshold value of 0.1. These values will change depending on the choice of threshold. AUROC, on the other hand, takes into account all the possible threshold values and is a more robust measure for our segmentation task.
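The threshold dependence can be seen on a tiny made-up example, computing F1 by hand at two thresholds (the scores below are invented model outputs, not real predictions):

```python
import numpy as np

def f1(y_true, y_pred):
    # F1 = 2*TP / (2*TP + FP + FN) for binary arrays
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return 2 * tp / (2 * tp + fp + fn)

y_true = np.array([1, 1, 0, 0], dtype=bool)     # ground-truth pixel labels
scores = np.array([0.9, 0.2, 0.15, 0.05])       # made-up model outputs

print(f1(y_true, scores > 0.1))  # 0.8  (one false positive, no misses)
print(f1(y_true, scores > 0.5))  # ~0.667 (no false positives, one miss)
```

AUROC avoids this choice entirely by integrating over all thresholds, which is why it is the more robust summary measure here.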

Even though the model performs well on this data-set, as can be seen from the segmentation output image, the masks are over-dilated in comparison to the ground truth. This could be because the model is very large and is potentially overfitting, causing this problem. With this, we reach the end of the tutorial.

We learnt how to do transfer learning for the task of semantic segmentation using DeepLabv3 in PyTorch. We learnt how to create the data-set class for segmentation, followed by the custom transforms required for training the model. We then learnt how to change the segmentation head of the torchvision model as per our data-set. The transfer learning was tested on the CrackForest data-set, achieving an AUROC score of 0.84 after 25 epochs. The code is available at: https://github.com/msminhas93/DeepLabv3FineTuning

Thank you for reading the article. Please do like, share and subscribe if you found the post useful.