Author: Ayush Chaurasia

COVID-19 or Coronavirus has taken the world by storm. At the time of writing this article, coronavirus has already been declared a pandemic by the WHO. Some of the world’s best research institutes are trying to develop vaccines to check the spread. Deep learning researchers are also hard at work to develop systems that can assist in the detection of infected patients.

In this tutorial, I’ll provide a boilerplate template for anyone who'd like to engage in research on COVID-19 datasets. We’ll walk through the process of readying the dataset, setting up early-phase experiments, and homing in on the best-performing model through hyperparameter optimization.

Before you publish any results, please read this article on responsible data science in times of COVID. With that, let's get started!

The dataset was compiled by Adrian Rosebrock of pyimagesearch and consists of 25 chest X-rays of COVID-19 patients, as well as 25 chest X-rays of healthy patients. This is a “deep learning in radiology” problem with a toy dataset. We’ll use PyTorch Lightning, a high-level wrapper around the PyTorch library. You can learn more about PyTorch Lightning and how to use it with Weights & Biases here.
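Since we'll load the images with torchvision's ImageFolder, which infers class labels from subdirectory names, the code below assumes a layout along these lines (the folder names `covid` and `normal` are illustrative; use whatever your two class folders are actually called):

```
dataset/
├── covid/     # 25 chest X-rays of COVID-19 patients
└── normal/    # 25 chest X-rays of healthy patients
```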

Let’s load the dataset using PyTorch Lightning:

```python
import pytorch_lightning as pl

class Classifier(pl.LightningModule):

    def train_dataloader(self):
        transform_data = transforms.Compose([
            transforms.Resize((224, 224)),  # square input; was (224, 244), likely a typo
            transforms.ToTensor()
        ])
        data = torchvision.datasets.ImageFolder('./dataset', transform=transform_data)
        train_size = int(0.8 * len(data))
        test_size = len(data) - train_size
        self.train_dataset, self.test_dataset = torch.utils.data.random_split(
            data, (train_size, test_size))
        return torch.utils.data.DataLoader(self.train_dataset, batch_size=16)

    def val_dataloader(self):
        return torch.utils.data.DataLoader(self.test_dataset, batch_size=16)
```

Here, we have overridden the train_dataloader() and val_dataloader() methods defined by PyTorch Lightning. The Trainer will use these functions to load the training and validation sets. We have split the dataset 80-20, so 80% of the data will be used for training and 20% for validation.
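For the 50-image dataset used here, the split arithmetic in train_dataloader() works out as follows (a quick check, assuming len(data) == 50):

```python
# Reproduce the 80-20 split computed in train_dataloader()
# for this dataset of 25 COVID-19 + 25 healthy X-rays.
n = 50
train_size = int(0.8 * n)     # 40 images for training
test_size = n - train_size    # 10 images for validation
print(train_size, test_size)  # 40 10
```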

Let’s have a look at a sample from the dataset.

The X-ray on the left is of a healthy person and the one on the right is of a COVID-19 patient. Our intention here is to try different models and optimize their hyperparameters to find the best model for our use case. Instead of manually trying out different hyperparameters, we can easily set up a Weights & Biases sweep to automate the process.

First, we need to specify the parameters that we’re going to sweep, along with their possible values. Let’s define that in a dictionary. We’ll also define the default values of these hyperparameters.

```python
sweep_config = {
    'method': 'random',  # grid, random
    'metric': {
        'name': 'val_accuracy',
        'goal': 'maximize'  # we want the highest validation accuracy
    },
    'parameters': {
        'learning_rate': {
            'values': [0.1, 0.01, 0.001]
        },
        'optimizer': {
            'values': ['adam', 'sgd']
        },
        'model': {
            'values': ['VGG16', 'resnet18']
        }
    }
}

config_defaults = {
    'learning_rate': 0.001,
    'optimizer': 'adam',
    'model': 'resnet18'
}

wandb.init(config=config_defaults)
```
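To make the 'random' method concrete, here is a minimal sketch (not the actual W&B implementation) of how random search draws one configuration from the parameter grid above — each hyperparameter value is picked independently and uniformly:

```python
import random

# Same search space as the 'parameters' section of sweep_config above
parameters = {
    'learning_rate': [0.1, 0.01, 0.001],
    'optimizer': ['adam', 'sgd'],
    'model': ['VGG16', 'resnet18'],
}

def sample_config(params, seed=None):
    """Pick one value per hyperparameter, independently and uniformly."""
    rng = random.Random(seed)
    return {name: rng.choice(values) for name, values in params.items()}

config = sample_config(parameters, seed=0)
print(config)
```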

Here, I’ve chosen the VGG16 and ResNet-18 models because our dataset is small.

Here we’re loading the predefined models from torchvision. We need to change the last layer in both networks to output only 2 neurons, as this is a binary classification problem.

```python
def __init__(self):
    super(Classifier, self).__init__()
    if wandb.config.model == 'resnet18':
        self.model = torchvision.models.resnet18()
        self.model.fc = torch.nn.Linear(512, 2)
    if wandb.config.model == 'VGG16':
        self.model = torchvision.models.vgg16()
        self.model.classifier[6] = torch.nn.Linear(4096, 2)
```

The forward function is straightforward: it just calls the model we created above. The other function simply computes the cross-entropy loss.

```python
def forward(self, x):
    return self.model(x)

def cross_entropy_loss(self, logits, labels):
    return F.cross_entropy(logits, labels)
```
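As a sanity check on what cross_entropy_loss computes, here is the same quantity worked out in plain Python for a single two-class example (F.cross_entropy applies log-softmax to the raw logits, then takes the negative log-likelihood of the true class). The logit values below are made up for illustration:

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    softmax = [e / sum(exps) for e in exps]
    return -math.log(softmax[label])

# Hypothetical logits for one X-ray: [healthy, covid]; true class is covid (index 1)
loss = cross_entropy([0.5, 2.0], label=1)
print(round(loss, 4))  # 0.2014
```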

I have omitted the loss-calculation details from the code below, as we’re going to focus on the logging. The entire code is available in this GitHub repo. Here, we’ve logged the training loss as well as the validation loss directly to the Weights & Biases dashboard.

```python
def training_step(self, train_batch, batch_idx):
    '''Perform the training pass'''
    # ... forward pass and loss calculation omitted (see the full repo) ...
    logs = {'train_loss': loss}
    wandb.log(logs)
    return {'loss': loss, 'log': logs}

def validation_step(self, val_batch, batch_idx):
    '''Perform the validation operation'''
    # ... forward pass and loss calculation omitted (see the full repo) ...
    return {'val_loss': loss}

def validation_end(self, outputs):
    '''Average out the validation error'''
    # ... averaging over the step outputs omitted (see the full repo) ...
    logs = {'val_loss': avg_loss}
    wandb.log(logs)
    return {'avg_val_loss': avg_loss, 'log': logs}
```

Finally, we’ll define the optimizer function to return an optimizer of our choice.

```python
def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=wandb.config.learning_rate)
    if wandb.config.optimizer == 'sgd':
        optimizer = torch.optim.SGD(self.parameters(), lr=wandb.config.learning_rate)
    return optimizer
```

Now we’re ready to sweep through all the possible combinations of models and their hyperparameters.

```python
def train():
    wandb.init(config=config_defaults)
    model = Classifier()
    model.prepare_data()
    model.train_dataloader()
    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model)
```

Here we’ve made an instance of the Classifier class, loaded the data, set up the trainer, and fit the model.


Now let’s run the sweep:

```python
sweep_id = wandb.sweep(sweep_config)
wandb.agent(sweep_id, function=train)
```

Here is my Sweeps page. When you run a sweep in a Colab notebook, it will generate a sweep URL for you.

Now we have the boilerplate code for COVID-19 research. Although the dataset we’ve used here isn’t nearly enough for building production-ready systems, this code structure can nevertheless be used with any dataset released in the future by hackathon organizers or research institutes. So, go ahead and assemble your own dataset and try to build a COVID-19 detector using Weights & Biases. Finally, let’s have a look at the training and validation accuracy for all the runs in the sweep.