In both cases, we need to create a license plate detection model to be used in our image/video processing pipeline. Let’s create one!

Project setup

I have prepared the detectron2-licenseplates project with all the necessary code and data to go through this story.

If you know nothing about Detectron2 and how to use it in your computer vision pipeline, look at my previous story:

First clone the project repository:

$ git clone https://github.com/jagin/detectron2-licenseplates.git

$ cd detectron2-licenseplates

$ git checkout edd03e4b31ec52487a506f2ed711ce9faf0b94f6

The commit edd03e4b31ec52487a506f2ed711ce9faf0b94f6 pins the source code to the version compatible with the content of this story.

For project environment setup, I’m using Conda, which is also included in Anaconda, a data science and machine learning platform. If you are curious about the platform and why to use it, read: Get your computer ready for machine learning: How, what and why you should use Anaconda, Miniconda and Conda

Let’s create the project environment:

$ conda env create -f environment.yml

$ conda activate detectron2-licenseplates

The created environment includes all the requirements we need to train and test our model on the Detectron2 platform.

Dataset

To train our model, we will use images from the MediaLab LPR dataset. This dataset doesn’t contain annotations, but I created them for you in PASCAL VOC format using the CVAT tool (there are also other interesting data labelling tools, like labelimg and labelme).

Here is the structure of our license plates dataset:

datasets
└── licenseplates
    ├── annotations
    │   ├── 04ow1.xml
    │   ├── ...
    │   ├── zb35o.xml
    │   └── zhr5k.xml
    ├── images
    │   ├── 04ow1.jpg
    │   ├── ...
    │   ├── zb35o.jpg
    │   └── zhr5k.jpg
    ├── test.txt
    └── train.txt

The annotations folder contains Pascal VOC annotation XML files, one file per image. Each file stores metadata about an image: the folder where the image is stored, its filename, its size, and each bounding box. There is only one class: licenseplate.

dataset/licenseplates/annotations/0unth.xml
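To get a feel for the format, here is a hand-written, illustrative Pascal VOC annotation (the values below are made up, not taken from the actual 0unth.xml file) and how one might read its bounding box with nothing but the standard library:

```python
import xml.etree.ElementTree as ET

# Illustrative Pascal VOC annotation; values are invented for demonstration.
VOC_XML = """<annotation>
    <folder>images</folder>
    <filename>0unth.jpg</filename>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>licenseplate</name>
        <bndbox>
            <xmin>212</xmin><ymin>301</ymin><xmax>338</xmax><ymax>342</ymax>
        </bndbox>
    </object>
</annotation>"""

root = ET.fromstring(VOC_XML)
for obj in root.iter("object"):
    name = obj.findtext("name")
    box = obj.find("bndbox")
    bbox = [int(box.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax")]
    print(name, bbox)  # licenseplate [212, 301, 338, 342]
```

Coordinates in Pascal VOC are absolute pixel values in (xmin, ymin, xmax, ymax) order, which is what we will feed to Detectron2 later.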

Next, we have the images folder with the following content:

dataset/licenseplates/images

train.txt and test.txt define our dataset split for training and testing the model.

This dataset cannot be used to build a production-ready model. It is too small: after some cleaning, there are 137 images, each containing one license plate. But that’s all we need to play around.
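Split files like train.txt and test.txt can be produced with a few lines of Python. This is just a sketch; the 80/20 ratio and the fixed seed are my assumptions, not necessarily what the project used:

```python
import random

def split_dataset(image_ids, train_ratio=0.8, seed=42):
    """Shuffle image ids deterministically and split them into train/test lists."""
    ids = sorted(image_ids)              # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

ids = ["04ow1", "zb35o", "zhr5k", "0unth", "a1b2c"]
train, test = split_dataset(ids)
# Each split file lists one image id (without extension) per line:
print("\n".join(train))
```

Writing each list to its .txt file with one id per line then gives exactly the layout shown in the dataset tree above.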

Register the Dataset

For Detectron2 to know how to obtain the dataset, we need to register it and, optionally, register metadata for it.

The process is well described with details in Detectron2 documentation.

In general, Detectron2 uses its own format for data representation, which is similar to COCO’s JSON annotations. It is a matter of implementing a function that returns the items of your custom dataset and registering it:

from detectron2.data import DatasetCatalog

def get_dicts():
    ...
    return dicts  # in the Detectron2 format

DatasetCatalog.register("my_dataset", get_dicts)

For a dataset which is already in the COCO format, Detectron2 provides the register_coco_instances function, which will register load_coco_json for you and add metadata about your dataset.

Metadata is a key-value mapping that provides information about the dataset, like names of classes, colors of classes, root of files, etc., which is accessible through MetadataCatalog.get(dataset_name).some_metadata .

In our case, we have the dataset in Pascal VOC format and there is no general-purpose loader for that format. Fortunately, Detectron2 has an implementation for registering Pascal VOC datasets (see detectron2/detectron2/data/datasets/pascal_voc.py and register_all_pascal_voc function in detectron2/detectron2/data/datasets/builtin.py ) which could be an inspiration for us.
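To make the idea concrete, here is a simplified, standalone sketch of what such a VOC loader does. This is not the project's actual implementation: the integer 0 stands in for detectron2.structures.BoxMode.XYXY_ABS so the example runs without Detectron2 installed, and error handling is omitted:

```python
import os
import xml.etree.ElementTree as ET

XYXY_ABS = 0  # stand-in for detectron2.structures.BoxMode.XYXY_ABS

def load_voc_instances(dirname, split):
    """Load Pascal VOC annotations into a list of dicts, roughly in
    Detectron2's dataset format (simplified sketch)."""
    with open(os.path.join(dirname, split + ".txt")) as f:
        fileids = [line.strip() for line in f if line.strip()]

    dicts = []
    for fileid in fileids:
        anno_file = os.path.join(dirname, "annotations", fileid + ".xml")
        tree = ET.parse(anno_file)
        size = tree.find("size")
        record = {
            "file_name": os.path.join(dirname, "images", fileid + ".jpg"),
            "image_id": fileid,
            "height": int(size.findtext("height")),
            "width": int(size.findtext("width")),
            "annotations": [],
        }
        for obj in tree.iter("object"):
            box = obj.find("bndbox")
            record["annotations"].append({
                "category_id": 0,  # only one class: licenseplate
                "bbox": [float(box.findtext(t))
                         for t in ("xmin", "ymin", "xmax", "ymax")],
                "bbox_mode": XYXY_ABS,
            })
        dicts.append(record)
    return dicts
```

The real loader in the project additionally handles details like difficult flags and class-name lookup; the point here is only the shape of the returned dicts.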

In our project, there is register_licenseplates_voc function in licenseplates/dataset.py file which will load our data and register it together with metadata.

def register_licenseplates_voc(name, dirname, split):
    DatasetCatalog.register(name,
                            lambda: load_voc_instances(dirname, split))
    MetadataCatalog.get(name).set(thing_classes=CLASS_NAMES,
                                  dirname=dirname,
                                  split=split)

To see if it works, there is a quick test in the if __name__ == '__main__': block of the code to display images with annotations using our loader and Detectron2’s Visualizer.

The if __name__ == '__main__': block runs only when the file is executed as the main module, so we can safely import our module later.

$ python licenseplates/dataset.py

will display 10 random images with annotations from the train dataset. You can switch to the test dataset with the option --split test .

Image with annotation

We are ready to train our model.

Model training and evaluation

Our approach uses transfer learning, where the weights of an existing network architecture are tuned to predict classes that the original network was not trained on.

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. — Andrej Karpathy, Transfer Learning

We can choose the model config and weights from the Detectron2 Model Zoo and create a DefaultTrainer, a trainer with default training logic which will:

1. Create model, optimizer, scheduler, dataloader from the given config.
2. Load a checkpoint or cfg.MODEL.WEIGHTS, if exists.
3. Register a few common hooks.

as the Detectron2 documentation states.

The Trainer class is as simple as:

class Trainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name):
        return VOCDetectionEvaluator(dataset_name)

We could use DefaultTrainer directly, but in our case we want to add some custom detection evaluation. As the metric for measuring the accuracy of the object detector, we use Average Precision (AP, AP50, AP75). The evaluation procedure of the detection task for PASCAL VOC is described here. Also be sure to check Jonathan Hui’s excellent article.
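To build intuition for what AP measures, here is a sketch of the "all-points" Average Precision computation over a ranked list of detections. Note that classic PASCAL VOC evaluation used an 11-point interpolated variant, so this is an illustration of the idea, not the exact evaluator Detectron2 runs:

```python
def average_precision(is_tp, num_gt):
    """All-points AP. `is_tp` marks each detection (sorted by descending
    confidence) as true/false positive; `num_gt` is the number of
    ground-truth boxes."""
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_tp:  # sweep down the ranked detections
        tp += 1 if hit else 0
        fp += 0 if hit else 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each point takes the best precision at >= its recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate precision over recall.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# Two ground-truth plates; the 2nd-ranked detection is a false positive:
print(round(average_precision([True, False, True], num_gt=2), 4))  # 0.8333
```

AP50 and AP75 apply the same computation after deciding true/false positives at IoU thresholds of 0.5 and 0.75 respectively.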

The training process goes in four steps:

1. Register the license plates dataset
2. Set up the model configuration
3. Run the training process
4. Evaluate the model

We already know how to register the dataset, but let’s focus a little bit on the model configuration, which is stored in the configs folder:

configs
├── Base-RCNN-FPN.yaml
├── Base-RetinaNet.yaml
├── lp_faster_rcnn_R_50_FPN_3x.yaml
└── lp_retinanet_R_50_FPN_3x.yaml

Detectron2 provides a lot of different models which can be accessed through the detectron2.model_zoo package, but we need to modify them for our case (we have only one class to detect) and keep the configs under version control in our repository.

I included two COCO object detection baselines from the Detectron2 Model Zoo:

Faster R-CNN — region-based object detector

RetinaNet — single-shot object detector

and adjusted them to our needs.

The model config is set up through the setup_cfg function from the licenseplates/config.py script.
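For orientation, a project-specific override config might look something like the fragment below. The key names are standard Detectron2 config keys, but the exact values and the weights path are illustrative assumptions, so check lp_faster_rcnn_R_50_FPN_3x.yaml in the repository for the real ones:

```yaml
_BASE_: "Base-RCNN-FPN.yaml"
MODEL:
  # COCO-pretrained weights from the Model Zoo (illustrative path)
  WEIGHTS: "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl"
  ROI_HEADS:
    NUM_CLASSES: 1   # only one class: licenseplate
DATASETS:
  TRAIN: ("licenseplates_train",)
  TEST: ("licenseplates_test",)
SOLVER:
  MAX_ITER: 300
```

Lowering NUM_CLASSES to 1 is the key change for transfer learning here: the pretrained backbone is kept, while the box head is re-initialized for our single class.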

Let’s train the Faster R-CNN model:

$ python train.py --config-file configs/lp_faster_rcnn_R_50_FPN_3x.yaml

It takes a few minutes to train on this toy dataset (300 iterations) on an RTX 2080 Ti, with results like:

[01/22 14:08:38 d2.utils.events]: eta: 0:00:12 iter: 239 total_loss: 0.139 loss_cls: 0.026 loss_box_reg: 0.115 loss_rpn_cls: 0.000 loss_rpn_loc: 0.004 time: 0.2075 data_time: 0.0048 lr: 0.004795 max_mem: 2357M

[01/22 14:08:42 d2.utils.events]: eta: 0:00:08 iter: 259 total_loss: 0.128 loss_cls: 0.023 loss_box_reg: 0.097 loss_rpn_cls: 0.000 loss_rpn_loc: 0.003 time: 0.2074 data_time: 0.0046 lr: 0.005195 max_mem: 2357M

[01/22 14:08:46 d2.utils.events]: eta: 0:00:04 iter: 279 total_loss: 0.125 loss_cls: 0.024 loss_box_reg: 0.096 loss_rpn_cls: 0.000 loss_rpn_loc: 0.003 time: 0.2072 data_time: 0.0045 lr: 0.005594 max_mem: 2357M

[01/22 14:08:51 fvcore.common.checkpoint]: Saving checkpoint to ./output/model_final.pth

...

[01/22 14:08:54 d2.engine.defaults]: Evaluation results for licenseplates_test in csv format:

[01/22 14:08:54 d2.evaluation.testing]: copypaste: Task: bbox

[01/22 14:08:54 d2.evaluation.testing]: copypaste: AP,AP50,AP75

[01/22 14:08:54 d2.evaluation.testing]: copypaste: 81.8429,100.0000,100.0000

[01/22 14:08:54 d2.utils.events]: eta: 0:00:00 iter: 299 total_loss: 0.132 loss_cls: 0.025 loss_box_reg: 0.105 loss_rpn_cls: 0.000 loss_rpn_loc: 0.003 time: 0.2083 data_time: 0.0044 lr: 0.005994 max_mem: 2357M

[01/22 14:08:54 d2.engine.hooks]: Overall training speed: 297 iterations in 0:01:02 (0.2090 s / it)

[01/22 14:08:54 d2.engine.hooks]: Total training time: 0:01:05 (0:00:03 on hooks)

To train the RetinaNet model on our dataset, you can run the same script with a different model configuration (it will overwrite the results of the previously trained model):

$ python train.py --config-file configs/lp_retinanet_R_50_FPN_3x.yaml

You can observe all the metrics in TensorBoard by running:

$ tensorboard --logdir output

Training curves in TensorBoard

Prediction

The trained model is saved to the output/model_final.pth file, and we can use it for prediction on images from the test dataset:

$ python predict.py --config-file configs/lp_faster_rcnn_R_50_FPN_3x.yaml MODEL.WEIGHTS output/model_final.pth

The script will display 10 random samples (see the --samples option) from the test dataset.

Prediction results.

Did you spot the false positive? You can get rid of it by increasing the confidence threshold with the option --confidence-threshold 0.75 .
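Conceptually, the threshold just drops any detection whose score falls below the cutoff. A minimal sketch (the (bbox, score) pair format here is my simplification; in Detectron2 this filtering is governed by the model's score-threshold config rather than done by hand):

```python
def filter_by_confidence(detections, threshold=0.75):
    """Keep only detections whose score reaches the confidence threshold.

    `detections` is a list of (bbox, score) pairs; the bbox format
    doesn't matter for the filtering itself.
    """
    return [(box, score) for box, score in detections if score >= threshold]

preds = [([10, 20, 110, 60], 0.98),   # a confident license plate detection
         ([300, 40, 360, 70], 0.52)]  # a low-confidence false positive
print(filter_by_confidence(preds))    # the 0.52 detection is dropped
```

Raising the threshold trades recall for precision: a false positive disappears, but a dim or partly occluded plate might disappear with it.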

Summary

Detectron2 is an object detection and segmentation platform released by Facebook AI Research (FAIR) as an open-source project. Beyond state-of-the-art object detection algorithms, it includes numerous models for instance segmentation, panoptic segmentation, pose estimation, DensePose, and TridentNet. Thanks to its modular design, it is easy to reuse them in your research or to create your own custom model.

I hope that this story will help you train your own model. Happy coding!

Resources