Ideally, you’ll want at least 100 images of each class. The good thing is that you can have multiple objects in each image, so you could theoretically get away with 100 total images if each image contains every class of object you want to detect. Also, if you have video footage, Detecto makes it easy to split that footage into images that you can then use for your dataset:
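A minimal sketch of that step, assuming Detecto's `split_video` function from `detecto.utils` and the file names used in the next paragraph. The `saved_frame_indices` helper is hypothetical and only illustrates which frames a step size of 4 keeps, assuming counting starts at frame 0:

```python
import os


def saved_frame_indices(total_frames, step_size=4):
    """Hypothetical helper: indices of the frames kept with a given step size."""
    return list(range(0, total_frames, step_size))


def extract_frames(video_path="video.mp4", output_folder="frames", step_size=4):
    """Save every `step_size`-th frame of a video as a JPEG (needs Detecto)."""
    # Imported here so this file still loads where Detecto is not installed.
    from detecto.utils import split_video

    os.makedirs(output_folder, exist_ok=True)  # make sure the target folder exists
    split_video(video_path, output_folder, step_size=step_size)
```

Calling `extract_frames()` with the defaults processes “video.mp4” and writes into “frames”.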

The code above takes every 4th frame in “video.mp4” and saves it as a JPEG file in the “frames” folder.

Once you’ve produced your training dataset, you should have a folder that looks something like the following:

images/
| image0.jpg
| image1.jpg
| image2.jpg
| ...

If you want, you can also have a second folder containing a set of validation images.

Now comes the time-consuming part: labeling. Detecto supports the PASCAL VOC format, in which you have XML files containing label and position data for each object in your images. To create these XML files, you can use the open-source LabelImg tool as follows:

pip3 install labelImg # Download LabelImg using pip

labelImg # Launch the application

You should now see a window pop up. On the left, click the “Open Dir” button and select the folder of images that you want to label. If things worked correctly, you should see something like this:

To draw a bounding box, click the “Create RectBox” icon in the left menu bar (or use the keyboard shortcut “w”). You can then drag a box around your objects and write/select a label:

When you’ve finished labeling an image, use CTRL+S or CMD+S to save your XML file (for simplicity and speed, you can just use the default file location and name that LabelImg auto-fills). To label the next image, click “Next Image” (or use the keyboard shortcut “d”).
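Each saved file follows the PASCAL VOC layout: one `object` element per box, holding a `name` and a `bndbox` with pixel coordinates. Here is a minimal sketch for inspecting such a file with Python's standard library. The sample XML is a hand-written illustration (file name, image size, and coordinates are made up), not output copied from LabelImg:

```python
import xml.etree.ElementTree as ET

# Hand-written example of a PASCAL VOC annotation (all values are illustrative).
SAMPLE_XML = """
<annotation>
  <filename>image0.jpg</filename>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>alien</name>
    <bndbox><xmin>100</xmin><ymin>150</ymin><xmax>400</xmax><ymax>600</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_objects(xml_text):
    """Return a list of (label, [xmin, ymin, xmax, ymax]) from VOC XML text."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.find("name").text, coords))
    return objects


print(read_voc_objects(SAMPLE_XML))
```

To check a real annotation, read the file's contents and pass them to `read_voc_objects`.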

Once you’re done with the entire dataset, your folder should look something like this:

images/
| image0.jpg
| image0.xml
| image1.jpg
| image1.xml
| ...
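Before training, it can help to confirm that every image really has a matching XML file. A minimal sketch using only the standard library; the function name and the list of image extensions are assumptions made for illustration:

```python
import os


def images_missing_labels(folder="images"):
    """Return image files in `folder` that have no XML file with the same base name."""
    names = os.listdir(folder)
    xml_stems = {os.path.splitext(n)[0] for n in names if n.lower().endswith(".xml")}
    image_exts = (".jpg", ".jpeg", ".png")
    return sorted(
        n for n in names
        if n.lower().endswith(image_exts) and os.path.splitext(n)[0] not in xml_stems
    )
```

An empty list means every image in the folder has a label file next to it.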

We’re almost ready to start training our object detection model!

Getting access to a GPU

First, check whether your computer has a CUDA-enabled GPU. Since deep learning uses a lot of processing power, training on a typical CPU can be very slow. Thankfully, most modern deep learning frameworks like PyTorch and TensorFlow can run on GPUs, making things much faster. Make sure you have PyTorch installed (you should already have it if you installed Detecto), and then run the following 2 lines of code:
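In its shortest form the check is `import torch` followed by `print(torch.cuda.is_available())`. The sketch below wraps those two lines so that it also reports False when PyTorch is missing; the wrapper function is an addition for robustness, not part of the original check:

```python
def cuda_available():
    """Return True if PyTorch is installed and can see a CUDA-enabled GPU."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch itself is not installed
    return torch.cuda.is_available()


print(cuda_available())
```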

If it prints True, great! You can skip to the next section. If it prints False, don’t fret. Follow the steps below to create a Google Colaboratory notebook, an online coding environment that comes with a free, usable GPU. For this tutorial, you’ll just be working from within a Google Drive folder rather than on your computer.

1. Log in to Google Drive

2. Create a folder called “Detecto Tutorial” and navigate into this folder

3. Upload your training images (and/or validation images) to this folder

4. Right-click, go to “More”, and click “Google Colaboratory”:

Create your Google Colab notebook

You should now see an interface like this:

Google Colab notebook environment.

5. Give your notebook a name if you want, and then go to Edit -> Notebook settings -> Hardware accelerator and select GPU

6. Type the following code to “mount” your Drive, change directory to the current folder, and install Detecto:
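A minimal sketch of that cell, assuming the standard `google.colab.drive.mount` helper and the folder name from step 2. The mount point and the `MyDrive` folder name are assumptions about how Colab lays out a mounted Drive, and the `pip` call is the plain-Python equivalent of typing `!pip install detecto` in a cell:

```python
import os
import subprocess
import sys

DRIVE_MOUNT = "/content/drive"  # mount point conventionally used in Colab


def tutorial_folder(name="Detecto Tutorial", drive_root="MyDrive"):
    """Path of the tutorial folder inside the mounted Drive (layout is an assumption)."""
    return f"{DRIVE_MOUNT}/{drive_root}/{name}"


def setup_colab():
    """Mount Drive, move into the tutorial folder, and install Detecto."""
    from google.colab import drive  # only importable inside Colab

    drive.mount(DRIVE_MOUNT)
    os.chdir(tutorial_folder())
    # Equivalent to typing `!pip install detecto` in a notebook cell.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "detecto"])
```

Run `setup_colab()` once per session. If your Drive shows up as `My Drive` instead, pass `drive_root="My Drive"` to `tutorial_folder`.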

To make sure everything worked, you can create a new code cell and type !ls to check that you’re in the right directory.

Train a custom model

Finally, we can now train a model on our custom dataset! As promised, this is the easy part. All it takes is 4 lines of code:
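A minimal sketch of those lines, assuming Detecto's `core.Dataset` and `core.Model` classes and the “images” folder built earlier. The four numbered statements are the actual training steps; the surrounding function is an addition so the snippet can be loaded even where Detecto is not installed:

```python
CLASSES = ["alien", "bat", "witch"]  # the custom labels used in this tutorial


def train_model(image_folder="images/", classes=CLASSES):
    """Train a Detecto model on a folder of JPEG and XML files (needs Detecto)."""
    from detecto import core               # 1. import Detecto's modules

    dataset = core.Dataset(image_folder)   # 2. build a Dataset from the folder
    model = core.Model(list(classes))      # 3. create a model for our classes
    model.fit(dataset)                     # 4. train on the dataset
    return model
```

In a notebook, `model = train_model()` keeps the trained model available for the following sections.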

Let’s again break down what we’ve done with each line of code:

1. Imported Detecto’s modules

2. Created a Dataset from the “images” folder (containing our JPEG and XML files)

3. Initialized a model to detect our custom objects (alien, bat, and witch)

4. Trained our model on the dataset

This can take anywhere from 10 minutes to over an hour depending on the size of your dataset, so make sure your program doesn’t exit as soon as training finishes, or the trained model will be lost. A Jupyter or Colab notebook works well here because it preserves state while active.

Using the trained model

Now that you have a trained model, let’s test it on some images. To read images from a file path, you can use the read_image function from the detecto.utils module (you could also use an image from the Dataset you created above):
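A minimal sketch of this step, assuming Detecto's `utils.read_image` and the model's `predict` method. The image path is an assumption based on the folder layout above, and `best_prediction` is a hypothetical helper added to show how the three returned sequences line up:

```python
def predict_image(model, image_path="images/image0.jpg"):
    """Run a trained Detecto model on one image file (needs Detecto)."""
    from detecto import utils

    image = utils.read_image(image_path)
    labels, boxes, scores = model.predict(image)
    return image, labels, boxes, scores


def best_prediction(labels, boxes, scores):
    """Hypothetical helper: the (label, box, score) with the highest score, or None."""
    if len(labels) == 0:
        return None
    i = max(range(len(labels)), key=lambda k: float(scores[k]))
    return labels[i], boxes[i], float(scores[i])
```

The same index selects matching entries in all three sequences, so `labels[i]`, `boxes[i]`, and `scores[i]` always describe one detected object.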

As you can see, the model’s predict method returns a tuple of 3 elements: labels, boxes, and scores. In the above example, the model predicted an alien (labels[0]) at the coordinates [569, 204, 1003, 658] (boxes[0]) with a confidence level of 0.995 (scores[0]).

From these predictions, we can plot the results using the detecto.visualize module. For example:
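A minimal sketch, assuming Detecto's `visualize.show_labeled_image` function. Filtering out low-confidence predictions before plotting is an addition here, and the 0.6 threshold is an arbitrary illustrative value:

```python
def confident_indices(scores, threshold=0.6):
    """Hypothetical helper: positions of predictions scoring at least `threshold`."""
    return [i for i, score in enumerate(scores) if float(score) >= threshold]


def show_predictions(image, labels, boxes, scores, threshold=0.6):
    """Plot the image with boxes for the confident predictions (needs Detecto)."""
    from detecto import visualize

    keep = confident_indices(scores, threshold)
    # `boxes` is a tensor, so it can be indexed with a list of positions.
    visualize.show_labeled_image(image, boxes[keep], [labels[i] for i in keep])
```

To plot every prediction regardless of score, call `visualize.show_labeled_image(image, boxes, labels)` directly.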

Running the above code with the image and predictions you received should produce something that looks like this:

If you have a video, you can run object detection on it:
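A minimal sketch, assuming Detecto's `visualize.detect_video` function and the file names mentioned below. The extension check is an added safeguard, not part of Detecto:

```python
def annotate_video(model, input_path="input.mp4", output_path="output.avi"):
    """Write a copy of the video with the model's predictions drawn on each frame."""
    # Added safeguard: this tutorial writes its output as an .avi file.
    if not output_path.lower().endswith(".avi"):
        raise ValueError("output_path should end with .avi")

    from detecto import visualize  # needs Detecto installed

    visualize.detect_video(model, input_path, output_path)
```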

This takes in a video file called “input.mp4” and produces an “output.avi” file with the given model’s predictions. If you open this file with VLC or some other video player, you should see some promising results!

A short clip from the output video Detecto produces

Lastly, you can save and load models from files, allowing you to save your progress and come back to it later:
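A minimal sketch, assuming the model's `save` method and the static `core.Model.load` method, which needs the same class list the model was trained with. The weights file name is an illustrative choice:

```python
WEIGHTS_FILE = "model_weights.pth"  # illustrative file name


def save_model(model, path=WEIGHTS_FILE):
    """Write the trained model's weights to a file."""
    model.save(path)


def load_model(path=WEIGHTS_FILE, classes=("alien", "bat", "witch")):
    """Rebuild a model from saved weights (needs Detecto)."""
    from detecto import core

    return core.Model.load(path, list(classes))
```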

Advanced usage

You’ll be happy to know that Detecto isn’t just limited to 5 lines of code. Let’s say for example that the model didn’t do as well as you hoped. We can try to increase its performance by augmenting our dataset with torchvision transforms and defining a custom DataLoader:
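A minimal sketch of such a setup, assuming torchvision's `transforms` module together with Detecto's `core.Dataset`, `core.DataLoader`, and `utils.normalize_transform`. The flip probability and saturation strength are illustrative values:

```python
BATCH_SIZE = 2  # train on batches of 2 images


def build_augmented_loader(image_folder="images/", batch_size=BATCH_SIZE):
    """Create an augmented Dataset and a DataLoader over it (needs Detecto)."""
    from torchvision import transforms
    from detecto import core, utils

    augmentations = transforms.Compose([
        transforms.ToPILImage(),                 # images are read as arrays
        transforms.RandomHorizontalFlip(0.5),    # flip about half of the images
        transforms.ColorJitter(saturation=0.5),  # randomly vary saturation
        transforms.ToTensor(),
        utils.normalize_transform(),             # normalization used by Detecto
    ])
    dataset = core.Dataset(image_folder, transform=augmentations)
    loader = core.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    return dataset, loader
```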

This code applies random horizontal flips and saturation effects on images in our dataset, increasing the diversity of our data. We then define a DataLoader object with batch_size=2; we’ll pass this to model.fit instead of the Dataset to tell our model to train on batches of 2 images rather than the default of 1.

If you created a separate validation dataset earlier, now is the time to load it in during training. When you provide a validation dataset, the fit method returns a list of the losses at each epoch, and if verbose=True, it also prints them during training. The following code block demonstrates this and customizes several other training parameters:
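A minimal sketch, assuming the `val_dataset`, `epochs`, `learning_rate`, `lr_step_size`, and `verbose` parameters of Detecto's `Model.fit`. The validation folder name and the parameter values are illustrative, and `loss_improved` is a hypothetical sanity check:

```python
def train_with_validation(model, loader, val_folder="validation_images/", epochs=10):
    """Train with a validation set and plot the per-epoch losses (needs Detecto)."""
    from detecto import core
    import matplotlib.pyplot as plt

    val_dataset = core.Dataset(val_folder)
    losses = model.fit(loader, val_dataset, epochs=epochs,
                       learning_rate=0.001, lr_step_size=5, verbose=True)
    plt.plot(losses)
    plt.xlabel("epoch")
    plt.ylabel("validation loss")
    plt.show()
    return losses


def loss_improved(losses):
    """Hypothetical check: True if the final loss is below the first one."""
    return len(losses) >= 2 and losses[-1] < losses[0]
```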

The resulting plot of the losses should be more or less decreasing:

For even more flexibility and control over your model, you can bypass Detecto altogether; the model.get_internal_model method returns the underlying torchvision model used, which you can mess around with as much as you see fit.
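As a small illustration of working with the underlying model, the sketch below counts its trainable parameters. The helper is hypothetical and relies only on the standard PyTorch `parameters()`, `requires_grad`, and `numel()` interface:

```python
def count_trainable_parameters(torch_module):
    """Hypothetical helper: number of parameters that training would update."""
    return sum(p.numel() for p in torch_module.parameters() if p.requires_grad)
```

With a trained Detecto model, this would be called as `count_trainable_parameters(model.get_internal_model())`.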

Conclusion

In this tutorial, we showed that computer vision and object detection don’t need to be challenging. All you need is a bit of time and patience to come up with a labeled dataset.

If you’re interested in further exploration, check out Detecto on GitHub or visit the documentation for more tutorials and use cases!