Run Object Detection using Deep Learning on Raspberry Pi 3 (1)

Why is it difficult to do object detection on Raspberry Pi 3?

This post is the first in a series sharing our experience of leveraging open-source resources to enable deep learning for object detection on Raspberry Pi 3 (RPi3). As the opening post of the series, it explains why running object detection on RPi3 is difficult.

First, deep learning (or, to be more specific, CNNs) on Raspberry Pi is nothing new. Pete Warden released the DeepBelief SDK for image recognition in 2014 [1], and SqueezeNet [2], released in 2015, was another alternative that aimed to bring a lighter solution to embedded systems. However, image recognition only tells you what objects appear in an image; it does not tell you their size or location. Imagine you use a Raspberry Pi 3 to analyze the video stream of your front door, and it keeps telling you there is a person in the stream. Without knowing the person's location and size, you cannot tell whether they pose an immediate threat to your house. Thus, we want to enable object detection on RPi3 to tell us not only what an object is, but also where it is and how big it is.
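The difference in output can be sketched with a toy example (the labels, scores, and box values below are made up for illustration, not from any real model):

```python
# Image recognition answers only "what": a class label plus a score.
recognition_output = {"label": "person", "score": 0.92}

# Object detection additionally answers "where" and "how big":
# each detection carries a bounding box (x, y, width, height) in pixels.
detection_output = [
    {"label": "person", "score": 0.88, "box": (140, 60, 85, 220)},
]

# With a box we can reason about position and size, e.g. how large the
# person appears relative to a 640x480 frame.
frame_area = 640 * 480
x, y, w, h = detection_output[0]["box"]
relative_size = (w * h) / frame_area
```

It is this extra "where and how big" information that lets an application decide, for instance, whether a detected person is near the door or far down the street.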

Figure 1: Left: Image recognition results do not tell the size or the location of the objects. Right: An object detection model can tell the location and the size of the objects. (The image is from the Pascal VOC dataset.)


It is not easy. A detection model normally requires more parameters, and therefore more operations, than a recognition model. Take YOLOv2 [3] for example: the detection model uses roughly 3x the parameters and 6x the FLOPs of its base recognition model, Darknet-19!
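Why extra layers inflate the cost so quickly can be seen from a quick back-of-the-envelope count for a single convolutional layer. The shapes below are illustrative late-stage detection-head sizes, not the actual YOLOv2 architecture:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (bias ignored for simplicity)."""
    return c_in * c_out * k * k

def conv_flops(c_in, c_out, k, h, w):
    """Multiply-accumulates for one forward pass on an h x w feature map."""
    return conv_params(c_in, c_out, k) * h * w

# A single 3x3 conv from 512 to 1024 channels on a 13x13 feature map:
p = conv_params(512, 1024, 3)         # ~4.7M parameters
f = conv_flops(512, 1024, 3, 13, 13)  # ~0.8 billion multiply-adds
```

A detection head typically stacks several such layers on top of the base network, so the parameter and FLOP counts grow well beyond those of the recognition backbone alone.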

Let’s first take a look at the hardware spec of RPi3 [4]:

Table 1: Hardware spec of RPi3

And here are benchmark results for some popular image recognition models on RPi3:

Table 2: Benchmark results of popular models on Raspberry Pi 3.

These numbers assume no optimization has been applied to the framework or the model. For example, with the basic optimizations DT42 applied to TensorFlow, performance can be 3x better.

As one can see from the data in the table above, it is already hard to run recognition tasks on RPi3 (the best result is only 0.3 FPS), let alone a detection model with 54x more parameters than a recognition model.
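Before trying to optimize, it helps to measure FPS yourself. A minimal benchmark can be sketched like this (`run_inference` is a placeholder for whatever framework call you use; it is an assumption, not a real API):

```python
import time

def measure_fps(run_inference, n_runs=10):
    """Average frames per second over n_runs inference calls."""
    start = time.perf_counter()
    for _ in range(n_runs):
        run_inference()  # placeholder for one forward pass of the model
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

# Example with a dummy workload standing in for a real model:
fps = measure_fps(lambda: sum(i * i for i in range(100000)), n_runs=5)
```

Averaging over several runs smooths out scheduler noise, which matters on a small board like the RPi3 where a single run can vary noticeably.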

In order to achieve object detection using CNNs on RPi3 with open-source resources, one needs to:

1. Pick the right model with minimum operations

2. Find the right framework which has the best optimization for RPi3

These two topics will be covered in the next two posts of the series.

[1] https://petewarden.com/2014/06/09/deep-learning-on-the-raspberry-pi/

[2] https://github.com/DeepScale/SqueezeNet

[3] https://arxiv.org/abs/1612.08242

[4] https://en.wikipedia.org/wiki/Raspberry_Pi