Build a Hardware-based Face Recognition System for $150 with the Nvidia Jetson Nano and Python

Using Python 3.6, OpenCV, Dlib and the face_recognition module

With the Nvidia Jetson Nano, you can build stand-alone hardware systems that run GPU-accelerated deep learning models on a tiny budget. It’s just like a Raspberry Pi, but a lot faster.

With face recognition and python, you can easily track everyone who creeps up to your door.

To get you inspired, let’s build a real hardware project with a Jetson Nano. We’ll create a simple version of a doorbell camera that tracks everyone that walks up to the front door of your house. With face recognition, it will instantly know whether the person at your door has ever visited you before — even if they were dressed differently. And if they have visited, it can tell you exactly when and how often.

What is the Nvidia Jetson Nano and how is it different from a Raspberry Pi?

For years, Raspberry Pi has been the easiest way for a software developer to get a taste of building their own hardware devices. The Raspberry Pi is a $35 computer-on-a-board that runs Linux and fully supports Python. And if you plug in a $20 Raspberry Pi camera module, you can use it to build stand-alone computer vision systems. It was a game-changing product that sold over 12 million units in the first five years alone and exposed a new generation of software developers to the world of hardware development.

The Raspberry Pi 3B — an entire Linux computer on a single board.

While the Raspberry Pi is an amazing product, it’s painful to use for deep learning applications. The Raspberry Pi doesn’t have a GPU and its CPU isn’t especially fast at matrix math, so deep learning models usually run very slowly. It just isn’t what the Raspberry Pi was designed to do. Lots of computer vision developers tried to use it anyway but they usually ended up with applications that ran at less than one frame of video a second.

Nvidia noticed this gap in the market and built the Jetson Nano. The Jetson Nano is a Raspberry Pi-style hardware device that has an embedded GPU and is specifically designed to run deep learning models efficiently.

The Nvidia Jetson Nano is conceptually similar to a Raspberry Pi — it is a Linux computer on a single board. But it has a 128-core Nvidia GPU built-in for accelerating deep learning models and it supports CUDA acceleration.

The other really cool part is that the Jetson Nano supports the exact same CUDA libraries for acceleration that almost every Python-based deep learning framework already uses. This means that you can take an existing Python-based deep learning app and often get it running on the Jetson Nano with minimal modifications and still get decent performance. It’s a huge step up from the Raspberry Pi for deep learning projects.

What to Buy

With any hardware project, the first step is to buy all the parts that you’ll need to build the system. Here are the minimal pieces that you’ll need to buy:

1. Nvidia Jetson Nano board ($99 USD)

These are currently hard to get and regularly out of stock. Watch out for scammers and try to buy from an official source. You can often find them in stock direct from Nvidia.

Full disclosure: I got my Jetson Nano board for free from a contact at Nvidia (they were sold out everywhere else) but I have no financial or editorial relationship with Nvidia.

2. MicroUSB power plug (~$10 USD)

Look for a power adapter that specifically says it supports the Jetson Nano if possible, as some USB plugs can’t put out enough power. But an old cell phone charger might work.

3. Raspberry Pi Camera Module v2.x (~$30 USD)

You can’t use a Raspberry Pi v1.x camera module! The chipset is not supported by the Jetson Nano. It has to be a v2.x camera module to work.

4. A fast microSD card with at least 32GB of space (~$10-$25 USD)

I got a 128GB card for a few dollars more on Amazon. I recommend going larger so you don’t run out of space. If you already have an extra microSD card sitting around, feel free to re-use it.

5. There are also a few other things that you will need, but you might already have them sitting around:

A microSD card reader for your computer so that you can download and install the Jetson software

A wired USB keyboard and a wired USB mouse to control the Jetson Nano

Any monitor or TV that accepts HDMI directly (not via an HDMI-to-DVI converter) so you can see what you are doing. You must use a monitor for the initial Jetson Nano setup even if you run without a monitor later.

An ethernet cable and somewhere to plug it in. The Jetson Nano bizarrely does not have wifi built-in. You can optionally add a USB wifi adapter, but support is limited to certain models so check before buying one.

Get all that stuff together and you are ready to go! Hopefully, you can get everything for less than $150. The main costs are the Jetson Nano board itself and the camera module.

Of course, you might want to buy or build a case to house the Jetson Nano hardware and hold the camera in place. But that entirely depends on where you want to deploy your system.

Downloading the Jetson Nano Software

Before you start plugging things into the Jetson Nano, you need to download the software image for the Jetson Nano.

Nvidia’s default software image is great! It includes Ubuntu Linux 18.04 with Python 3.6 and OpenCV pre-installed which saves a lot of time.

Here’s how to get the Jetson Nano software onto your SD card:

1. Download the Jetson Nano Developer Kit SD Card Image from Nvidia.

2. Download Etcher, the program that writes the Jetson software image to your SD card.

3. Run Etcher and use it to write the Jetson Nano Developer Kit SD Card Image that you downloaded to your SD card. This takes about 20 minutes or so.

At this point, you have an SD card loaded with the default Jetson Nano software. Time to unbox the rest of the hardware!

Plugging Everything In

First, take your Jetson Nano out of the box:

All that is inside is a Jetson Nano board and a little paper tray that you can use to prop up the board. There’s no manual or cords or anything else inside.

The first step is inserting the microSD card. However, the SD card slot is incredibly well hidden. You can find it on the rear side under the bottom of the heatsink:

Next, you need to plug in your Raspberry Pi v2.x camera module. It connects with a ribbon cable. Find the ribbon cable slot on the Jetson, pop up the connector, insert the cable, and pop it back closed. Make sure the metal contacts on the ribbon cable are facing inwards toward the heatsink:

Now, plug in everything else:

Plug in a mouse and keyboard to the USB ports.

Plug in a monitor using an HDMI cable.

Plug in an ethernet cable to the network port and make sure the other end is plugged into your router.

Finally, plug in the MicroUSB power cord.

You’ll end up with something that looks like this:

The Jetson Nano will automatically boot up when you plug in the power cable. You should see a Linux setup screen appear on your monitor.

First Boot and User Account Configuration

The first time the Jetson Nano boots, you have to go through the standard Ubuntu Linux new user process. You select the type of keyboard you are using, create a user account and pick a password. When you are done, you’ll see a blank Ubuntu Linux desktop.

At this point, Python 3.6 and OpenCV are already installed. You can open up a terminal window and start running Python programs right now just like on any other computer. But there are a few more libraries that we need to install before we can run our doorbell camera app.

Installing Required Python Libraries

To build our face recognition system, we need to install several Python libraries. While the Jetson Nano has a lot of great stuff pre-installed, there are some odd omissions. For example, OpenCV is installed with Python bindings, but pip and numpy aren’t installed and those are required to do anything with OpenCV. Let’s fix that.

From the Jetson Nano desktop, open up a Terminal window and run the following commands. Any time it asks for your password, type in the same password that you entered when you created your user account:

sudo apt-get update

sudo apt-get install python3-pip cmake libopenblas-dev liblapack-dev libjpeg-dev

First, we are updating apt, which is the standard Linux software installation tool that we’ll use to install everything else. Next, we are installing some basic libraries with apt that we will need later to compile numpy and dlib.

Before we go any further, we need to create a swapfile. The Jetson Nano only has 4GB of RAM which won’t be enough to compile dlib. To work around this, we’ll set up a swapfile which lets us use disk space as extra RAM. Luckily, there is an easy way to set up a swapfile on the Jetson Nano. Just run these two commands:
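The commands themselves didn’t survive in this copy of the article. Based on the JetsonHacks note below, their swapfile installer is typically fetched and run like this (the repository name is an assumption, so check the JetsonHacks site if it has moved):

```shell
# Fetch the JetsonHacks swapfile installer (repo URL assumed) and run it
git clone https://github.com/JetsonHacksNano/installSwapfile
cd installSwapfile
./installSwapfile.sh
```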

Note: This shortcut is thanks to the JetsonHacks website. They are great!

At this point, you need to reboot the system to make sure the swapfile is running. If you skip this, the next step will fail. You can reboot from the menu at the top right of the desktop.

When you are logged back in, open up a fresh Terminal window and we can continue. First, let’s install numpy, a Python library that is used for matrix math calculations:

pip3 install numpy

This command will take about 15 minutes since it has to compile numpy from scratch. Just wait until it finishes and don’t worry if it seems to freeze for a while.

Now we are ready to install dlib, a deep learning library created by Davis King that does the heavy lifting for the face_recognition library.

However, there is currently a bug in Nvidia’s own CUDA libraries for the Jetson Nano that keeps it from working correctly. To work around the bug, we’ll have to download dlib, edit a line of code, and re-compile it. But don’t worry, it’s no big deal.

In Terminal, run these commands:

wget http://dlib.net/files/dlib-19.17.tar.bz2

tar jxvf dlib-19.17.tar.bz2

cd dlib-19.17

That will download and uncompress the source code for dlib. Before we compile it, we need to comment out a line. Run this command:

gedit dlib/cuda/cudnn_dlibapi.cpp

This will open up the file that we need to edit in a text editor. Search the file for the following line of code (which should be line 854):

forward_algo = forward_best_algo;

And comment it out by adding two slashes in front of it, so it looks like this:

//forward_algo = forward_best_algo;

Now save the file, close the editor, and go back to the Terminal window. Next, run this command to compile and install dlib:

sudo python3 setup.py install

This will take around 30–60 minutes to finish and your Jetson Nano might get hot, but just let it run.

Finally, we need to install the face_recognition Python library. Do that with this command:

sudo pip3 install face_recognition

Now your Jetson Nano is ready to do face recognition with full CUDA GPU acceleration. On to the fun part!

Running the Face Recognition Doorbell Camera Demo App

The face_recognition library is a Python library I wrote that makes it super simple to do face recognition. It lets you detect faces, turn each detected face into a unique face encoding that represents the face, and then compare face encodings to see if they are likely the same person — all with just a couple of lines of code.

Using that library, I put together a doorbell camera application that can recognize people who walk up to your front door and track each time the person comes back. Here’s what it looks like when you run it:

To get started, let’s download the code. I’ve posted the full code here with comments, but here’s an easier way to download it onto your Jetson Nano from the command line:

wget -O doorcam.py tiny.cc/doorcam

Then you can run the code and try it out:

python3 doorcam.py

You’ll see a video window pop up on your desktop. Whenever a new person steps in front of the camera, it will register their face and start tracking how long they have been near your door. If the same person leaves and comes back more than 5 minutes later, it will register a new visit and track them again. You can hit ‘q’ on your keyboard at any time to exit.

The app will automatically save information about everyone it sees to a file called known_faces.dat. When you run the program again, it will use that data to remember previous visitors. If you want to clear out the list of known faces, just quit the program and delete that file.

Doorbell Camera Python Code Walkthrough

Want to know how the code works? Let’s step through it.

The code starts off by importing the libraries we are going to be using. The most important ones are OpenCV (called cv2 in Python), which we’ll use to read images from the camera, and face_recognition, which we’ll use to detect and compare faces.

import face_recognition

import cv2

from datetime import datetime, timedelta

import numpy as np

import platform

import pickle

Next, we are going to create some variables to store data about the people who walk in front of our camera. These variables will act as a simple database of known visitors.

known_face_encodings = []

known_face_metadata = []

This application is just a demo, so we are storing our known faces in a normal Python list. In a real-world application that deals with more faces, you might want to use a real database instead, but I wanted to keep this demo simple.

Next, we have a function to save and load the known face data. Here’s the save function:

def save_known_faces():

with open("known_faces.dat", "wb") as face_data_file:

face_data = [known_face_encodings, known_face_metadata]

pickle.dump(face_data, face_data_file)

print("Known faces backed up to disk.")

This writes the known faces to disk using Python’s built-in pickle functionality. The data is loaded back the same way, but I didn’t show that here.

I wanted this program to run on a desktop computer or on a Jetson Nano without any changes, so I added a simple function to detect which platform it is currently running on:

def running_on_jetson_nano():

return platform.machine() == "aarch64"

This is needed because the way we access the camera is different on each platform. On a laptop, we can just pass in a camera number to OpenCV and it will pull images from the camera. But on the Jetson Nano, we have to use gstreamer to stream images from the camera which requires some custom code.

By being able to detect the current platform, we’ll be able to use the correct method of accessing the camera on each platform. That’s the only customization needed to make this program run on the Jetson Nano instead of a normal computer!
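The gstreamer helper referenced later (get_jetson_gstreamer_source) builds a pipeline string for OpenCV roughly like this; the resolution and framerate defaults here are illustrative, not definitive:

```python
def get_jetson_gstreamer_source(capture_width=1280, capture_height=720,
                                display_width=1280, display_height=720,
                                framerate=60, flip_method=0):
    # Return a gstreamer pipeline string that captures frames from the
    # CSI camera module via the Jetson's nvarguscamerasrc element and
    # hands them to OpenCV as BGR images through an appsink
    return (
        f'nvarguscamerasrc ! video/x-raw(memory:NVMM), '
        f'width=(int){capture_width}, height=(int){capture_height}, '
        f'format=(string)NV12, framerate=(fraction){framerate}/1 ! '
        f'nvvidconv flip-method={flip_method} ! '
        f'video/x-raw, width=(int){display_width}, height=(int){display_height}, '
        'format=(string)BGRx ! '
        'videoconvert ! video/x-raw, format=(string)BGR ! appsink'
    )
```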

Whenever our program detects a new face, we’ll call a function to add it to our known face database:

def register_new_face(face_encoding, face_image):

known_face_encodings.append(face_encoding)

known_face_metadata.append({

"first_seen": datetime.now(),

"first_seen_this_interaction": datetime.now(),

"last_seen": datetime.now(),

"seen_count": 1,

"seen_frames": 1,

"face_image": face_image,

})

First, we are storing the face encoding that represents the face in a list. Then, we are storing a matching dictionary of data about the face in a second list. We’ll use this to track the time we first saw the person, how long they’ve been hanging around the camera recently, how many times they have visited our house, and a small image of their face.

We also need a helper function to check if an unknown face is already in our face database or not:

def lookup_known_face(face_encoding):

metadata = None



if len(known_face_encodings) == 0:

return metadata



face_distances = face_recognition.face_distance(

known_face_encodings,

face_encoding

)



best_match_index = np.argmin(face_distances)



if face_distances[best_match_index] < 0.65:

metadata = known_face_metadata[best_match_index]

metadata["last_seen"] = datetime.now()

metadata["seen_frames"] += 1



if datetime.now() - metadata["first_seen_this_interaction"] > timedelta(minutes=5):

metadata["first_seen_this_interaction"] = datetime.now()

metadata["seen_count"] += 1



return metadata

We are doing a few important things here:

Using the face_recognition library, we check how similar the unknown face is to all previous visitors. The face_distance() function gives us a numerical measurement of similarity between the unknown face and all known faces. The smaller the number, the more similar the faces. If the face is very similar to one of our known visitors, we assume they are a repeat visitor. In that case, we update their “last seen” time and increment the number of times we have seen them in a frame of video. Finally, if this person has been seen in front of the camera in the last five minutes, we assume they are still here as part of the same visit. Otherwise, we assume this is a new visit to our house, so we’ll reset the time stamp tracking their most recent visit.
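To make that concrete, face_distance() is essentially a euclidean distance between 128-dimensional face encodings. Here is a simplified stand-in for the real library call, using plain numpy, just to illustrate why a smaller distance means a better match:

```python
import numpy as np

def face_distance(known_encodings, face_to_check):
    # Simplified sketch of face_recognition.face_distance():
    # euclidean distance between each known encoding and the new one
    if len(known_encodings) == 0:
        return np.empty(0)
    return np.linalg.norm(np.asarray(known_encodings) - face_to_check, axis=1)

# Two known "faces": one identical to the new face, one very different
known = [np.zeros(128), np.ones(128)]
distances = face_distance(known, np.zeros(128))

best_match_index = np.argmin(distances)  # picks index 0, the identical face
```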

The rest of the program is the main loop — an endless loop where we fetch a frame of video, look for faces in the image, and process each face we see. It is the main heart of the program. Let’s check it out:

def main_loop():

if running_on_jetson_nano():

video_capture = cv2.VideoCapture(
    get_jetson_gstreamer_source(),
    cv2.CAP_GSTREAMER
)

else:

video_capture = cv2.VideoCapture(0)

The first step is to get access to the camera using whichever method is appropriate for our computer hardware. But whether we are running on a normal computer or a Jetson Nano, the video_capture object will let us grab frames of video from our computer’s camera.

So let’s start grabbing frames of video:

while True:

# Grab a single frame of video

ret, frame = video_capture.read()



# Resize frame of video to 1/4 size

small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)



# Convert the image from BGR color (which OpenCV uses) to RGB color

rgb_small_frame = small_frame[:, :, ::-1]

Each time we grab a frame of video, we’ll also shrink it to 1/4 size. This will make the face recognition process run faster at the expense of only detecting larger faces in the image. But since we are building a doorbell camera that only recognizes people near the camera, that shouldn’t be a problem.

We also have to deal with the fact that OpenCV pulls images from the camera with each pixel stored as a Blue-Green-Red value instead of the standard order of Red-Green-Blue. Before we can run face recognition on the image, we need to convert the image format.
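The [:, :, ::-1] slice in the code above is what does that conversion: it reverses the order of the color channels for every pixel. A tiny example:

```python
import numpy as np

# A 1x1 "image" containing one pure-blue pixel in OpenCV's BGR order
bgr_image = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis flips the channel order from BGR to RGB
rgb_image = bgr_image[:, :, ::-1]

print(rgb_image[0, 0].tolist())  # [0, 0, 255] - blue is now the last channel
```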

Now we can detect all the faces in the image and convert each face into a face encoding. That only takes two lines of code:

face_locations = face_recognition.face_locations(rgb_small_frame)

face_encodings = face_recognition.face_encodings(

rgb_small_frame,

face_locations

)

Next, we’ll loop through each detected face and decide if it is someone we have seen in the past or a brand new visitor:

for face_location, face_encoding in zip(face_locations, face_encodings):

    metadata = lookup_known_face(face_encoding)



if metadata is not None:

time_at_door = datetime.now() - metadata['first_seen_this_interaction']

face_label = f"At door {int(time_at_door.total_seconds())}s"



else:

face_label = "New visitor!"



# Grab the image of the face

top, right, bottom, left = face_location

face_image = small_frame[top:bottom, left:right]

face_image = cv2.resize(face_image, (150, 150))



# Add the new face to our known face data

register_new_face(face_encoding, face_image)

If we have seen the person before, we’ll retrieve the metadata we’ve stored about their previous visits. If not, we’ll add them to our face database and grab the picture of their face from the video image to add to our database.

Now that we have found all the people and figured out their identities, we can loop over the detected faces again just to draw boxes around each face and add a label to each face:

for (top, right, bottom, left), face_label in zip(face_locations, face_labels):

# Scale back up face location

# since the frame we detected in was 1/4 size

top *= 4

right *= 4

bottom *= 4

left *= 4



# Draw a box around the face

cv2.rectangle(

frame, (left, top), (right, bottom), (0, 0, 255), 2

)



# Draw a label with a description below the face

cv2.rectangle(

frame, (left, bottom - 35), (right, bottom),

(0, 0, 255), cv2.FILLED

)

cv2.putText(

frame, face_label,

(left + 6, bottom - 6),

cv2.FONT_HERSHEY_DUPLEX, 0.8,

(255, 255, 255), 1

)

I also wanted a running list of recent visitors drawn across the top of the screen with the number of times they have visited your house:

A graphical list of icons representing each person currently at your door.

To draw that, we need to loop over all known faces and see which ones have been in front of the camera recently. For each recent visitor, we’ll draw their face image on the screen and draw a visit count:

number_of_recent_visitors = 0
for metadata in known_face_metadata:
    # If we have seen this person in the last 10 seconds
    if datetime.now() - metadata["last_seen"] < timedelta(seconds=10):
        # Draw the known face image
        x_position = number_of_recent_visitors * 150
        frame[30:180, x_position:x_position + 150] = metadata["face_image"]
        number_of_recent_visitors += 1

        # Label the image with how many times they have visited
        visits = metadata['seen_count']
        visit_label = f"{visits} visits"
        if visits == 1:
            visit_label = "First visit"
        cv2.putText(
            frame, visit_label,
            (x_position + 10, 170),
            cv2.FONT_HERSHEY_DUPLEX, 0.6,
            (255, 255, 255), 1
        )

Finally, we can display the current frame of video on the screen with all of our annotations drawn on top of it:

cv2.imshow('Video', frame)

And to make sure we don’t lose data if the program crashes, we’ll save our list of known faces to disk every 100 frames:

if len(face_locations) > 0 and number_of_faces_since_save > 100:

save_known_faces()

number_of_faces_since_save = 0

else:

number_of_faces_since_save += 1

And that’s it, aside from a line or two of clean-up code to turn off the camera when the program exits.

The start-up code for the program is at the very bottom of the program:

if __name__ == "__main__":

load_known_faces()

main_loop()

All we are doing is loading the known faces (if any) and then starting the main loop that reads from the camera forever and displays the results on the screen.

The whole program is only about 200 lines, but it does something pretty interesting — it detects visitors, identifies them and tracks every single time they have come back to your door. It’s a fun demo, but it could also be really creepy if you abuse it.

Fun fact: This kind of face tracking code is running inside many street and bus station advertisements to track who is looking at ads and for how long. That might have sounded far-fetched to you before, but you just built the same thing for $150!

Extending the Program

This program is an example of how you can use a small amount of Python 3 code running on a $99 Jetson Nano board to build a powerful system.

If you wanted to turn this into a real doorbell camera system, you could add the ability for the system to send you a text message using Twilio whenever it detects a new person at the door instead of just showing it on your monitor. Or you might try replacing the simple in-memory face database with a real database.

You can also try to warp this program into something entirely different. The pattern of reading a frame of video, looking for something in the image, and then taking an action is the basis of all kinds of computer vision systems. Try changing the code and see what you can come up with! How about making it play your own custom theme music whenever you get home and walk up to your door? You can check out some of the other face_recognition Python examples to see how you might do something like this.

Learn More about the Nvidia Jetson Platform

If you want to learn more about building stuff with the Nvidia Jetson hardware platform, there’s a website called JetsonHacks that publishes tips and tutorials. I recommend checking them out. I’ve found a few tips there myself.

If you want to learn more about building ML and AI systems with Python in general, check out my other articles and my book on my website.