With the introduction out of the way, let’s start coding. Create a file called track.py in your working directory and add the following lines. They import and initialize everything we’ll need.

import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

Although we will eventually be tracking eyes on a video, we’ll start with an image, since it’s much faster, and code that works on a picture will work on a video: any video is just N pictures (frames) per second. So, download a portrait somewhere or use your own photo. I’ll be using a stock picture.

The stock image I’m using

Once it’s in your working directory, add the following line to your code:

img = cv2.imread("your_image_name.jpg")

In object detection, there’s a simple rule: go from big to small. You don’t start by detecting eyes on a picture; you start by detecting faces, then proceed to eyes, pupils and so on. This saves a lot of computational power, makes the process much faster, and spares us some potential false detections.

To detect faces on a picture, we first need to convert it to grayscale. Then we’ll detect faces:

gray_picture = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert the picture to grayscale
faces = face_cascade.detectMultiScale(gray_picture, 1.3, 5)

The faces object is just an array of small sub-arrays, each consisting of four numbers: the X, Y, width and height of a detected face. For example, it might look like this:

[[356  87 212 212]
 [ 50  88 207 207]]

It would mean that there are two faces on the image: 212x212 and 207x207 are their sizes, and (356, 87) and (50, 88) are their coordinates.
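As a quick sanity check of that layout, you can unpack such an array by hand. The numbers below are just the example above, not anything detectMultiScale is guaranteed to return:

```python
import numpy as np

faces = np.array([[356, 87, 212, 212],
                  [50, 88, 207, 207]])
for (x, y, w, h) in faces:
    # top-left corner is (x, y); bottom-right corner is (x + w, y + h)
    print(f"face at ({x}, {y}), size {w}x{h}, bottom-right at ({x + w}, {y + h})")
```

This is exactly the unpacking we’ll use in the rectangle-drawing loop below.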

To see if it works for us, we’ll draw a rectangle at (X, Y) of width and height size:

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)

Those lines draw rectangles on our image with the (255, 255, 0) color in BGR space (OpenCV stores channels as blue-green-red, not RGB) and a contour thickness of 2 pixels.

Now we can display the result by adding the following lines at the very end of our file:

cv2.imshow('my image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Now that we’ve confirmed everything works, we can continue. We’ll detect eyes the same way, but on the face frame this time, not the whole picture. Under the cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2) line, add:

    gray_face = gray_picture[y:y + h, x:x + w]  # cut the gray face frame out
    face = img[y:y + h, x:x + w]  # cut the colored face frame out
    eyes = eye_cascade.detectMultiScale(gray_face)

The eyes object is just like the faces object: it contains the X, Y, width and height of the eyes’ frames. You can display them in a similar fashion:

    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(face, (ex, ey), (ex + ew, ey + eh), (0, 225, 255), 2)

Notice that although we detect everything on grayscale images, we draw the rectangles on the colored ones. The sizes match, so it’s not an issue. We’ll reuse this principle of detecting objects on one picture but drawing them on another later.
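A quick way to convince yourself the sizes really do match: slice a dummy colored image and its grayscale counterpart with the same coordinates and compare shapes (the dimensions below are made up for the sketch):

```python
import numpy as np

img = np.zeros((480, 640, 3), np.uint8)   # colored (BGR) image: height x width x 3 channels
gray = np.zeros((480, 640), np.uint8)     # grayscale version: same height and width, no channels
x, y, w, h = 50, 88, 207, 207             # a face rectangle like the detector returns
color_crop = img[y:y + h, x:x + w]
gray_crop = gray[y:y + h, x:x + w]
assert color_crop.shape[:2] == gray_crop.shape == (207, 207)
```

The same (x, y, w, h) addresses the same region in both images, which is why detections from the gray picture can be drawn straight onto the colored one.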

Looks like we’ve run into trouble for the first time:

Is this chin really an eye?

Our detector thinks the chin is an eye too, for some reason. What can be done here? Of course, you could gather some more faces from around the internet and train the model to be more proficient. But there’s another, computer-vision way.

If you think about it, eyes are always in the top half of a face frame. I don’t think anyone has ever seen a person with their eyes at the bottom of their face. So, when going over our detected objects, we can simply filter out those that can’t exist given the nature of the object. For eyes, we know they can’t be in the bottom half of the face, so we just filter out any eye whose Y coordinate puts it in the bottom half of the face frame.

Eyes can’t really be in the lower half of your frame

We’ll put everything in a separate function called detect_eyes:

def detect_eyes(img, img_gray, classifier):
    coords = classifier.detectMultiScale(img_gray, 1.3, 5)  # detect eyes
    height = np.size(img_gray, 0)  # get face frame height
    for (x, y, w, h) in coords:
        if y + h > height / 2:  # pass if the eye is in the bottom half
            pass

We’ll leave it like that for now, because later we’ll also have to return the left and right eye separately. OpenCV can detect them in any order, so it’s better to determine which side an eye belongs to with our own coordinate analysis: if the eye’s center is in the left half of the frame, it’s the left eye, and vice versa.

We’ll cut the image in two by introducing the width variable:

def detect_eyes(img, classifier):
    gray_frame = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    eyes = classifier.detectMultiScale(gray_frame, 1.3, 5)  # detect eyes
    width = np.size(img, 1)  # get face frame width
    height = np.size(img, 0)  # get face frame height
    for (x, y, w, h) in eyes:
        if y > height / 2:  # skip eyes detected in the bottom half
            continue
        eyecenter = x + w / 2  # get the eye center
        if eyecenter < width * 0.5:
            left_eye = img[y:y + h, x:x + w]
        else:
            right_eye = img[y:y + h, x:x + w]
    return left_eye, right_eye

But what if no eyes are detected? Then the program will crash, because the function tries to return the left_eye and right_eye variables, which haven’t been defined. To avoid that, we’ll add two lines that pre-define them:

def detect_eyes(img, classifier):
    gray_frame = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    eyes = classifier.detectMultiScale(gray_frame, 1.3, 5)  # detect eyes
    width = np.size(img, 1)  # get face frame width
    height = np.size(img, 0)  # get face frame height
    left_eye = None
    right_eye = None
    for (x, y, w, h) in eyes:
        ...

Now, if an eye isn’t detected for some reason, the function will return None for that eye.
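To check just the geometry without OpenCV, here’s the same filter-and-split logic run on hand-made rectangles. `split_eyes` is a hypothetical helper that mirrors the loop inside detect_eyes, and the rectangles are invented for the example:

```python
def split_eyes(eyes, frame_width, frame_height):
    left_eye = None
    right_eye = None
    for (x, y, w, h) in eyes:
        if y > frame_height / 2:         # discard "eyes" in the bottom half of the face
            continue
        if x + w / 2 < frame_width / 2:  # eye center is left of the frame's middle
            left_eye = (x, y, w, h)
        else:
            right_eye = (x, y, w, h)
    return left_eye, right_eye

# a 200x200 face frame: one left eye, one right eye, one chin-level false positive
detections = [(30, 40, 50, 50), (120, 42, 50, 50), (80, 150, 40, 40)]
left, right = split_eyes(detections, 200, 200)
print(left, right)  # the chin rectangle is dropped
```

The third rectangle sits at y=150, below the halfway line of a 200-pixel frame, so it never reaches the left/right assignment.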

Before we jump to the next section, pupil tracking, let’s quickly put our face detection into a function too. It’s nothing difficult compared to the eye procedure. I’ll just note that false detections happen for faces as well, and the best filter in that case is size: small objects in the background tend to get classified as faces, so to filter them out we’ll return only the biggest detected face frame:

def detect_faces(img, classifier):
    gray_frame = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    coords = classifier.detectMultiScale(gray_frame, 1.3, 5)
    if len(coords) > 1:
        biggest = (0, 0, 0, 0)
        for i in coords:
            if i[3] > biggest[3]:
                biggest = i
        biggest = np.array([biggest], np.int32)
    elif len(coords) == 1:
        biggest = coords
    else:
        return None
    for (x, y, w, h) in biggest:
        frame = img[y:y + h, x:x + w]
    return frame

Also notice how we once again detect everything on a gray picture, but work with the colored one.
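The size filter can be exercised on its own with plain numpy. `biggest_face` below is a hypothetical extraction of just the filtering part of detect_faces, fed the example coordinates from earlier:

```python
import numpy as np

def biggest_face(coords):
    # keep only the tallest detection, mirroring the filter in detect_faces
    if len(coords) == 0:
        return None
    biggest = (0, 0, 0, 0)
    for i in coords:
        if i[3] > biggest[3]:
            biggest = i
    return np.array([biggest], np.int32)

detections = np.array([[356, 87, 212, 212], [50, 88, 207, 207]])
print(biggest_face(detections))  # only the 212-pixel-tall face survives
```

With no detections it returns None, just like detect_faces, so callers should check for that before slicing the frame.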