A Comprehensive Guide To Object Detection Using YOLO Framework — Part II

Implementation using Python

Cover Image (Source: Author)

In the last part, we covered what YOLO is and how it works. In this part, we will apply it using pre-trained weights and look at the results. This article is greatly inspired by Andrew Ng’s Deep Learning Specialization course. I’ve also gathered information from various other articles and resources to make the concept easier to understand.

Now it’s time to implement what we’ve learned using Python. You can do this in a Jupyter Notebook (or any IDE of your choice). The implementation of YOLO has been taken from Andrew Ng’s GitHub repository. You’ll also have to download this zip file, which contains the pre-trained weights and the packages needed to implement YOLO. Here’s a link to my GitHub repository where you can find the Jupyter Notebook.

I’ve tried to comment on as many lines of code as possible for better understanding.

Importing Libraries:

Let us first import all the required libraries.

import os
import imageio
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from skimage.transform import resize
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body
%matplotlib inline

Applying Filter:

First, we are going to apply a filter by thresholding. We can do this by getting rid of those boxes which have a score less than the chosen threshold.

The model contains 80 different classes for detection. It gives a total of 19x19x5x85 numbers where:

19x19: the shape of the grid

5: number of anchor boxes

85: the numbers describing each box (Pc, bx, by, bh, bw, c1, c2, ..., c80)
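To make the shape concrete, here is a small NumPy sketch that slices a 19x19x5x85 array into the three pieces used in the next function. The array is filled with random placeholder values, and the order of the 85 numbers (Pc first, then the four box values, then the 80 class probabilities) simply follows the list above; it is an assumption for illustration, and the real model helpers may lay the values out differently.

```python
import numpy as np

# Random placeholder values standing in for the raw output of one image.
rng = np.random.default_rng(0)
encoding = rng.random((19, 19, 5, 85))

# Slice the last axis in the order listed above (an assumption).
box_confidence = encoding[..., 0:1]   # Pc
boxes = encoding[..., 1:5]            # bx, by, bh, bw
box_class_probs = encoding[..., 5:]   # c1 ... c80

print(box_confidence.shape)   # (19, 19, 5, 1)
print(boxes.shape)            # (19, 19, 5, 4)
print(box_class_probs.shape)  # (19, 19, 5, 80)
print(encoding.size)          # 153425, i.e. 19 * 19 * 5 * 85
```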

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    '''
    box_confidence: tensor of shape (19,19,5,1) containing Pc
    boxes: tensor of shape (19,19,5,4)
    box_class_probs: tensor of shape (19,19,5,80)
    threshold: if the highest class score of a box (Pc * class probability) < threshold, get rid of that box
    '''
    #Computing box scores
    box_scores = box_confidence*box_class_probs
    #Finding the index of the class with maximum box score
    box_classes = K.argmax(box_scores, -1)
    #Getting the corresponding box score
    box_class_scores = K.max(box_scores, -1)
    #Creating a filtering mask. The mask will be true for all the boxes we intend to keep (score >= threshold) and false for the rest
    filtering_mask = box_class_scores >= threshold
    #Applying the mask to scores, boxes and classes
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    #scores: contains class probability score for the selected boxes
    #boxes: contains (bx,by,bh,bw) coordinates of selected boxes
    #classes: contains the index of class detected by the selected boxes
    return scores, boxes, classes
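If you want to check the filtering logic by hand, here is a rough NumPy analog of the same idea on three made-up boxes and two classes. The function name filter_boxes_np and all the numbers are hypothetical and only for illustration; the actual pipeline uses the Keras/TensorFlow function above.

```python
import numpy as np

def filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    # Score of every class for every box: Pc * class probability.
    box_scores = box_confidence * box_class_probs
    # Best class per box and its score.
    box_classes = np.argmax(box_scores, axis=-1)
    box_class_scores = np.max(box_scores, axis=-1)
    # Keep only the boxes whose best score reaches the threshold.
    mask = box_class_scores >= threshold
    return box_class_scores[mask], boxes[mask], box_classes[mask]

# Made-up values: 3 boxes, 2 classes.
box_confidence = np.array([[0.9], [0.5], [1.0]])
box_class_probs = np.array([[0.1, 0.9], [0.6, 0.4], [0.7, 0.3]])
boxes = np.array([[0., 0., 1., 1.], [1., 1., 2., 2.], [2., 2., 3., 3.]])

scores, kept_boxes, classes = filter_boxes_np(box_confidence, boxes, box_class_probs)
print(scores)   # best scores of the kept boxes (about 0.81 and 0.7)
print(classes)  # [1 0]
```

The second box is dropped because its best score is 0.5 * 0.6 = 0.3, which is below the threshold of 0.6.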

Implementing Intersection Over Union (IoU):

Now we are going to implement IoU, which measures how much two bounding boxes overlap. This will be used to evaluate the bounding boxes.

Intersection over Union (Edited by Author)

We will be defining a box using its two corners (upper left and lower right). The coordinates can be named as (x1,y1,x2,y2).

We will also have to find out the coordinates of the intersection of two boxes.

xi1: maximum of the x1 coordinates of the two boxes.

yi1: maximum of the y1 coordinates of the two boxes.

xi2: minimum of the x2 coordinates of the two boxes.

yi2: minimum of the y2 coordinates of the two boxes.

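The area of the rectangle formed by the intersection can be calculated using the formula: (xi2 - xi1)*(yi2 - yi1). If the two boxes do not overlap, at least one of these differences is negative, and the intersection area should be treated as 0.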

The formula for finding IoU is:

(Intersection area)/(Union area)
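Here is a quick worked example of the arithmetic with made-up boxes in (x1, y1, x2, y2) form. It also shows why the width and height of the intersection should be clamped at 0: for two boxes that do not overlap, both differences are negative and their product would wrongly come out positive.

```python
# Two overlapping boxes (made-up coordinates).
box1 = (2, 1, 4, 3)
box2 = (1, 2, 3, 4)

xi1 = max(box1[0], box2[0])  # 2
yi1 = max(box1[1], box2[1])  # 2
xi2 = min(box1[2], box2[2])  # 3
yi2 = min(box1[3], box2[3])  # 3

inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)        # 1
box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])     # 4
box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])     # 4
union_area = box1_area + box2_area - inter_area           # 7
iou_value = inter_area / union_area
print(round(iou_value, 4))  # 0.1429

# Two boxes that do not overlap at all.
box3 = (0, 0, 1, 1)
box4 = (2, 2, 3, 3)
raw_width = min(box3[2], box4[2]) - max(box3[0], box4[0])   # -1
raw_height = min(box3[3], box4[3]) - max(box3[1], box4[1])  # -1
unclamped_area = raw_width * raw_height                     # 1, which is wrong
clamped_area = max(raw_width, 0) * max(raw_height, 0)       # 0, which is correct
print(unclamped_area, clamped_area)  # 1 0
```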

Now let us define a function to calculate IoU.

def iou(box1, box2):
    #Calculating (xi1,yi1,xi2,yi2) of the intersection of box1 and box2
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    #Calculating the area of intersection (clamped at 0 so that boxes which do not overlap give an area of 0)
    inter_area = max(yi2-yi1, 0)*max(xi2-xi1, 0)
    #Calculating the areas of box1 and box2 using the same formula
    box1_area = (box1[3] - box1[1])*(box1[2] - box1[0])
    box2_area = (box2[3] - box2[1])*(box2[2] - box2[0])
    #Calculating the union area by using the formula: union(A,B) = A+B-Inter(A,B)
    union_area = box1_area + box2_area - inter_area
    #Calculating iou
    iou = inter_area/union_area
    return iou

Implementing Non-Max Suppression:

Next, we will be implementing non-max suppression to remove all the duplicate bounding boxes for the same object. The steps involved are:

Select the box with the highest score.

Compute its IoU with all the other boxes and remove those boxes whose IoU is greater than the chosen threshold.

Repeat with the highest-scoring box that remains, until every box has been either selected or removed.

Let us define the function:

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    #Tensor holding 'max_boxes', to be used in tf.image.non_max_suppression()
    max_boxes_tensor = K.variable(max_boxes, dtype = 'int32')
    #Initializing the tensor
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))
    #Using the tensorflow function tf.image.non_max_suppression to get the indices of boxes kept
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)
    #Using K.gather to individually access scores, boxes and classes from nms_indices
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    return scores, boxes, classes
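To see the steps in action without TensorFlow, here is a minimal pure-Python sketch of greedy non-max suppression. The helper names iou_corners and greedy_nms and the sample boxes are hypothetical; this only mirrors the idea described above and is not TensorFlow's implementation.

```python
def iou_corners(a, b):
    # Boxes are (x1, y1, x2, y2); the intersection is clamped at 0.
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, max_boxes=10, iou_threshold=0.5):
    # Indices of the boxes, sorted from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)          # step 1: take the best remaining box
        keep.append(best)
        # step 2: drop the boxes that overlap too much with it
        order = [i for i in order
                 if iou_corners(boxes[best], boxes[i]) <= iou_threshold]
    return keep                      # step 3 is the loop itself

# Made-up boxes: the first two overlap heavily, the third is far away.
sample_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
sample_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(sample_boxes, sample_scores)
print(kept)  # [0, 2]
```

The second box is removed because its IoU with the first one is 81/119, roughly 0.68, which is above the threshold of 0.5.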

Calling Functions Defined Above:

Now it’s time to implement a function that takes the output of the deep CNN and then filters the boxes using the functions above.

Note that there are a few ways to represent a bounding box, i.e. via its corners or via its midpoint and height/width. YOLO converts between a few such formats, for which there is a function named “yolo_boxes_to_corners”.

Also, YOLO was trained on images of 608 x 608 dimensions. If the images we provide have different dimensions, we will have to rescale the bounding boxes accordingly so that they fit on the image. We will be using a function called “scale_boxes” for this purpose.
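To get a feel for these two conversions, here is a small sketch in plain Python. The helper names to_corners and scale_to_image are hypothetical, and I am assuming that the box values are fractions of the image size (between 0 and 1) and ordered as (x1, y1, x2, y2); the real “yolo_boxes_to_corners” and “scale_boxes” helpers work on tensors and may order the coordinates differently.

```python
def to_corners(cx, cy, w, h):
    # Midpoint plus width/height -> upper-left and lower-right corners.
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def scale_to_image(box, image_shape):
    # image_shape is (height, width), as in yolo_eval.
    height, width = image_shape
    x1, y1, x2, y2 = box
    return (x1 * width, y1 * height, x2 * width, y2 * height)

# A made-up box centred in the image, a quarter as wide and half as tall.
corners = to_corners(0.5, 0.5, 0.25, 0.5)
print(corners)  # (0.375, 0.25, 0.625, 0.75)

scaled = scale_to_image(corners, (720., 1280.))
print(scaled)   # (480.0, 180.0, 800.0, 540.0)
```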

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes = 10, score_threshold = .6, iou_threshold = .5):
    '''
    yolo_outputs contains:
    box_confidence, box_xy, box_wh, box_class_probs
    '''
    #Retrieving output
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs
    #Converting the boxes for filtering functions
    boxes = yolo_boxes_to_corners(box_xy, box_wh)
    #Using the function defined before to remove boxes with less confidence score
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = score_threshold)
    #Scaling the boxes
    boxes = scale_boxes(boxes, image_shape)
    #Using the function defined before for non-max suppression
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)
    return scores, boxes, classes

Loading Pre-Trained Model:

Now we’re going to test the pre-trained YOLO model on images. For this, we have to create a session. Also, remember that we’re trying to detect 80 classes using 5 anchor boxes. The class names and the anchor boxes are stored in “coco_classes.txt” and “yolo_anchors.txt” respectively, which should be present in the “model_data” folder of the zip file you downloaded earlier.

Training a YOLO model takes a long time, especially if you don’t have a high-spec system. So we are going to load an existing pre-trained Keras YOLO model stored in “yolo.h5”. These are the pre-trained weights from the YOLOv2 model.

Let's create a session and load these files.

sess = K.get_session()

class_names = read_classes("model_data/coco_classes.txt")

anchors = read_anchors("model_data/yolo_anchors.txt")

yolo_model = load_model("model_data/yolo.h5")

Note: In some cases, a warning pops up while loading the weights. If that’s the case then just ignore the warning.

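#Converting the output of model into usable bounding box tensors
yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))
#Shape (height, width) of the input image; it is recomputed for the actual test image later
image_shape = (720., 1280.)
#Filtering the boxes
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)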

So far we have built a graph in which an image is given to yolo_model to compute the output, this output is processed by yolo_head, and the result goes through the filtering function yolo_eval.

Applying YOLO on an Image:

Now we have to implement a function that runs the graph to test YOLO on an image.

def predict(sess, image_file):
    #Preprocessing the image
    image, image_data = preprocess_image("images/"+image_file, model_image_size = (608,608))
    #Running the session and feeding the input to it
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict = {yolo_model.input: image_data, K.learning_phase(): 0})
    #Printing the predicted information
    print('Found {} boxes for {}'.format(len(out_boxes), image_file))
    #Generating colors for drawing bounding boxes
    colors = generate_colors(class_names)
    #Drawing bounding boxes on the image
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    #Saving the image with the predicted bounding boxes
    image.save(os.path.join("out", image_file), quality = 90)
    #Displaying the results in notebook
    output_image = imageio.imread(os.path.join("out", image_file))
    plt.figure(figsize=(12,12))
    imshow(output_image)
    return out_scores, out_boxes, out_classes

Run the following cell on your test image to see the results.

#Loading the image
img = plt.imread('images/traffic.jpeg')
#Calculating the size of image and passing it as a parameter to yolo_eval
image_shape = float(img.shape[0]), float(img.shape[1])
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)
#Predicting the output
out_scores, out_boxes, out_classes = predict(sess, "traffic.jpeg")

The output is:

Output after feeding image to the model.
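If you would like to read the detections as text instead of only looking at the image, a small helper like the one below can be used. The function name summarize_detections is hypothetical, and the scores, boxes and class indices are made-up placeholders (not real model output) so that the snippet runs on its own; in the notebook you would pass the values returned by predict and the class_names list loaded earlier.

```python
def summarize_detections(out_scores, out_boxes, out_classes, class_names):
    # Build one readable line per detected box.
    lines = []
    for score, box, class_index in zip(out_scores, out_boxes, out_classes):
        coords = ", ".join("{:.0f}".format(c) for c in box)
        lines.append("{} {:.2f} ({})".format(class_names[class_index], score, coords))
    return lines

# Placeholder values for illustration only.
sample_class_names = ["person", "bicycle", "car"]
sample_scores = [0.89, 0.75]
sample_boxes = [(300.0, 367.0, 400.0, 745.0), (285.0, 5.0, 432.0, 281.0)]
sample_classes = [2, 0]

summary = summarize_detections(sample_scores, sample_boxes, sample_classes, sample_class_names)
for line in summary:
    print(line)
# car 0.89 (300, 367, 400, 745)
# person 0.75 (285, 5, 432, 281)
```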

Conclusion:

Thanks a lot if you’ve made it this far. Please note that your results may differ from mine even if you use the same image for detection. You can further customize the maximum number of bounding boxes per image, the threshold values, etc. to obtain better results.

If you have any suggestions to make this blog better, please mention them in the comments. I will try to make the changes.

References: