As we’ve learned from the previous tutorial, all we need to detect objects in a video is our YOLOv3 model and our 9 lines of code as seen below:
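Here is a minimal sketch of that code; the file names yolo.h5 and traffic.mp4 are placeholders for your own model weights and input video:

```python
from imageai.Detection import VideoObjectDetection
import os

execution_path = os.getcwd()

# Set up the YOLOv3 video object detector
detector = VideoObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath(os.path.join(execution_path, "yolo.h5"))
detector.loadModel()

# Detect objects in the input video and save an annotated copy
video_path = detector.detectObjectsFromVideo(
    input_file_path=os.path.join(execution_path, "traffic.mp4"),
    output_file_path=os.path.join(execution_path, "traffic_detected"),
    frames_per_second=20,
    log_progress=True
)
```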

The code above will simply detect the objects in the video and save a new video file with the objects visually identified with bounding boxes. The ImageAI library, however, also allows you to retrieve analytical data from each frame and second of a detected video file or live camera feed in real time.

To obtain this data, we’ll have to define a function that receives a number of parameters, and then pass the name of that function to the detectObjectsFromVideo function. In the sample below, we’re going to retrieve analytical data from the video for every frame detected.

First, we need to import the ImageAI detection class and create a function that will retrieve the analytical data of the detection for each frame processed.
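A minimal sketch of such a function, with print labels chosen to match the console output we’ll examine shortly:

```python
def forFrame(frame_number, output_array, output_count):
    # frame_number: index of the frame that was just processed
    # output_array: list of dictionaries, one per object detected in the frame
    # output_count: dictionary of unique objects detected and their counts
    print("Frame Number : ", frame_number)
    print("Array of dictionaries : ", output_array)
    print("Count of unique objects : ", output_count)
```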

Function for retrieving analytical data for each video frame

In the function above, named forFrame, we’re expecting 3 parameters to be sent into it every time a video frame is processed for detection. The parameters are then printed to the console so that we can observe what’s contained in each one.

Now we’ll include the above code in our normal video detection code, the only difference being that we set the parameter per_frame_function to the name of our function, forFrame, in the detectObjectsFromVideo function. We’ll be running this code on a 5-second video that you can download via this link. However, you can use any other video for this sample. See the full code below:
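A sketch of the full script, again assuming yolo.h5 and traffic.mp4 as the model and video file names:

```python
from imageai.Detection import VideoObjectDetection
import os

execution_path = os.getcwd()

def forFrame(frame_number, output_array, output_count):
    print("Frame Number : ", frame_number)
    print("Array of dictionaries : ", output_array)
    print("Count of unique objects : ", output_count)

detector = VideoObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath(os.path.join(execution_path, "yolo.h5"))
detector.loadModel()

# No output_file_path: we only want the analytical data, not a saved video
detector.detectObjectsFromVideo(
    input_file_path=os.path.join(execution_path, "traffic.mp4"),
    frames_per_second=20,
    per_frame_function=forFrame,
    save_detected_video=False
)
```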

Retrieving analytical data for each video frame

Let’s review our detectObjectsFromVideo function. You’ll notice a couple of things:

We specified only the input_file_path and not the output_file_path. This is because we don’t intend to save the detected video; to do this, we set the parameter save_detected_video to False.

To obtain the analytical data and execute our function every time a frame in the video is detected, we set the parameter per_frame_function to the name of our function, forFrame.

When we run this code, every time a frame in the video is processed for detection, we’ll see the result produced in the console.

Sample of analytical data generated for each video frame

The above data generated by the detection process and retrieved by our function might look like a maze to you. Allow me to break it down:

i. The first value sent into our function is the frame number, and in the console, we have Frame Number : 1

ii. The second value sent into our function is an array of dictionaries, with each dictionary corresponding to one object detected in the frame and its properties. Each dictionary contains the name of the object, percentage probability, and box points that define the location of the object in the image frame. Below are samples of 2 dictionaries contained in the data generated:

{'box_points': (698, 200, 728, 307), 'name': 'person', 'percentage_probability': 90.71170091629028},

{'box_points': (651, 214, 688, 323), 'name': 'person', 'percentage_probability': 97.00838327407837}

iii. The third value sent into our function is a dictionary of all the objects detected in the frame and the number of instances of each object detected.

{'motorcycle': 2, 'bus': 3, 'car': 11, 'bicycle': 1, 'person': 8}

This data will be sent and our function executed for every frame processed for detection in the video.

We can also obtain analytical data for every second of the video. All we need to do is create a new function and pass the function name to the detectObjectsFromVideo function. See the full code below:
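A sketch under the same assumptions (yolo.h5 and traffic.mp4), this time passing per_second_function instead of per_frame_function:

```python
from imageai.Detection import VideoObjectDetection
import os

execution_path = os.getcwd()

def forSeconds(second_number, output_arrays, count_arrays, average_output_count):
    # second_number: index of the second that was just processed
    # output_arrays: object dictionaries for each frame in the last second
    # count_arrays: unique-object counts for each frame in the last second
    # average_output_count: average count of unique objects across those frames
    print("Second : ", second_number)
    print("Array of outputs for each frame : ", output_arrays)
    print("Array of counts for unique objects in each frame : ", count_arrays)
    print("Average count of unique objects in the last second : ", average_output_count)

detector = VideoObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath(os.path.join(execution_path, "yolo.h5"))
detector.loadModel()

detector.detectObjectsFromVideo(
    input_file_path=os.path.join(execution_path, "traffic.mp4"),
    frames_per_second=20,
    per_second_function=forSeconds,
    save_detected_video=False
)
```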

Retrieving analytical data for the last second of a video

In the above code, we created a function with the name forSeconds. Notice that the function expects 4 parameters, unlike the forFrame function, which has 3. This is because the analytical data sent for every second contains all object dictionaries for each frame, a unique object count for each frame, and an average count of unique objects across all the frames contained in the last second.

When we run the code above, we should see the following result in the console: