Computer vision used to exist only as a discipline of academic research. It has since grown beyond those roots and is now an increasingly essential part of artificial intelligence and machine learning. Many real-world products and solutions, such as automated people counting and video surveillance, are built with computer vision tools, platforms, and technology.

What is Computer Vision?

From Wikipedia,

Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.

Along with improvements in computer vision research, the problems we aim to solve have also evolved. One particular problem that computer vision works to solve is object detection — detecting objects in an image or a video — preferably in real time.

This problem has led to many new neural network architectures, such as R-CNN, RetinaNet, and YOLO. In this post, we’re going to see how to use the R package image.darknet and a tiny YOLO model to detect objects in a given image, in just 3 lines of R code.

What is YOLO?

YOLO (You Only Look Once) is a state-of-the-art object detection architecture. YOLO was first introduced in 2015 by Joseph Redmon et al.

How does YOLO work?

Unlike earlier object detection methods that repurpose classifiers to perform detection, YOLO uses a single neural network that predicts bounding boxes and class probabilities directly from the full image in one evaluation. Because it needs only a single network evaluation per image (systems like R-CNN require thousands), YOLO is extremely fast: according to its authors, it’s more than 1000x faster than R-CNN and 100x faster than Fast R-CNN.
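
To make "one evaluation" a bit more concrete, here is a minimal base R sketch of how a YOLO-style output can be laid out and scored. It is only an illustration. The grid size, boxes per cell, and class count follow the original 2015 YOLO paper (later versions, including the tiny YOLO model used in this post, arrange their outputs differently). The values are random stand-ins rather than real predictions, and every variable name and the threshold are our own assumptions, not part of image.darknet.

```r
# Toy illustration of the original YOLO output layout.
# All numbers are random placeholders, not real network predictions.
set.seed(42)

S <- 7    # the image is divided into an S x S grid
B <- 2    # bounding boxes predicted per grid cell
C <- 20   # number of object classes (PASCAL VOC)

# A single forward pass produces S * S * (B * 5 + C) numbers:
# each box has x, y, width, height and a confidence score,
# and each cell has one set of C class probabilities.
output_length <- S * S * (B * 5 + C)

# Hypothetical slice of that output for one grid cell.
box_confidence <- runif(B)                      # one confidence per box
class_probs <- runif(C)
class_probs <- class_probs / sum(class_probs)   # class probabilities sum to 1

# Class-specific score for each (box, class) pair:
# box confidence multiplied by class probability.
scores <- outer(box_confidence, class_probs)    # B x C matrix

# Keep the best class for each box, then drop low-scoring boxes.
best_class <- apply(scores, 1, which.max)
best_score <- apply(scores, 1, max)
threshold <- 0.05                               # arbitrary illustrative cutoff
keep <- best_score >= threshold

print(output_length)
print(data.frame(box = seq_len(B), best_class, best_score, keep))
```

The point of the sketch is that boxes and class scores for the whole image come out of one pass together, so there is no per-region classifier to run thousands of times.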

YOLO Model

To learn more about how the YOLO model works, check out the original paper on arXiv.