Why Did We Build AI-Based Safety Detection?

In many industrial working environments, for example mining, power, construction, and forestry, the risk of head injury to workers is constantly present. The most serious risks are physical injuries, which can result from the impact of a falling object or a collision with fixed objects at the workplace. Due to the nature of these work activities, it is not always possible to eliminate such risks through organisational measures or collective protective equipment alone. Therefore, the only way to ensure the safety of workers is to use safety helmets.

Working without a helmet is risky

Traditional Method

Currently, the security department keeps an eye out for workers who are not wearing helmets. Monitoring continuously is extremely difficult and exhausting, so many events go unnoticed, and this has led to accidents in the past.

Security keeps monitoring the workers

The Method of Detection

AI-based safety detection is an AI device that classifies and detects workers who are not wearing helmets. The system can run continuously in real time. Industries can install these devices across different sections and monitor the workers, as well as their safety, continuously in real time.

Artificial Intelligence

Deep learning has been a major trend in machine learning lately, and its recent success has paved the way for projects like this one. We are going to focus specifically on computer vision and image classification in this sample. To do this, we will build a helmet and worker image classifier using a deep learning algorithm, the Convolutional Neural Network (CNN), with the Caffe framework.

In this article we will focus on supervised learning, which requires training on a server as well as deploying on the edge. Our goal is to build a machine learning model that can detect workers and helmets in images in real time; this way you can build your own AI-based safety detection device.

This project has two parts. The first is training, in which we use a database of images of workers with helmets, together with their corresponding labels, to train a machine learning model. The second is deployment on the edge, which takes the same model we have trained and runs it on an edge device, in this case using the Qualcomm Neural Processing SDK.

Deep Learning vs Inference

Convolutional Neural Network

Convolutional Neural Networks have wide applications in image and video recognition, recommender systems, and natural language processing. In this article, the example I will use relates to computer vision. However, the basic concept remains the same and can be applied to any other use case!

CNNs, like neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function, and responds with an output.
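As a minimal sketch of that computation, the snippet below implements a single neuron with a ReLU activation; the input, weight, and bias values are made up purely for illustration.

```python
import numpy as np

# A single artificial neuron: weighted sum of the inputs plus a bias,
# passed through an activation function (here, ReLU).
def neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias   # weighted sum
    return max(0.0, z)                   # ReLU activation

x = np.array([0.5, -1.0, 2.0])   # example inputs (illustrative values)
w = np.array([0.4, 0.3, 0.1])    # learnable weights
b = 0.1                          # learnable bias

output = neuron(x, w, b)         # 0.2 - 0.3 + 0.2 + 0.1 = 0.2
```

In a trained network, the weights and bias are not hand-picked like this; they are adjusted by backpropagation during training.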

CNNs operate over volumes:

3D representation of an RGB image (input)

Convolving an image with a filter

We take the 5x5x3 filter and slide it over the complete image, taking the dot product between the filter and chunks of the input image along the way.

Convolving the image with a 5x5x3 filter

Once the image is convolved with the filter, the output is the convolution of the 32x32x3 input image with the 5x5x3 filter.

In this case, there are 28x28 unique positions where the filter can be placed on the image, since 32 − 5 + 1 = 28 along each spatial dimension.

Output of the filter: 28x28 unique positions
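The sliding dot product described above can be sketched in a few lines of NumPy. This is an illustrative, naive implementation (stride 1, no padding) with random values standing in for a real image and a learned filter:

```python
import numpy as np

def convolve(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and take the
    dot product at every position, producing one 2-D feature map."""
    H, W, _ = image.shape
    kH, kW, _ = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW, :]   # 5x5x3 chunk of the input
            out[i, j] = np.sum(patch * kernel)     # dot product with the filter
    return out

image = np.random.rand(32, 32, 3)   # RGB input volume
kernel = np.random.rand(5, 5, 3)    # one filter (weights would be learned)
fmap = convolve(image, kernel)
print(fmap.shape)  # (28, 28) -- 32 - 5 + 1 = 28 positions per axis
```

Real frameworks such as Caffe implement this far more efficiently, but the arithmetic is the same.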

Convolution Outputs to CNNs

The convolution layer is the core building block of a convolutional neural network, and successive convolution layers are stacked to form the network.

CL to CNN

Training the Network

Take a look at the filters in the very first layer (these are our 5x5x3 filters). Through backpropagation, they have tuned themselves to become blobs of coloured pieces and edges. As we go deeper into the convolution layers, the filters take dot products with the outputs of the previous convolution layers, so they take the smaller coloured pieces or edges and build larger pieces out of them.

Take a look at image 4 and imagine the 28x28x1 grid as a grid of 28x28 neurons. For a particular feature map (the output obtained by convolving the image with a particular filter is called a feature map), each neuron is connected only to a small chunk of the input image, and all the neurons share the same connection weights. This weight sharing is the key difference between a CNN and an ordinary neural network.
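A quick back-of-the-envelope calculation shows why this weight sharing matters. The fully connected figure below is hypothetical, chosen only to contrast with the convolutional case:

```python
# Weight sharing: one 5x5x3 filter is reused at all 28x28 positions,
# so the layer learns just 5*5*3 weights plus 1 bias, not a separate
# weight set per output neuron.
conv_params = 5 * 5 * 3 + 1                        # 76 parameters

# A (hypothetical) fully connected layer mapping the same 32x32x3 input
# to 28*28 outputs would need one weight per input value per neuron,
# plus one bias per neuron:
fc_params = (32 * 32 * 3) * (28 * 28) + 28 * 28    # about 2.4 million

print(conv_params, fc_params)
```

The convolutional layer gets away with a tiny fraction of the parameters precisely because every neuron in a feature map reuses the same filter.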

Pooling Layers

A pooling layer is another building block of a CNN.

Pooling

Its function is to progressively reduce the spatial size of the representation, reducing the number of parameters and the amount of computation in the network. The pooling layer operates on each feature map independently.

The most common approach used in pooling is max pooling.

Max Pooling
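As a minimal sketch, the snippet below applies 2x2 max pooling with stride 2 to a small made-up feature map; each output value is simply the largest value in its window:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """2x2 max pooling: keep the largest value in each window,
    halving the spatial size of the feature map."""
    H, W = fmap.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = fmap[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(max_pool(fmap))
# [[6. 8.]
#  [3. 4.]]
```

Note how the 4x4 input shrinks to 2x2 while the strongest activation in each region is preserved.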

Typical Architecture of a CNN

Typical architecture of CNN

We have already discussed convolution layers (denoted CONV) and pooling layers (denoted POOL).

RELU is simply a non-linearity, applied in the same way as in ordinary neural networks.

FC is the fully connected layer of neurons at the end of the CNN. Neurons in a fully connected layer have full connections to all activations in the previous layer, as in regular neural networks, and work in a similar way.
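Putting the pieces together, the sketch below chains CONV, RELU, POOL, and FC stages and tracks the shape of the data as it flows through. The layer sizes here (6 filters, 2 output classes, random weights) are illustrative only, not the actual model used in this project:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv(x, filters):
    """Naive convolution (stride 1, no padding) with a bank of filters."""
    H, W, _ = x.shape
    kH, kW, _, n = filters.shape
    out = np.zeros((H - kH + 1, W - kW + 1, n))
    for f in range(n):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, f] = np.sum(x[i:i+kH, j:j+kW, :] * filters[..., f])
    return out

def relu(x):
    return np.maximum(x, 0)          # element-wise non-linearity

def pool(x):
    """2x2 max pooling applied to each feature map independently."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

def fc(x, weights, bias):
    """Fully connected layer: flatten, then one weight per connection."""
    return x.reshape(-1) @ weights + bias

x = rng.random((32, 32, 3))                    # input image
x = relu(conv(x, rng.random((5, 5, 3, 6))))    # CONV + RELU -> 28x28x6
x = pool(x)                                    # POOL        -> 14x14x6
scores = fc(x, rng.random((14 * 14 * 6, 2)), np.zeros(2))  # FC -> 2 class scores
print(scores.shape)  # (2,)
```

In the helmet-detection setting, the two final scores would correspond to the classes being distinguished, with the larger score taken as the prediction.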