Long before Convolutional Neural Networks took the world by storm, the term convolution had been around for an extremely long time.

People have been using the convolution operation in mathematics, signal processing, systems theory, image processing and many other applications.

Let’s start with the mathematical definition of convolution for two functions f(t) and g(t) of a continuous variable t.
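For reference, the standard form of this definition is:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$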

The interpretation of the equation is that you flip one of the signals and then shift it from −∞ to +∞ while calculating the area under the curve of the product of the two signals. The following animation from Wikipedia explains perfectly what is happening.

The blue and red graphs are the two signals and the black is the convolved signal. The convolution is zero when there is no overlap. The convolution output increases linearly, reaches a peak when the two signals overlap perfectly and then decreases to zero. A similar definition exists for discrete signals or functions, with summations in place of the integral.
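The discrete case is easy to try out yourself. Below is a minimal sketch using NumPy's `np.convolve` with two rectangular pulses, the same shapes as in the animation:

```python
import numpy as np

# Two rectangular pulses, the classic textbook example
f = np.array([1.0, 1.0, 1.0])
g = np.array([1.0, 1.0, 1.0])

# Full discrete convolution: output length is len(f) + len(g) - 1
y = np.convolve(f, g, mode='full')
print(y)  # rises linearly, peaks at full overlap, then falls: [1. 2. 3. 2. 1.]
```

The output traces exactly the triangular shape described above: it rises while the pulses slide into each other, peaks at full overlap, and falls back to zero.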

In systems theory, for Linear Time Invariant systems the output of a system is defined as the convolution of the input signal with the impulse response of the system. This allows for the analysis of complex systems in a systematic manner.

I want to give an example which blew my mind. If you were to fire a gun (which acts as an impulse to the system) in an empty auditorium and record the sound, you would have the impulse response of the empty room. Then, if you convolve this signal with any other recorded sound, you would hear that sound with the effect of an empty auditorium. How cool is that! Here is an example clip: the first sound is ‘a’, the second is the impulse response, and the third is the convolved output.

Convolution Audio Impulse Response Example

The code for doing this is as follows:

```python
# Import the required libraries
import numpy as np
import IPython.display as ipd
from scipy.io import wavfile
from scipy.signal import convolve

# Read the impulse response file and the audio file
fs1, impulse_response = wavfile.read('./ImpulseResponses/S1R2_sweep4000.wav')
fs2, audio_file = wavfile.read('./SoundFiles/A_garvis.wav')

# Convolve the two signals
y = convolve(audio_file.astype('float32'), impulse_response.astype('float32'))

# Display the audio widgets in Jupyter (run each call in its own cell)
ipd.Audio(data=audio_file, rate=fs2)
ipd.Audio(data=impulse_response, rate=fs1)
ipd.Audio(data=y, rate=fs1)

# Write the output file after scaling to the int16 range
wavfile.write('./SoundFiles/Output.wav', fs2, np.int16(y / np.max(np.abs(y)) * 32767))
```

Now moving on to the Image Processing application of 2D convolution. The 2D convolution equation for continuous and discrete functions is as follows.
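For reference, the continuous and discrete forms can be written as:

$$(f * g)(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\tau_1, \tau_2)\, g(x - \tau_1,\, y - \tau_2)\, d\tau_1\, d\tau_2$$

$$(I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m,\, j - n)$$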

When the kernel is separable, this can be computed as two 1D convolutions, one along the x axis and one along the y axis. Visually, the process is explained in the following graphic.

In image processing the smaller matrix is called a kernel or filter, which is applied to the image via convolution. Different kernels have different effects, and a lot of time and research has gone into designing filters that perform specific tasks. These kernels have odd dimensions because that allows for a single central pixel.

Following are a few examples of kernels and their outputs. I’m using the iconic cameraman image, which those of you with an image processing background will recognize.

Let us start with a 3×3 average filter kernel. It looks as follows.

So when you convolve it across an image, all the pixels in the 3×3 window around the central pixel are added and the result is divided by 9, which is nothing but an averaging operation. So you should expect a blurred output from these filters, and the larger the kernel size, the more the blurring. Here are some example kernel outputs.
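A minimal sketch of this averaging operation with SciPy (using a small toy array in place of the cameraman image):

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 average (box) filter: every pixel becomes the mean of its 3x3 neighborhood
kernel = np.ones((3, 3)) / 9.0

# A toy "image"; in the post this would be the cameraman image
image = np.arange(25, dtype=float).reshape(5, 5)

# mode='same' keeps the output the same size as the input;
# boundary='symm' mirrors the image at the borders
blurred = convolve2d(image, kernel, mode='same', boundary='symm')
```

Each interior output pixel is exactly the mean of the 3×3 window around it, which is why the result looks blurred.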

As can be seen from the images, the 25×25 kernel produces the most smoothing.

Next let’s look at the Sobel filters. These are helpful in detecting edges. There are two filters, one for the x direction and the other for the y; one is the transpose of the other.

And following is the result of applying these filters to the cameraman image.

It does a pretty amazing job at detecting edges! And if you add the two filter responses together, you get edges in both directions.
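The two Sobel kernels and their combination can be sketched as follows, here on a toy image with a single vertical edge rather than the cameraman image:

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel kernels: sobel_y is the transpose of sobel_x
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# Toy image with a sharp vertical edge between columns 2 and 3
image = np.zeros((5, 5))
image[:, 3:] = 1.0

gx = convolve2d(image, sobel_x, mode='same', boundary='symm')
gy = convolve2d(image, sobel_y, mode='same', boundary='symm')

# Combining the two responses gives edges in both directions
edges = np.abs(gx) + np.abs(gy)
```

For this toy image, the combined response is large along the vertical edge and zero in the flat regions.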

Next let’s look at the Laplacian kernel. As you might remember from calculus, the Laplacian is a differential operator given by the divergence of the gradient of a function. The kernel is a discrete approximation of it.





It detects edges in both directions.
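A common 3×3 discrete approximation of the Laplacian (a sketch; other variants also include the diagonal neighbors) looks like this:

```python
import numpy as np
from scipy.signal import convolve2d

# Common 3x3 discrete approximation of the Laplacian operator
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

# Toy image with a vertical edge: the response is zero on constant
# regions and nonzero wherever the intensity changes, in any direction
image = np.zeros((5, 5))
image[:, 3:] = 1.0
response = convolve2d(image, laplacian, mode='same', boundary='symm')
```

Because the kernel is symmetric, a single pass responds to intensity changes in every direction, which is why it detects edges in both directions at once.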

You must have heard the term Gaussian, or the “bell-shaped” curve, at least once. The plot of the 2D Gaussian function is as follows.

It indeed looks like a bell, and the interesting thing about this function in terms of image processing is that it averages the pixels around the central pixel with weights decreasing as you move away from the center. So more importance is given to the central pixel and its neighbors, and lower importance to the farther ones. A few example kernels are shown below.

I’ll show you the output for the 3×3 filter. Compared to the average filter of the same size, the smoothing from the Gaussian filter is better because nearby pixels are weighted more heavily than distant ones.
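A Gaussian kernel like the ones shown above can be built directly from the formula. This is a small sketch; the `gaussian_kernel` helper and its parameters are my own illustration, not from the tutorial files:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Build a normalized 2D Gaussian kernel (size assumed odd)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    # Normalize so the weights sum to 1 and the overall brightness is preserved
    return kernel / kernel.sum()

k = gaussian_kernel(3, sigma=1.0)
# The center carries the largest weight, and weights fall off with distance
print(k)
```

Note how the center element is the largest and the corners the smallest, which is exactly the weighting behavior described above.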

The tutorial files are available on Github:

https://github.com/msminhas93/ConvolutionTutorial

As you can see, filtering with kernels extracts useful information from an image; it lets you see beyond the raw pixel values. However, these are fixed filters designed for a fixed purpose and output.

What would happen if you could learn not just one but stacks of filters for your application area and make decisions based on features extracted from them? That would be amazing, wouldn’t it?

Turns out Convolutional Neural Networks let you do just that, and that is why they are so effective for image processing applications. However, the operation performed in these networks is actually correlation (there is no flipping of the kernel). But “convolution” sounds much cooler, doesn’t it?
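The distinction between the two operations is easy to verify numerically: correlating with a kernel gives the same result as convolving with the kernel flipped in both axes. A quick sketch with SciPy:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

image = np.random.rand(5, 5)
kernel = np.array([[1., 2., 3.],
                   [4., 5., 6.],
                   [7., 8., 9.]])

# Correlation slides the kernel as-is; convolution flips it in both axes first
corr = correlate2d(image, kernel, mode='same')
conv = convolve2d(image, np.flip(kernel), mode='same')

print(np.allclose(corr, conv))  # True
```

For symmetric kernels (like the average, Gaussian, and Laplacian above) the flip changes nothing, which is why the distinction only matters for asymmetric kernels such as Sobel, and for the learned filters in CNNs.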

Hopefully after reading this article you no longer feel convolutions are convoluted! Thank you for reading the post. If you liked it please do subscribe and share!

Following is a link that allows you to see the output of different filters in both 1D and 2D.

https://graphics.stanford.edu/courses/cs178/applets/convolution.html