Hello, my arithmophobic friends! Maybe I'm wrong and you actually share my affinity for math, but in my experience many IT folks try to avoid any raw math, recalling their boring university lectures. But could it be that the problem isn't math itself? Perhaps the math curriculum simply shouldn't be treated as an obstacle to overcome? I might be wrong again, and maybe you, my friends, enjoyed a joyful and fascinating math course back in school. In that case, either your course was truly amazing, or you're a genius who can enjoy the things we normal people struggle with, or maybe both. Nevertheless, most math courses are, to put it mildly, not exactly breathtaking. That was certainly true in my case, studying in an outdated, insular post-Soviet educational system.

“Do not worry about your difficulties in Mathematics. I can assure you mine are still greater.” Albert Einstein

With this in mind, I was inspired to scribble a few lines about something extremely useful and widely applied, yet unfortunately often overlooked. Yes, I am talking about the Fourier transform!

Usually, the Fourier transform is presented as turning a meaningless sinusoid into a peak, which looks even more meaningless. In some cases, a bunch of overlapping sinusoids becomes a bunch of peaks. Yes, this is very meaningful for signal processing: splitting a signal into discrete frequencies with different amplitudes, for instance. However, the Fourier transform can be applied anywhere there is a pattern, or, simply put, it can be applied to check whether there is any pattern at all. And "finding a pattern" is exactly what you, as a data scientist, are trying to do, right? Hmmm… this is starting to get intriguing!

The Fourier transform is widely used not only in signal (radio, acoustic, etc.) processing but also in image analysis, e.g., edge detection, image filtering, image reconstruction, and image compression. One example: the Fourier transform of transmission electron microscopy images helps to check the crystallinity of samples. Crystallinity means periodicity, and periodicity means pattern. A Fourier transform of your data can expand the accessible information about the analyzed sample.

The Math Behind

Classically, the Fourier transform is defined as an improper integral over the entire real line:
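The original post shows the formula as an image; in one common convention (sign and placement of the 2π factor vary between textbooks) it reads:

f̂(ξ) = ∫_{−∞}^{∞} f(x) e^{−2πixξ} dx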

To understand the logic behind the origin of this integral, watch this awesome video:

However, an integral over the entire real line presumes continuous data of infinite extent, which doesn't really exist in the digital world. For this purpose, the classical Fourier transform can be expressed as the Discrete Fourier Transform (DFT), which converts a finite sequence of equally spaced samples of a function into a same-length sequence of equally spaced samples of the discrete-time Fourier transform:
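The DFT formula itself (shown as an image in the original post) is, for a sequence x₀, …, x_{N−1}:

X_k = Σ_{n=0}^{N−1} x_n e^{−2πikn/N},  k = 0, …, N−1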

However, in the race for low complexity and algorithmic efficiency, you will most likely deal with the Fast Fourier Transform (FFT), which is a fancy way to speed up the computation by recursively re-expressing the DFT of an arbitrary composite size N = N₁N₂ in terms of N₁ smaller DFTs of size N₂, reducing the computation time from O(N²) to O(N log N):
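To make the O(N²) vs. O(N log N) difference concrete, here is a minimal sketch: a direct implementation of the DFT definition, verified against NumPy's FFT (which uses the fast algorithm under the hood):

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) implementation of the DFT definition."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    # e^{-2*pi*i*k*n/N} evaluated for every (k, n) pair
    M = np.exp(-2j * np.pi * k * n / N)
    return M @ x

x = np.random.rand(256)
# same numbers, vastly different cost as N grows
assert np.allclose(naive_dft(x), np.fft.fft(x))
```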

For more information on FFT with some code examples in Python, I highly recommend the blog post below:

Release the Kernel!

In Immanuel Kant's words, "Experience without theory is blind, but theory without experience is mere intellectual play." So, now that we are able to see, let's stop playing and code something. To make the point without too many complications, let's take a small sound dataset: the first 10 classes of ESC-50, for instance. Why sound? Simply because we know it's a bunch of sinusoidal waves stacked together; what could be more appropriate for our fun with Fourier?!

The raw waveform signal looks like this:

Now we can feed it into a 1D CNN to see what we can get:
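The original code lives in an embedded snippet; a minimal sketch of such a network (the layer sizes, the `SAMPLE_LEN` input length, and the 10-class softmax head are my assumptions, not the author's exact architecture) could look like:

```python
from tensorflow import keras
from tensorflow.keras import layers

SAMPLE_LEN = 22050  # assumed: 1 s of audio at 22.05 kHz
NUM_CLASSES = 10    # first 10 ESC-50 classes

model = keras.Sequential([
    layers.Input(shape=(SAMPLE_LEN, 1)),
    layers.Conv1D(16, kernel_size=64, strides=4, activation="relu"),
    layers.MaxPooling1D(4),
    layers.Conv1D(32, kernel_size=32, strides=2, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```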

After training the neural network for 100 epochs, we were able to reach about 70% accuracy. Not bad! But what happens if we apply the FFT to the signal and then feed it to the CNN?

The simplest way is to use the built-in numpy.fft.fft function from the NumPy library:
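A sketch of that preprocessing step (taking the magnitude of the complex FFT output is my choice here, since a real-valued spectrum is what a CNN can consume):

```python
import numpy as np

def to_spectrum(waveform):
    """Replace a raw waveform with the magnitude of its FFT.

    np.fft.fft returns complex values; np.abs keeps the amplitude
    of each frequency bin, which is what the CNN is trained on.
    """
    return np.abs(np.fft.fft(waveform))

# toy example: a 440 Hz sine sampled at 22.05 kHz for one second
sr = 22050
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)
spectrum = to_spectrum(wave)
print(spectrum.argmax())  # -> 440: the dominant bin (1 Hz resolution here)
```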

However, in this case we would have to store a duplicate of the whole dataset, which is rather small here, but that won't always be true. So it would be better to compute the FFT inside the CNN model with a Lambda layer. Luckily, the TensorFlow backend behind Keras provides FFT operations we can wrap:
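A sketch of such an on-the-fly transform, wrapping `tf.signal.rfft` (the real-input FFT; my choice of op, since the exact layer used in the post isn't shown) in a `Lambda` layer at the top of the network:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

SAMPLE_LEN = 22050  # assumed input length

def fft_magnitude(x):
    # x: (batch, time, 1) -> drop channel, FFT along time, keep magnitudes
    x = tf.squeeze(x, axis=-1)
    spec = tf.abs(tf.signal.rfft(x))
    return tf.expand_dims(spec, axis=-1)

model = keras.Sequential([
    layers.Input(shape=(SAMPLE_LEN, 1)),
    layers.Lambda(fft_magnitude),  # FFT computed inside the model
    layers.Conv1D(16, kernel_size=64, strides=4, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(10, activation="softmax"),
])
```

This way the dataset on disk stays untouched; the spectrum is recomputed on the fly for every batch.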

The built-in FFT, backed by highly optimized compiled code, earns its name: it is really fast, on the order of microseconds, so the FFT computation time is negligible compared with the epoch time. Thus we can train the CNN at essentially no additional cost. As a result, after 100 epochs on the same train-test split, it was possible to reach 71.7%. That doesn't seem like much, but when you consider that a single line of code can help you win a Kaggle competition, it doesn't look so insignificant anymore.

Even though the approach above gave us some improvement in accuracy, there are a few points to be addressed. First of all, our dataset is quite "unifrequent": the classes differ by frequency, the frequency range within a class is quite narrow, and the frequency ranges of different classes can differ drastically. Therefore, our CNN can recognize the classes, from the perspective of frequencies, with almost the same accuracy when fed raw waveform data. We would probably expect a bigger improvement if, in addition to a high-intensity "unifrequent" signal, there were low-intensity noise that disturbs the waveform but can be nicely distinguished after a Fourier transform. Moreover, applying the FFT over the entire signal is kind of odd… Since the frequency content of a real-world audio signal may change over time, we would lose the time-frequency contours of the signal. For this reason, the signal usually needs to be split into short-time frames with a certain overlap; each frame is then Fourier transformed, and all the transformed frames are stacked together… Does anyone want to do that manually?! Even with a library?! Naaaah… We're here to train a model, not to cut audio files!
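For reference, the frame-then-transform procedure described above is exactly the short-time Fourier transform (STFT); with SciPy it looks like this (the frame length and overlap are arbitrary choices here):

```python
import numpy as np
from scipy import signal

sr = 22050
t = np.arange(2 * sr) / sr
# toy chirp whose frequency rises over time
wave = np.sin(2 * np.pi * (200 + 400 * t) * t)

# 1024-sample frames with 50% overlap, each frame FFT'd and stacked
f, times, Zxx = signal.stft(wave, fs=sr, nperseg=1024, noverlap=512)
print(Zxx.shape)  # (frequency bins, time frames)
```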

A short-time audio frame? Sounds like a convolutional filter, right?! In this case, the filter size would be the frame size and the stride would be the overlap between frames. So we can simply introduce our FFT layer among the intermediate layers of the CNN and see what happens:
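A sketch of this idea: the first convolution slices the waveform into learned short-time "frames", and a Lambda FFT layer then transforms each filter's response along the time axis (the layer sizes and the use of `tf.signal.rfft` are my assumptions, not the post's exact code):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

SAMPLE_LEN = 22050  # assumed input length

def fft_over_time(x):
    # x: (batch, time, channels); FFT each channel along the time axis
    x = tf.transpose(x, [0, 2, 1])        # -> (batch, channels, time)
    spec = tf.abs(tf.signal.rfft(x))      # magnitude spectrum per channel
    return tf.transpose(spec, [0, 2, 1])  # -> (batch, freq, channels)

model = keras.Sequential([
    layers.Input(shape=(SAMPLE_LEN, 1)),
    # trainable "framing": kernel size ~ frame length, stride ~ hop
    layers.Conv1D(16, kernel_size=512, strides=256, activation="relu"),
    layers.Lambda(fft_over_time),
    layers.Conv1D(32, kernel_size=8, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(10, activation="softmax"),
])
```

Unlike a fixed STFT, the "frames" here are the responses of trainable kernels, so the network can learn what to look at before the transform is applied.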

In this case, we could reach an accuracy of about 77.5%, which sounds much better now. As mentioned above, applying the Fourier transform in the intermediate layers allowed the network to adjust its trainable (!) kernels more efficiently, and in the end we got more than a 10% relative increase in accuracy.