Let’s say you want to make an iOS app that can recognize audio signals in real time and trigger an action in response. At a high level, this sits at the crossroads of machine learning and mobile development.

To recognize sound signals, we first need data to train a machine learning model on. Luckily, I found a very relevant dataset: a collection of 50 types of environmental sounds.

For each class, or type of sound, there are 40 recordings of 5 seconds each, so 2,000 recordings labeled with their class are available for training our model. The fact that the data is labeled, that is, that each recording’s category is known, is very important: the model will rely on these labels to learn to distinguish the different categories.
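To get a feel for the data, here is a minimal sketch of a sanity check on the dataset. It assumes the recordings come with a metadata CSV mapping each audio file to its category; the file name `meta.csv` and the column names are hypothetical, not the dataset’s actual schema.

```python
import pandas as pd

# Hypothetical layout: a meta.csv mapping each audio file to its category.
meta = pd.read_csv("meta.csv")

print(len(meta))                               # expect 2,000 labeled recordings
print(meta["category"].nunique())              # expect 50 classes
print(meta["category"].value_counts().head())  # expect 40 recordings per class
```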

However, before we can train the model, we have to go through the feature engineering stage. This step transforms the raw data to simplify it and strip out redundant information. Together with the structure of the model that follows, it is what allows the learning to generalize.
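As a concrete illustration, here is one common way to do this for audio (a sketch, not necessarily the exact pipeline used later), assuming the `librosa` library: compute MFCC coefficients for each clip and average them over time, collapsing a 5-second waveform of roughly 110,000 samples into a fixed-size vector of 13 values.

```python
import librosa
import numpy as np

def extract_features(path: str) -> np.ndarray:
    """Collapse one audio clip into a small, fixed-size feature vector."""
    # librosa resamples to 22,050 Hz by default, so a 5 s clip is
    # ~110,250 samples; the MFCC matrix has shape (13, n_frames).
    y, sr = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Averaging over time discards redundancy between neighboring frames
    # and leaves 13 numbers per recording instead of ~110k samples.
    return mfcc.mean(axis=1)
```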

Indeed, the goal is not a model that knows the training data by heart but, on the contrary, one independent enough of the training data to recognize a sound it has never heard before. Feature engineering is often the key to success in this endeavor.
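A quick way to check this is to hold out recordings the model never sees during training and compare scores. The sketch below assumes scikit-learn and uses random placeholder data so it runs standalone; in practice `X` would hold one feature vector per recording (e.g. the 13 MFCC means above) and `y` its class.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder features and labels, stand-ins for the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 13))
y = rng.integers(0, 50, size=2000)

# Hold out 20% of the recordings, kept hidden during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# A large gap between these two scores means the model memorized
# the training data instead of generalizing.
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy: ", clf.score(X_test, y_test))
```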

In the field of signal processing, and for sound in particular, a common practice is to extract features in the frequency domain. Frequency content makes it possible, for example, to distinguish a bass sound, characterized by low frequencies, from a high-pitched sound, characterized by high frequencies.
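Here is a small self-contained example of that idea using only NumPy: two synthetic tones, one bass and one high-pitched, become trivially distinguishable once we look at their Fourier spectra. The specific frequencies (80 Hz and 4 kHz) are just illustrative choices.

```python
import numpy as np

sr = 22050                  # sample rate in Hz
t = np.arange(sr) / sr      # one second of samples

bass = np.sin(2 * np.pi * 80 * t)      # 80 Hz: a bass sound
treble = np.sin(2 * np.pi * 4000 * t)  # 4 kHz: a high-pitched sound

freqs = np.fft.rfftfreq(sr, d=1 / sr)
for name, signal in [("bass", bass), ("treble", treble)]:
    spectrum = np.abs(np.fft.rfft(signal))
    # The dominant frequency alone is enough to tell the two apart.
    print(name, "peaks at", freqs[np.argmax(spectrum)], "Hz")
```

In the time domain the two waveforms look like similar oscillating curves; in the frequency domain each collapses to a single sharp peak, which is exactly the kind of simplification feature engineering is after.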