In the first post of this three-part series, I listed four points that I hope my readers will agree with at the end of this series. The second post addressed the first two points of the four. In this post, Part Three of the series, I will demonstrate the final two points:

1. Phase distortions generally have less effect on human perception than magnitude distortions; and
2. Two audio clips can be recognized by humans as matching despite having dramatically different spectrograms.

In Part One I explained the concept of a spectrogram and how it is computed using the DFT. In Part Two we looked at the effect of distortions on human aural perception. We found that in some cases phase distortions change the time domain waveform but have no effect on our perception. In other cases, phase distortions clearly affect the audio, but the distortions have no impact on our ability to easily recognize a clip. In this final part, we will look at the effect of spectral magnitude distortions. Unlike the case with phase distortions, magnitude distortions change the spectrogram.

Let’s begin with the clip x03.wav first introduced in Part One. Recall that it is a sum of five sinusoids with frequencies, in Hz, of {500, 1000, 1500, 2000, 2500}. The magnitudes of the five sinusoids are {1000, 2000, 750, 1000, 1500}. The waveform x03.wav was formed from the following sum:

x03(n)=\displaystyle\sum_{l=1}^{5}a_{l}\cos(2 \pi f_{l}nT+\theta_{l})

where the sampling rate is 48 kHz (T=1/48,000) and the five phase values \theta_{l} are all zero. What happens if we change the magnitude values to the five randomly chosen values {427, 716, 2113, 1382, 373}? The result is the file x05.wav. Waveforms for both x03.wav and x05.wav are shown below:
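The construction of x03.wav and x05.wav can be sketched in a few lines of Python. This is a minimal illustration of the sum-of-cosines formula above, not the author's original synthesis code; the one-second duration is an assumption, and writing the result to a WAV file is left out.

```python
import numpy as np

fs = 48_000                                 # sampling rate in Hz, so T = 1/fs
freqs = [500, 1000, 1500, 2000, 2500]       # f_l in Hz
amps_x03 = [1000, 2000, 750, 1000, 1500]    # a_l for x03.wav
amps_x05 = [427, 716, 2113, 1382, 373]      # a_l for x05.wav
phases = [0, 0, 0, 0, 0]                    # theta_l (all zero)

def synth(amps, duration=1.0):
    """Evaluate x(n) = sum_l a_l * cos(2*pi*f_l*n*T + theta_l).

    duration is an assumed value; the original clip length is not stated here.
    """
    n = np.arange(int(duration * fs))
    return sum(a * np.cos(2 * np.pi * f * n / fs + th)
               for a, f, th in zip(amps, freqs, phases))

x03 = synth(amps_x03)   # original magnitudes
x05 = synth(amps_x05)   # randomly chosen magnitudes, same frequencies and phases
```

Because every phase is zero, each cosine equals its amplitude at n = 0, so the first sample of each waveform is simply the sum of its five magnitudes (6250 for x03, 5011 for x05), which is a quick sanity check on the synthesis.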