We notice that the values can be either positive or negative; how is this so? Well, a sound is a pressure wave, and relative to some baseline, the pressure can be higher or lower.

Let’s take a look at the documentation for vDSP under Vector Reduction > Vector Average Calculation. After all, we’re dealing with a vector of information, and we’re trying to find some sort of average loudness of our current audio buffer to display to the user.

So which one do we pick?

This is a little complicated. The best option we have in the vDSP framework is the root mean square (RMS) calculation. This makes sense, since the RMS is used to calculate the average of a function that goes above and below the x-axis. It also turns out that, in practice, this is a well-established measurement technique for loudness.
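To make that concrete, here’s a hand-rolled sketch of what an RMS calculation does (the helper name is my own, not part of any framework):

```swift
import Foundation

// What an RMS calculation does, written out by hand for clarity:
// square each sample, average the squares, then take the square root.
func rmsByHand(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(0) { $0 + $1 * $1 }
    return sqrt(sumOfSquares / Float(samples.count))
}

// A wave alternating between +1 and -1 has a plain average of 0,
// but an RMS of 1: exactly why RMS suits signals that cross the x-axis.
```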

For the sake of completeness, I’ll mention that there are more advanced ways to do loudness metering, but by now I hope you can appreciate why we’re not overly concerned with that level of detail. If you’re looking for the most accurate measurements, you can look into A-weighting, which puts a heavier emphasis on the frequencies our ears hear best.

Let’s do it and see what we get!

We’ll now be going into our SignalProcessing class.

We’ll want to import Accelerate to take advantage of the vDSP library. Next, let’s look at the RMS function, vDSP_rmsqv.

func vDSP_rmsqv(_ __A: UnsafePointer<Float>, _ __IA: vDSP_Stride, _ __C: UnsafeMutablePointer<Float>, _ __N: vDSP_Length)

A: (very descriptive, I know) A pointer to our data, defined as a single-precision real input vector. Single-precision means it is a Float; real input means the values are real, not complex.

IA: The stride of our buffer data. The stride is the distance between successive values read from the container (an array, vector, etc.), measured in elements. A stride of 1 means every element is processed; a stride of 2 would process every other element. vDSP_Stride is just a type alias for Int.

C: A pointer to a Float where we would like to write the result of the operation.

N: The number of elements we’d like to perform the operation over. vDSP_Length is just a type alias for UInt.

Now, we can create a function inside our SignalProcessing class to return the RMS value, given a container of Floats and the number of values we’re computing over.
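A minimal sketch of what that function might look like (the function name and signature here are my own; the vDSP call is the real API):

```swift
import Accelerate

class SignalProcessing {
    // Computes the RMS of `frameCount` samples starting at `data`,
    // using a stride of 1 (process every sample).
    static func rms(data: UnsafeMutablePointer<Float>, frameCount: UInt) -> Float {
        var result: Float = 0
        vDSP_rmsqv(data, 1, &result, frameCount)
        return result
    }
}
```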

If we go back into our ViewController’s processAudioData function, we can get the effective loudness for the current audio sample.

Okay, so we have the value for the sample; now what? These values are rather small. If you recall how points work in Metal, the X and Y axes scale from -1 to 1. We first need to establish a baseline: how large should the circle be at zero loudness, and what is the largest it can be? Thinking about this from a UI perspective, we don’t want the circle to fill up the whole MetalView; there needs to be space for the frequency lines. We’ll be going with a minimum of 0.3 and a maximum of 0.6.

We need to normalize these values to fit between 0.3 and 0.6. If we use a baseline of 0.3 radius for our circle, then a naive approach is 0.3 + rmsValue; but as we can see, these values are too small to make a noticeable difference, so we need to scale them up.

First, let’s turn the result into decibels (10 * log10f(val)), an easier unit to work with and understand.

We’ll now get values ranging from -160dB (silent) to 0dB (loudest). Some simple arithmetic is needed to map this -160-to-0 range onto our 0.3 baseline.

But now we notice another problem: our values all seem to hover around 5.27–5.28. That doesn’t seem too impressive. What can we do to accentuate these small changes? Well, we can choose to magnify a certain range that our loudness seems to be constrained by.
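One way to sketch this whole mapping is below. The clamp range matches the 0.3 to 0.6 bounds chosen earlier, but the magnification constants are illustrative assumptions, not the article’s exact numbers; tune them to taste:

```swift
import Foundation

// Maps an RMS value to a circle radius. The magnification factor and
// offset are assumptions chosen to make small changes visible.
func loudnessRadius(rmsValue: Float) -> Float {
    let db = 10 * log10f(rmsValue)        // roughly -160 (silence) ... 0 (loudest)
    let normalized = (db + 160) / 160     // shift into 0 ... 1
    // Most music hovers in a narrow band near the top of this range;
    // stretch that band so small changes become visible, then clamp
    // into the 0.3 ... 0.6 radius range.
    let magnified = (normalized - 0.8) * 5
    return min(max(0.3 + magnified * 0.3, 0.3), 0.6)
}
```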

Add a print statement and look at the outputs!

The next problem to tackle is the sampling rate. We get a callback every 0.1s. A 10fps loudness meter doesn’t look very impressive, especially considering we’re just rendering a circle of different sizes ten times a second. It’s going to look choppy and not smooth at all. So what can we do to fill the gaps? We can interpolate between the points to smooth things out! Linear interpolation is more than fine in this case.

I’m going to zoom through this and just paste the code. Essentially, all we need to do is store the prevRMSValue in the controller class and call an interpolation function in our SignalProcessing class using our previous and current RMS values.

Inside the ViewController class.

Inside the SignalProcessing class.
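Since the code itself isn’t reproduced inline here, a minimal sketch of the interpolation function inside SignalProcessing (the name and signature are assumptions):

```swift
// Linearly interpolates between the previous and current RMS values,
// producing `steps` intermediate values to render between callbacks.
static func interpolate(from previous: Float, to current: Float, steps: Int = 10) -> [Float] {
    let delta = current - previous
    // Evenly spaced points ending at the current value.
    return (1...steps).map { previous + delta * Float($0) / Float(steps) }
}
```

In the ViewController, prevRMSValue would be updated after each callback and the interpolated values handed to the renderer one frame at a time.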

For now, the results will be going unused, since they don’t matter until the next section.

Section 2.2: Frequency metering of the signal

First, we need to understand what frequency means and how we can get the magnitudes of discrete frequency bins to represent the size of our lines. What is the magnitude of a frequency bin? It’s the amount of energy around that frequency. We’ll cover this concept shortly. First, we’ll take a look at the audio signal, then how a Fourier transform works, and finally, how we can implement the fast Fourier transform (FFT), which is just an algorithmically fast way to compute Fourier transforms.

Sound travels as a wave. Increasing the amplitude of the wave increases the loudness of the sound by increasing the energy of the wave, which is why using the amplitudes of our audio sample to compute loudness works. The video below sums up nicely what the frequency we hear (sound) is and how it relates to amplitude.

Now, how do we get the energies of each frequency?

We need to take our buffer data (audio sample) from its representation on the left (in the time domain) to the right (in the frequency domain). Analyzing how a signal looks in the frequency domain, we realize we’re dealing with the same types of values. The height of the lines of each range of frequencies dictates the energy (and the length of the line we’ll be producing for our audio visualizer).

Now let’s get those values!

Going back to the vDSP framework and looking at Vector and Matrix Fourier Transforms, we see an option for FFT! We have a lot of options here.

The immediate subsections of functions on the FFT page are grouped as follows:

1D or 2D?

In-Place or Out-of-Place?

Real or Complex?

Looking at our audio sample, it should be clear we’re working in one dimension; we just care about the value on the y-axis (amplitude). In-place or out-of-place refers to whether the algorithm overwrites its input or writes to a separate output. The third question, real or complex, is more interesting. When we go from the time domain to the frequency domain, we get values on both the real axis and the imaginary axis, that is, complex numbers. What we actually care about is the energy, or magnitude, of each number. If we choose real, we get values representative of the energy levels directly. If we choose complex, we need to compute the magnitude of the complex results ourselves (a simple function call). Whether to disregard the imaginary part is a question of precision; in our case it doesn’t matter, but we will go ahead with complex numbers.

Next up, we see more options within each subsection. Single-precision or double-precision? For our purposes, double precision will give us no additional value and just serve to make the function execution longer, so we will go with single precision. Buffer or no buffer? Providing a buffer will give us better performance, so we will use it. Multiple signals? Not for us.

We’ve finally narrowed down to our desired function:

func vDSP_fft_zipt(_ __Setup: FFTSetup, _ __C: UnsafePointer<DSPSplitComplex>, _ __IC: vDSP_Stride, _ __Buffer: UnsafePointer<DSPSplitComplex>, _ __Log2N: vDSP_Length, _ __Direction: FFTDirection)

But wait! The documentation tells us:

Use the DFT routines instead of these wherever possible. (For example, instead of calling vDSP_fft_zip with a setup created with vDSP_create_fftsetup, call vDSP_DFT_Execute(_:_:_:_:_:) with a setup created with vDSP_DFT_zop_CreateSetup(_:_:_:).)

I’ve actually overlooked these in the past, but the DFT execute functions are much simpler to use, so we’ll be going ahead with vDSP_DFT_Execute(_:_:_:_:_:) instead of the much more verbose FFT functions.

func vDSP_DFT_Execute(_ __Setup: OpaquePointer, _ __Ir: UnsafePointer<Float>, _ __Ii: UnsafePointer<Float>, _ __Or: UnsafeMutablePointer<Float>, _ __Oi: UnsafeMutablePointer<Float>)

These inputs will take some explaining.

First, we have the OpaquePointer, aka the setup object for the function, created with vDSP_DFT_zop_CreateSetup(_:_:_:):

Creates a data structure for use with vDSP_DFT_Execute(_:_:_:_:_:) or vDSP_DCT_Execute(_:_:_:) to perform a complex-to-complex discrete Fourier transform, forward or inverse.

func vDSP_DFT_zop_CreateSetup(_ __Previous: vDSP_DFT_Setup?, _ __Length: vDSP_Length, _ __Direction: vDSP_DFT_Direction) -> vDSP_DFT_Setup?

We don’t have a previous setup, so the first argument will be nil.

vDSP_Length is a type alias for UInt (an unsigned long in C), and represents the number of elements we’ll be transforming. We could use 2¹² (4096) values, but that’s too many resultant bins to draw lines for, so it just won’t look good (it’ll be too crowded). Going with 2¹⁰ (1024) elements to transform gives a much more reasonable result. Now, as we saw before from the sample size we received for our specific mp3, that’s only a fourth of the buffer data! If you want higher accuracy (not exactly important for us), you can add more values.

After we initialize the setup object, we should also destroy it (via vDSP_DFT_DestroySetup) when we’re done. This isn’t strictly necessary here, since we don’t really manage an app lifecycle in this project, so it will be omitted.

In our ViewController class, we’ll be importing Accelerate and initializing an fftSetup object stored as a class variable:

import Accelerate

....

//fft setup object for 1024 values going forward (time -> frequency)
let fftSetup = vDSP_DFT_zop_CreateSetup(nil, 1024, vDSP_DFT_Direction.FORWARD)

Next, the FFT takes in two input pointers (immutable) and two output pointers (mutable) for float vectors representing the real and imaginary parts of numbers. So all we need to do is create it, and then run the function! This will be done inside the SignalProcessing class (in a new function) and called from the ViewController class.

Added function inside SignalProcessing.swift

Added function inside ViewController.swift
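Since the screenshots aren’t reproduced here, a sketch of what the SignalProcessing function could look like (the function name and return shape are my own; the split into real and imaginary arrays is required by the API):

```swift
import Accelerate

// Runs a 1024-point forward DFT on the given samples.
// The input is real audio, so the imaginary input vector stays zeroed.
static func fft(data: UnsafeMutablePointer<Float>, setup: OpaquePointer) -> ([Float], [Float]) {
    var realIn = [Float](repeating: 0, count: 1024)
    var imagIn = [Float](repeating: 0, count: 1024)
    var realOut = [Float](repeating: 0, count: 1024)
    var imagOut = [Float](repeating: 0, count: 1024)

    for i in 0..<1024 { realIn[i] = data[i] }
    vDSP_DFT_Execute(setup, &realIn, &imagIn, &realOut, &imagOut)
    return (realOut, imagOut)  // magnitudes are computed in the next step
}
```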

Okay, let’s stop for a second. I said earlier that the FFT will spit out magnitudes in frequency bins, but how many frequency bins are there? We never specified. The number of frequency bins is actually linear in the number of data points (n/2), as per the Nyquist-Shannon sampling theorem.

So what frequencies do these bins hold? Here’s how it works: if we use 1024 data points at our 44.1kHz sample rate, the window spans roughly 0.025s, giving a frequency resolution of about 1/0.025s ≈ 40Hz. That means the lowest frequency we can detect is 1 × 40Hz = 40Hz (first bin) and the highest is (n/2) × 40Hz = 512 × 40Hz ≈ 20.48kHz (last bin).
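The arithmetic above, spelled out with the exact sample rate rather than the rounded 40Hz figure:

```swift
// Bin frequencies for a 1024-point transform at a 44.1kHz sample rate.
let sampleRate: Float = 44_100
let n: Float = 1_024
let binWidth = sampleRate / n         // ≈ 43Hz per bin (the text rounds this to 40Hz)
let lowestFreq = binWidth             // first bin
let highestFreq = (n / 2) * binWidth  // ≈ 22.05kHz, the Nyquist limit
```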

Next, we need to get the magnitude. Remember, the magnitudes denote the energy in each frequency bin. Is there a way to compute complex magnitudes in the vDSP framework? Of course there is. We have two options: compute sqrt(a² + b²), or compute (a² + b²) to save some computation time. But for our normalization method, we need the former.

The function we’ll be using is called vDSP_zvabs and can be found under the Absolute and Negation Functions of the vDSP framework. (The squared-magnitude option is called vDSP_zvmags.) From now on, I’ll be omitting parameter explanations for Accelerate functions.

Inside SignalProcessing.swift
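A sketch of the magnitude step, packing the DFT output into a DSPSplitComplex and calling vDSP_zvabs (the function name and variable names are my own):

```swift
import Accelerate

// Computes sqrt(re² + im²) for each of the n/2 = 512 meaningful bins.
static func magnitudes(realOut: inout [Float], imagOut: inout [Float]) -> [Float] {
    var mags = [Float](repeating: 0, count: 512)
    realOut.withUnsafeMutableBufferPointer { realPtr in
        imagOut.withUnsafeMutableBufferPointer { imagPtr in
            var complex = DSPSplitComplex(realp: realPtr.baseAddress!,
                                          imagp: imagPtr.baseAddress!)
            vDSP_zvabs(&complex, 1, &mags, 1, 512)
        }
    }
    return mags
}
```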

Looking at the values, we see a wide range. And just like we did with the loudness of the signal before, we need to normalize them. This time around we’re not scaling just one value; we’re scaling a whole vector of them. To make sure we do this optimally, the vDSP framework’s vector-scalar operations come into play. How do we choose the scaling factor? There is no easy answer.

A common approach online is to divide by the number of samples. After playing around to massage the values into a usable range, our scaling factor will be 25.0/512. Note: we’re not trying to make sure it doesn’t go out of bounds; we’re just trying to make sure it looks good (this, of course, is biased toward my taste).

See if you can do this part on your own using vDSP_vsmul .

Now after returning our results, we’re done with part 1 :)

Completed SignalProcessing class