Start audio capture

We need to use the AVAudioEngine class, which is part of the AVFoundation framework, to start capturing audio streams from the microphone.

By accessing the inputNode of the audioEngine, we can install an audio tap on its bus to observe the audio stream output from the node.
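For reference, here's a minimal sketch of the properties the following snippets assume on the ViewController. The class name GenderSoundClassification stands in for whatever class was generated for your Core ML sound classifier, and ResultsObserver is an assumed observer class sketched later in this section:

import UIKit
import AVFoundation
import SoundAnalysis

class ViewController: UIViewController {
    // Captures audio streams from the microphone.
    let audioEngine = AVAudioEngine()
    // The input node's native format, assigned in viewDidLoad.
    var inputFormat: AVAudioFormat!
    // Streams the captured buffers to the classification request.
    var analyzer: SNAudioStreamAnalyzer!
    // Receives the classification results (sketched below).
    let resultsObserver = ResultsObserver()
    // Placeholder name for your generated Core ML model class.
    let soundClassifier = GenderSoundClassification()
    // Serial queue on which the audio buffers are analyzed.
    let analysisQueue = DispatchQueue(label: "com.example.AnalysisQueue")
    // Displays the prediction; laid out in buildUI().
    let transcribedText = UILabel()
}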

private func startAudioEngine() {
    // Create the stream analyzer request with the Sound Classifier.
    do {
        try audioEngine.start()
    } catch {
        print("Error in starting the Audio Engine")
    }
}

In the sections below, we'll add the sound classification request and the stream analyzer to this function, installing a tap on the input node's bus.

Create a sound stream analyzer

Next, we need to create an audio stream analyzer using the SoundAnalysis framework. It processes the audio engine's stream in the input node's native format, as shown below:

inputFormat = audioEngine.inputNode.inputFormat(forBus: 0)
analyzer = SNAudioStreamAnalyzer(format: inputFormat)

Create a sound classifier request

Now, we need to create a sound classifier request by passing the Core ML model instance to SNClassifySoundRequest.

Add the following code at the start of your startAudioEngine function:

do {
    let request = try SNClassifySoundRequest(mlModel: soundClassifier.model)
    try analyzer.add(request, withObserver: resultsObserver)
} catch {
    print("Unable to prepare request: \(error.localizedDescription)")
    return
}

In the above code, we've added the sound classification request instance to the SNAudioStreamAnalyzer. The classifier ultimately returns the results to the observer object resultsObserver.

This object is an instance of a class conforming to the SNResultsObserving protocol, whose request(_:didProduce:) method gets triggered for every sound classification result, as shown below:
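Here's a minimal sketch of such an observer class. The class name ResultsObserver and the 60% confidence cutoff are illustrative assumptions; it forwards the top classification to the GenderClassifierDelegate defined next:

class ResultsObserver: NSObject, SNResultsObserving {
    var delegate: GenderClassifierDelegate?

    func request(_ request: SNRequest, didProduce result: SNResult) {
        // Pick the most confident classification from this result.
        guard let result = result as? SNClassificationResult,
              let classification = result.classifications.first else { return }

        // Convert the confidence to a percentage for display and
        // only report reasonably confident predictions.
        let confidence = classification.confidence * 100.0
        if confidence > 60 {
            delegate?.displayPredictionResult(identifier: classification.identifier,
                                              confidence: confidence)
        }
    }

    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("The analysis failed: \(error.localizedDescription)")
    }

    func requestDidComplete(_ request: SNRequest) {
        print("The request completed successfully!")
    }
}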

GenderClassifierDelegate is a custom protocol that’s used to display the final predictions in a UILabel:

protocol GenderClassifierDelegate {
    func displayPredictionResult(identifier: String, confidence: Double)
}

extension ViewController: GenderClassifierDelegate {
    func displayPredictionResult(identifier: String, confidence: Double) {
        DispatchQueue.main.async {
            // Show the prediction and its confidence in the label.
            self.transcribedText.text = "Recognition: \(identifier)\nConfidence: \(confidence)"
        }
    }
}

Analyzing audio streams

Finally, we can start analyzing the audio streams by setting up a separate serial dispatch queue that analyzes the audio buffers from the inputNode, as shown in the code below:
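Here's a minimal sketch of this step, using the analysisQueue property from the earlier sketch; the bufferSize value is illustrative. We install a tap on the input node and hand each buffer to the stream analyzer on the serial queue:

audioEngine.inputNode.installTap(onBus: 0,
                                 bufferSize: 8192,
                                 format: inputFormat) { buffer, time in
    // Analyze off the audio render thread, preserving buffer order.
    self.analysisQueue.async {
        self.analyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
    }
}

This goes inside startAudioEngine, after the classification request has been added and before audioEngine.start() is called.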

Run audio engine and make predictions

We can now run our audio capture method, analyzing and classifying the audio buffer streams into either of the gender class labels.

Just add the following code snippets to your viewDidLoad and viewDidAppear methods:

override func viewDidLoad() {
    super.viewDidLoad()

    resultsObserver.delegate = self
    // Set up the analyzer with the input node's native format.
    inputFormat = audioEngine.inputNode.inputFormat(forBus: 0)
    analyzer = SNAudioStreamAnalyzer(format: inputFormat)
    buildUI()
}

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)
    startAudioEngine()
}
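Note that capturing microphone input requires an NSMicrophoneUsageDescription entry in your app's Info.plist; iOS prompts the user for microphone access the first time the audio engine starts.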

In order to test this application, I've used two videos: a WWDC video (for the male voice) and a video from YouTube (to recognize the female voice). Here's the result we got:

Our result from the small dataset we used for gender sound classification

Based on the small dataset of roughly 100 files our model was trained on, the above result is quite accurate. Our sound classification model returns "male" by default for miscellaneous sounds. You can try adding more sound class labels for different categories (environments, bots, birds) and build your own dataset.