Voice Command Calculator in Python using speech recognition and PyAudio

Here we are going to build our own voice command calculator in Python. So what is a voice command calculator? The name answers the question. A calculator applies an operator to operands. But here we are not going to take input from the keyboard. We will take input from the user’s voice. For example,

9 + 8 = 17

We can make a calculator using a Python program easily. Just take inputs from the user and print the result.

But here we need to work with speech recognition.

Python Voice Command Calculator

Our goal is like this:

If a user says “nine plus eight” the output will be like this:

9 + 8
17

If a user says “nine divided three” the output will be:

9 divided 3
3.0

Again, if the user says “eight multiplied by seven” the output will be:

8 x 7
56

And so on.

Steps to follow to build a voice command calculator in Python:

Here is the logic:

1. Set up the microphone device.
2. Accept voice from the user with the mic.
3. Remove noise and distortion from the speech.
4. Convert the speech to text.
5. Store the text as a string in a variable.
6. Print the string if you wish. (Not necessary, but it helps you check whether the text is right.)
7. Split the string into three parts: first operand, operator, and second operand.
8. Convert the operands to integers.
9. Do the calculation, as you now have everything you need.

Let’s implement it in Python:

Requirements to build speech/voice calculator:

We need the following:

SpeechRecognition

PyAudio

Set up things to start our program

You can install those with pip:

pip install SpeechRecognition
pip install pyaudio

If you are using a Mac, you will need to install both portaudio and pyaudio.

brew install portaudio
pip install pyaudio

Linux users on Debian or Ubuntu can install it with apt:

$ sudo apt-get install python-pyaudio python3-pyaudio

One more thing you need to know: your mic device index.

To learn how to find your mic device index, follow: Find all the microphone names and device index in Python using PyAudio
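If you just want a quick look at your devices, here is a minimal sketch. It assumes the `Microphone.list_microphone_names()` helper from SpeechRecognition, which needs PyAudio installed. The function names `list_microphones` and `print_microphones` are made up for this example. The list can also contain output devices, so pick the entry that is really your mic.

```python
def list_microphones():
    """Return (index, name) pairs for every audio device PyAudio reports."""
    # Imported here so the helper can be defined before the package is installed.
    import speech_recognition as sr

    names = sr.Microphone.list_microphone_names()
    return list(enumerate(names))


def print_microphones():
    """Print one line per device; the number is what device_index expects."""
    for index, name in list_microphones():
        print(f"device_index={index}: {name}")
```

Call `print_microphones()` once, note the index next to your microphone, and use that number as `device_index` in the programs below.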

Now you are ready to jump into the coding part.

To check that you are all set and your packages installed successfully, try the code below:

import speech_recognition as sr
print("Your speech_recognition version is: "+sr.__version__)

Output:

Your speech_recognition version is: 3.8.1

If this runs with no errors then go to the next part.

In my previous tutorial, I explained this in detail: Get voice input with microphone in Python using PyAudio and SpeechRecognition

So in this tutorial, I will not explain those things again. I will only focus on our voice calculator. If you need to know the full explanation just follow my previous tutorial. Here I will provide the code.

Python code to get the voice command from the user:

import speech_recognition as s_r
print("Your speech_recognition version is: "+s_r.__version__)
r = s_r.Recognizer()
my_mic_device = s_r.Microphone(device_index=1)
with my_mic_device as source:
    print("Say what you want to calculate, example: 3 plus 3")
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
my_string=r.recognize_google(audio)
print(my_string)

Run the program and it will print whatever you say.

The fun part is this: if you say “nine plus ten”, it will return the string “9 + 10”.

Note that:

r.adjust_for_ambient_noise(source)

The above line calibrates the recognizer to the ambient noise level, which reduces the effect of background noise.

r.recognize_google(audio) – This will return the converted text from voice as a string.
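Both calls also accept a few optional settings. The sketch below assumes the `duration` parameter of `adjust_for_ambient_noise` and the `timeout` and `phrase_time_limit` parameters of `listen`. The helper name `capture_phrase` and its default values are my own choices for illustration, not something the library prescribes.

```python
def capture_phrase(recognizer, source, calibrate_seconds=1.0,
                   timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then record one spoken phrase."""
    # Listen to the room for a moment so the recognizer can set its threshold.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # timeout: how long to wait for speech to start (the library raises
    # WaitTimeoutError if nothing is heard in that time).
    # phrase_time_limit: the longest phrase, in seconds, that will be recorded.
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)
```

Inside the `with my_mic_device as source:` block you could then write `audio = capture_phrase(r, source)`.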

You will need an active internet connection to run this program.

(I am using Google speech recognition because right now it is free to use. The default API key that ships with the SpeechRecognition library is meant for personal and testing use, so do not count on unlimited requests.)

But if you are going to build a project or something bigger with it, you should use Google Cloud Speech. Google speech recognition is free right now, but Google does not guarantee that the service will keep running.
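Since the conversion depends on a network call, it can fail in two common ways: the speech is not understood, or the service cannot be reached. Here is a hedged sketch that catches both, assuming the `UnknownValueError` and `RequestError` exceptions from SpeechRecognition. The function name `transcribe` is just for this example.

```python
def transcribe(recognizer, audio):
    """Return the recognized text, or None if recognition failed."""
    # Imported here so the helper can be defined before the package is installed.
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The audio was received but no words could be made out.
        print("Sorry, I could not understand that.")
    except sr.RequestError as error:
        # No internet connection, or the service refused the request.
        print(f"Could not reach the recognition service: {error}")
    return None
```

With this in place, `my_string = transcribe(r, audio)` gives you either the text or `None`, so check for `None` before calculating.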

If everything is fine till now you can go for the next step.

Split the string and do the operation:

Here we face the main difficulty. We got a string, for example “103 - 15”. Because this is a string, we can’t simply do arithmetic on it. We need to split the string into three parts, which gives us three separate strings:

“103”, “-”, “15”

We need to convert “103” and “15” to int. Those are our operands. And “-” is our operator.
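Here is that idea on its own as a tiny sketch, with the string typed in by hand instead of coming from the microphone. The variable names are only for illustration.

```python
text = "103 - 15"

# split() with no arguments cuts the string at every run of whitespace.
left, symbol, right = text.split()
print([left, symbol, right])  # ['103', '-', '15']

# The operands are still strings, so convert them before doing arithmetic.
a, b = int(left), int(right)
print(a - b)  # 88
```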

Use the operator module. This will make our task easy.

import operator

def get_operator_fn(op):
    return {
        '+' : operator.add,
        '-' : operator.sub,
        'x' : operator.mul,
        'divided' :operator.__truediv__,
        'Mod' : operator.mod,
        'mod' : operator.mod,
        '^' : operator.xor,
        }[op]

def eval_binary_expr(op1, oper, op2):
    op1,op2 = int(op1), int(op2)
    return get_operator_fn(oper)(op1, op2)

print(eval_binary_expr(*(my_string.split())))

The symbols we wrote in our program (+, -, x, divided, etc.) are the operators.

For each operator, we have mapped a particular function. As you can see, “divided” => operator.__truediv__,

and Mod or mod (during speech-to-text conversion the first character sometimes comes back capitalized) => operator.mod

Note that “^” is mapped to operator.xor, which is bitwise XOR, not exponentiation. Use operator.pow if you want powers.

You can set your own commands too if you wish.

return get_operator_fn(oper)(op1, op2)

This will calculate your result.
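As one possible way to add your own commands, the sketch below keeps the same lookup idea but adds a few spoken words and lowercases the operator first, so “Mod” and “mod” need only one entry. The extra words (“plus”, “minus”, “times”, “power”) are my assumptions about what you might want to say. Check the printed string to see what the recognizer really returns for them.

```python
import operator

# One table for symbols and spoken words; keys are all lowercase.
OPERATORS = {
    "+": operator.add, "plus": operator.add,
    "-": operator.sub, "minus": operator.sub,
    "x": operator.mul, "*": operator.mul, "times": operator.mul,
    "/": operator.truediv, "divided": operator.truediv,
    "mod": operator.mod,
    "power": operator.pow,
}


def get_operator_fn(op):
    # Lowercase first so "Mod", "MOD" and "mod" all find the same entry.
    return OPERATORS[op.lower()]


def eval_binary_expr(op1, oper, op2):
    return get_operator_fn(oper)(int(op1), int(op2))


print(eval_binary_expr("17", "Mod", "9"))   # 8
print(eval_binary_expr("2", "power", "5"))  # 32
```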

So here is the full code of this voice command calculator in Python:

import operator
import speech_recognition as s_r
print("Your speech_recognition version is: "+s_r.__version__)
r = s_r.Recognizer()
my_mic_device = s_r.Microphone(device_index=1)
with my_mic_device as source:
    print("Say what you want to calculate, example: 3 plus 3")
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
my_string=r.recognize_google(audio)
print(my_string)

def get_operator_fn(op):
    return {
        '+' : operator.add,
        '-' : operator.sub,
        'x' : operator.mul,
        'divided' :operator.__truediv__,
        'Mod' : operator.mod,
        'mod' : operator.mod,
        '^' : operator.xor,
        }[op]

def eval_binary_expr(op1, oper, op2):
    op1,op2 = int(op1), int(op2)
    return get_operator_fn(oper)(op1, op2)

print(eval_binary_expr(*(my_string.split())))

Output:

Your speech_recognition version is: 3.8.1
Say what you want to calculate, example: 3 plus 3
11 + 12
23

To multiply, simply say “number1 multiplied by number2”.

Here is a screenshot:

For example, say “16 multiplied by 10”.

“Multiplied by” will be automatically converted to “x” by Google’s speech recognition.

To get the mod, just say “17 mod 9”. It will give you the result.

For division, just say “18 divided 7”.

Here you can see I have not used “divided by”, because Google’s speech recognition will not convert that to “/” and we are going to split our string into three parts. If we say “number1 divided by number2”, the string splits into four parts: “number1”, “divided”, “by”, “number2”. Four parts will give us an error because the function accepts only three parameters.

def eval_binary_expr(op1, oper, op2):
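If you would rather say “divided by”, one option is to shrink the two-word phrase to a single word before splitting. This is only a sketch of that idea. The `PHRASES` table and the `normalize` function are hypothetical names, and the phrases in the table are guesses at what the recognizer might hand back.

```python
# Multi-word phrases mapped to the single tokens the calculator understands.
PHRASES = {
    "divided by": "divided",
    "multiplied by": "x",
    "modulo": "mod",
}


def normalize(text):
    """Lowercase the text, shrink known phrases, and split into tokens."""
    text = text.lower()
    for phrase, token in PHRASES.items():
        text = text.replace(phrase, token)
    return text.split()


print(normalize("18 divided by 7"))  # ['18', 'divided', '7']
print(normalize("17 Mod 9"))         # ['17', 'mod', '9']
```

The result always has the operator in lowercase, so it can be passed on as `eval_binary_expr(*normalize(my_string))`.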

If you get an error, check your converted string. I used print(my_string) to check whether I got the string I wanted.
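To make such errors easier to read, you can check the string before calculating. The sketch below is one way to do it, not the only one. The `evaluate` function and its messages are made up for this example, and it uses a cut-down operator table to stay short.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "x": operator.mul,
    "divided": operator.truediv,
    "mod": operator.mod,
}


def evaluate(text):
    """Calculate 'number operator number', raising ValueError on bad input."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {parts}")
    left, symbol, right = parts
    if symbol.lower() not in OPERATORS:
        raise ValueError(f"unknown operator {symbol!r}")
    # int() raises ValueError itself if an operand is not a whole number.
    return OPERATORS[symbol.lower()](int(left), int(right))


for spoken in ["11 + 12", "18 divided by 7", "5 divided 0"]:
    try:
        print(spoken, "=", evaluate(spoken))
    except (ValueError, ZeroDivisionError) as error:
        print(f"Could not calculate {spoken!r}: {error}")
```

The first string is calculated normally, while the other two print a short message instead of stopping the program with a traceback.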

Please note that:

My audio input (microphone) device index is 1. You have to put your own device index in your program.

To learn how to find your device index, check: Find all the microphone names and device index in Python using PyAudio