UPDATE

This project has been picked up by Make Magazine and RadioShack to create a great step-by-step guide for their Weekend Projects campaign. Check out the guide here, and the amazingly awesome video below:

I get many requests from people who are still looking for cheap, easy, and fun project ideas for their Raspberry Pis, so I wanted to share this translator project I’ve been working on. With very little effort, we can turn this $35 mini-computer into a feature-rich language translator that not only supports voice recognition and native-speaker playback, but is also capable of dynamically translating between thousands of language pairs, FREE! Even if you are not interested in building this exact translation tool, there are still many parts of this tutorial that might be interesting to you (speech recognition, text to speech, Microsoft/Google translation APIs). Just like the rest of my posts, this one starts with our shopping list. Most of my readers will probably already have most of these items around the house:

Shopping List

*There are definitely cheaper options available for USB headsets; I chose the Logitech as it is plug and play. For alternatives, check this list of verified Raspberry Pi supported sound cards.

Assumptions

This tutorial assumes your Raspberry Pi has:

-the latest version of Raspbian installed

-an internet connection

-the correct sound card drivers for your headset

Configuring and Testing Your Headset

Before we start writing any code, let’s ensure that we can record and play back sound using our USB headset. The easiest way to do this is with the built-in Linux commands ‘arecord’ and ‘aplay’. But first, let’s make sure our system is up to date.

sudo apt-get update
sudo apt-get upgrade

Now, plug in your USB headset and run the following commands:

cat /proc/asound/cards
cat /proc/asound/modules

You should see that the Logitech headset is listed as card 1. Additionally, the second command should show that the driver for card 0 (the default Raspberry Pi output) is snd_bcm2835 and the driver for card 1 (our Logitech headset) is snd_usb_audio.
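If you would rather check this from code, a small parser over the /proc/asound/cards output can pull out each card’s index. The sample text below only approximates what my Pi printed; the exact formatting may differ on yours.

```python
# Approximation of `cat /proc/asound/cards` output before the ALSA change;
# the exact text may differ on your Pi.
sample_cards = """ 0 [ALSA           ]: bcm2835 - bcm2835 ALSA
 1 [Headset        ]: USB-Audio - Logitech USB Headset
"""

def card_index(cards_text, name):
    """Return the ALSA index of the first card whose line mentions `name`."""
    for line in cards_text.splitlines():
        if name.lower() in line.lower():
            return int(line.split("[")[0])
    return None

print(card_index(sample_cards, "Headset"))  # 1
```

Card 0 is what ALSA uses by default, which is exactly the problem described next.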

This is a problem because it shows that the Raspberry Pi defaults to transmitting sound over its built-in hardware, and does not have an audio input device configured. To solve this, we need to update ALSA (Advanced Linux Sound Architecture) to use our headset as the default for audio input and output. This can be done with a quick change to the ALSA config file located at /etc/modprobe.d/alsa-base.conf:

sudo nano /etc/modprobe.d/alsa-base.conf

Near the end of this file, change the line that says

options snd-usb-audio index=-2

to

options snd-usb-audio index=0

Save and close the file and reboot the Raspberry Pi using the following command:

sudo reboot

After the system comes back online, the sound system should be reloaded so that when we rerun the above commands…

cat /proc/asound/cards
cat /proc/asound/modules

…we should see the USB Headset is now the default input/output device (card 0) as shown below.

We can now test this out by recording a 5 second clip from the microphone:

arecord -d 5 -r 48000 daveconroy.wav

and play it back through the headphone speakers:

aplay daveconroy.wav

To adjust the levels, you can use the built-in utility alsamixer. This tool handles both audio input and output levels.

sudo alsamixer

Now that our headset is configured, we can move onto the next step of converting from Speech to Text.

Speech to Text or Speech Recognition with a Raspberry Pi

There are a few options for speech recognition on the Raspberry Pi, but I thought the best solution for this tutorial was Google’s Speech to Text service. This service allows us to upload the file we just recorded and convert it to text (which we will later translate).

Let’s create a shell script to handle this process for us.

sudo nano stt.sh

with the following contents:

echo "Recording your Speech (Ctrl+C to Transcribe)"
arecord -D plughw:0,0 -q -f cd -t wav -d 0 -r 16000 | flac - -f --best --sample-rate 16000 -s -o daveconroy.flac;
echo "Converting Speech to Text..."
wget -q -U "Mozilla/5.0" --post-file daveconroy.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12 > stt.txt
echo "You Said:"
value=`cat stt.txt`
echo "$value"

Make it executable

sudo chmod +x stt.sh

The last step before we can run the script is to install the FLAC codec, which is not included in the standard Raspbian image.

sudo apt-get install flac

Now we can run the script:

./stt.sh

This will automatically start recording your voice; just press Ctrl+C when you are done speaking. At that point the script uploads the sound file to Google, which transcribes it and returns the text so it can be displayed on our screen. Pretty impressive for only a few lines of code! Sample output below:
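A note on the `cut -d\" -f12` step in the script: it simply grabs the 12th quote-delimited field out of Google’s JSON response, which breaks if the field order ever changes. A more robust alternative is to parse the response as actual JSON. The sample below is an approximation of the v1 response shape, not a verbatim capture:

```python
import json

# Approximate shape of the v1 speech API's JSON response (not a verbatim capture).
sample_response = ('{"status":0,"id":"abc123","hypotheses":'
                   '[{"utterance":"hello world","confidence":0.92}]}')

def extract_utterance(body):
    """Return the top transcription, or an empty string if none came back."""
    data = json.loads(body)
    hypotheses = data.get("hypotheses", [])
    return hypotheses[0].get("utterance", "") if hypotheses else ""

print(extract_utterance(sample_response))  # hello world
```

The shell one-liner is fine for a demo; parsing the JSON just fails more gracefully when Google couldn’t understand the recording.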



Microsoft Translation and Google Text to Speech

Now that we can record our voice and convert it into text, we need to translate it to our desired foreign language. I would love to be able to use Google’s Translate tool for this, but unfortunately there is a $20 sign-up fee to use this API. I plan on purchasing it for myself, but I wanted to make this project free so everyone had an opportunity to try it.

As an alternative, we will be using Microsoft’s Translate service, which is currently still free for public use. The list of supported languages and their corresponding codes can be found here. In our previous example we used a simple shell script, but for the translation and playback process I’ve written a more powerful Python script.

All of this code can be found in my GitHub repository (contributions welcome!).

Let’s first create the file:

sudo nano PiTranslate.py

and add the following contents:

import json
import requests
import urllib
import subprocess
import argparse

parser = argparse.ArgumentParser(description='This is a demo script by DaveConroy.com.')
parser.add_argument('-o', '--origin_language', help='Origin Language', required=True)
parser.add_argument('-d', '--destination_language', help='Destination Language', required=True)
parser.add_argument('-t', '--text_to_translate', help='Text to Translate', required=True)
args = parser.parse_args()

## show values ##
print("Origin: %s" % args.origin_language)
print("Destination: %s" % args.destination_language)
print("Text: %s" % args.text_to_translate)

text = args.text_to_translate
origin_language = args.origin_language
destination_language = args.destination_language

def speakOriginText(phrase):
    googleSpeechURL = "http://translate.google.com/translate_tts?tl=" + origin_language + "&q=" + phrase
    subprocess.call(["mplayer", googleSpeechURL], shell=False,
                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

def speakDestinationText(phrase):
    googleSpeechURL = "http://translate.google.com/translate_tts?tl=" + destination_language + "&q=" + phrase
    subprocess.call(["mplayer", googleSpeechURL], shell=False,
                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Request an OAuth access token from the Azure Marketplace
oauth_args = {
    'client_id': '',      # your client id here
    'client_secret': '',  # your azure secret here
    'scope': 'http://api.microsofttranslator.com',
    'grant_type': 'client_credentials'
}
oauth_url = 'https://datamarket.accesscontrol.windows.net/v2/OAuth2-13'
oauth_junk = json.loads(requests.post(oauth_url, data=urllib.urlencode(oauth_args)).content)

# Call the Microsoft Translator API with the token
translation_args = {'text': text, 'to': destination_language, 'from': origin_language}
headers = {'Authorization': 'Bearer ' + oauth_junk['access_token']}
translation_url = 'http://api.microsofttranslator.com/V2/Ajax.svc/Translate?'
translation_result = requests.get(translation_url + urllib.urlencode(translation_args), headers=headers)
translation = translation_result.text[2:-1]

speakOriginText('Translating ' + translation_args["text"])
speakDestinationText(translation)

For the script to run, we need to install a few Python libraries and a media player.

sudo apt-get install python-pip mplayer
sudo pip install requests

The last thing we need to do before we can run the script is sign up for a Microsoft Azure Marketplace API key. To do so, simply visit the marketplace, register an application, and then enter your client ID and client secret into the script above.
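To make it clear exactly which pieces come from that marketplace registration, here is the form-encoded body the script POSTs to the OAuth endpoint, pulled out on its own (the credentials shown are placeholders, not real values):

```python
try:
    from urllib import urlencode        # Python 2, as used in PiTranslate.py
except ImportError:
    from urllib.parse import urlencode  # Python 3 fallback

def token_request_body(client_id, client_secret):
    """Form-encoded body sent to the Azure datamarket OAuth endpoint.
    `client_id` and `client_secret` come from your marketplace app registration."""
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "http://api.microsofttranslator.com",
        "grant_type": "client_credentials",
    })

body = token_request_body("my-app-id", "my-secret")
print("grant_type=client_credentials" in body)  # True
```

Everything except your two credentials is fixed: the scope names the Translator API, and the grant type tells Azure this is a machine-to-machine login.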

Now we can run the script:

sudo python PiTranslate.py -o en -d es -t "hello my name is david conroy"

The script has 3 required inputs:

-o origin language

-d destination language

-t “text to translate”

The above command starts in English and translates to Spanish. My favorite part about the whole tutorial is how quickly you can change between languages you are translating, and how the returned voice changes according to the destination language.

Putting it all Together

It is actually very easy to combine the two scripts we created in this tutorial. In fact, it only takes one line of code added to the bottom of the stt.sh shell script we created earlier (assuming PiTranslate.py and stt.sh are in the same directory).

sudo nano stt.sh

python PiTranslate.py -o en -d es -t "$value"

For those of you who skipped around in this tutorial, here is the entire script again with that line added:

echo "Recording your Speech (Ctrl+C to Transcribe)"
arecord -D plughw:0,0 -f cd -t wav -d 0 -q -r 16000 | flac - -s -f --best --sample-rate 16000 -o daveconroy.flac;
echo "Converting Speech to Text..."
wget -q -U "Mozilla/5.0" --post-file daveconroy.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12 > stt.txt
echo "You Said:"
value=`cat stt.txt`
echo "$value"
#translate from English to Spanish and play over speakers
python PiTranslate.py -o en -d es -t "$value"

Now, run the Speech to Text script again, and it will translate from English to Spanish by default.

./stt.sh

Change your origin and destination languages in the last line as desired, and the PiTranslate.py script will do the rest! There are literally thousands of language pairs supported here. Here is a screenshot:

Video Demo

I apologize that this video is a little shaky; it was difficult holding the headset to the phone while running the scripts.



Known Limitations and Additional Resources

Both the origin and destination languages have to be supported by Microsoft Translate and Google Translate in order for this script to work.

Language Codes:

Microsoft

Google
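As a sanity check before running a new pair, you could intersect the two lists and require that both codes appear in both services. The sets below are small hypothetical subsets just for illustration; the real lists (linked above) are much longer.

```python
# Hypothetical subsets of each service's language codes, for illustration only;
# consult the Microsoft and Google lists linked above for the real ones.
MICROSOFT_CODES = {"en", "es", "fr", "de", "ja"}
GOOGLE_TTS_CODES = {"en", "es", "fr", "it", "ja"}

def pair_supported(origin, destination):
    """A pair only works if BOTH services know BOTH codes."""
    both = MICROSOFT_CODES & GOOGLE_TTS_CODES
    return origin in both and destination in both

print(pair_supported("en", "es"))  # True
print(pair_supported("en", "de"))  # False (no "de" in this sample Google set)
```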

Some special characters in certain languages will also cause trouble with the translation services, but I am working on a fix for that.
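Until that fix lands, one workaround (a sketch, not tested against every language) is to percent-encode the phrase before it goes into the playback URL. The script above concatenates the raw text onto the URL, which is where most of the special-character trouble starts:

```python
try:
    from urllib import quote        # Python 2, matching PiTranslate.py
except ImportError:
    from urllib.parse import quote  # Python 3 fallback

def tts_url(lang, phrase):
    """Build the translate_tts URL with the phrase safely percent-encoded."""
    if not isinstance(phrase, bytes):
        phrase = phrase.encode("utf-8")  # encode accents etc. before quoting
    return "http://translate.google.com/translate_tts?tl=%s&q=%s" % (lang, quote(phrase))

print(tts_url("es", "hola amigo"))  # http://translate.google.com/translate_tts?tl=es&q=hola%20amigo
```

With this change, accented characters like the ó in “adiós” arrive at Google as %C3%B3 instead of breaking the request.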

Conclusion

I really enjoyed working on this project, as it incorporates a wide range of technology and tools to create something immediately useful and fun to play with. Plus, it’s all FREE. If you have any questions at all regarding this project, just leave a comment below or on GitHub and I’d be happy to help you!