This tutorial demonstrates how to use voice recognition on the Raspberry Pi. By the end of it, we should have a working application that understands and answers your spoken questions.
This is going to be a simple and easy project because there are free APIs available for everything we want to achieve. The application converts our spoken question into text, processes the query and returns the answer, and finally turns the answer from text into speech. I will divide this demonstration into four parts: speech to text, query processing, text to speech, and putting them all together.
This has been a very popular topic since the Raspberry Pi came out. With the help of this tutorial, it should be quite easy to achieve. I actually have an idea of combining speech recognition on the Raspberry Pi with its powerful digital/analog I/O hardware to build a useful voice control system, which could also be adopted in robotics and home automation. That will come in the next couple of blog posts.
Hardware and Preparation
You can use a USB microphone, but I don’t have one, so I am using the built-in mic on my webcam. It worked straight away without any driver installation or configuration.
Of course, you need the Raspberry Pi as well.
You will also need an internet connection on your Raspberry Pi.
Speech To Text
Speech recognition can be achieved in many ways on Linux (and therefore on the Raspberry Pi), but personally I think the easiest way is to use Google’s voice recognition API. I have to say, the accuracy is very good, even with my strong accent. To make sure recording is set up, you first need to make sure ffmpeg is installed:
sudo apt-get install ffmpeg
To use Google’s voice recognition API, I use the following bash script. You can simply copy it and save it as “speech2text.sh”.
It starts recording and saves the audio in a FLAC file. You can stop the recording by pressing CTRL+C. The audio file is then sent to Google for conversion, and the returned text is saved in a file called “stt.txt”. Finally, the audio file is deleted.
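The flow described above (take the recorded audio, have it transcribed, save the text to “stt.txt”, delete the audio) can be sketched in Python roughly as follows. This is only an illustration of the steps, not the bash script itself: the `transcribe` argument is a hypothetical callable standing in for whichever speech service you use, and no real API call is shown.

```python
import os
from typing import Callable


def save_transcript(audio_path: str,
                    transcribe: Callable[[str], str],
                    out_path: str = "stt.txt") -> str:
    """Transcribe audio_path, write the text to out_path, delete the audio."""
    # `transcribe` is a placeholder (an assumption): it takes the path of an
    # audio file and returns the recognised text.
    text = transcribe(audio_path).strip()

    # Save the recognised text so the next step can read it.
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text + "\n")

    # The recording is no longer needed once the text has been saved.
    os.remove(audio_path)
    return text
```

Passing the transcription step in as a function keeps the file handling separate from the network call, so the same helper works if you swap the speech service later.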
Make it executable:
chmod +x speech2text.sh
To run it:
./speech2text.sh
The screenshot shows some tests I did.
Query Processing
Processing the query is just like “Googling” a question, except that when we ask a question we want only one answer to be returned. Wolfram Alpha seems to be a good choice here.
There is a Python interface library for it, which makes our life much easier, but you need to install it first.
Installing Wolframalpha Python Library
Download the package from https://pypi.python.org/pypi/wolframalpha and unzip it somewhere. Then install setuptools and pip, and build the package from the unzipped directory.
sudo apt-get install python-setuptools
sudo easy_install pip
sudo python setup.py build
And finally run the install step.
sudo python setup.py install
Getting the APP_ID
To get a unique Wolfram Alpha AppID, sign up for a Wolfram Alpha Application ID.
Once you are signed in to the Wolfram Alpha Developer Portal, go to the My Apps tab, click the “Get an AppID” button and fill out the “Get a New AppID” form. Use any application name and description you like, then click the “Get AppID” button.
Wolfram Alpha Python Interface
Save this Python script as “queryprocess.py”.
You can test it as shown in the screenshot below.
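As a rough sketch of what such a query can look like, the snippet below uses the `Client`, `query` and `results` names from recent versions of the wolframalpha library; older releases may differ. The helper names `first_answer` and `ask_wolfram` and the fallback message are made up for this example, and `YOUR_APP_ID` is a placeholder for the AppID obtained above.

```python
def first_answer(texts, fallback="Sorry, I am not sure."):
    """Return the first non-empty text from an iterable of pod texts."""
    for text in texts:
        if text and text.strip():
            return text.strip()
    return fallback


def ask_wolfram(question, app_id):
    """Query Wolfram Alpha and return a single plain-text answer."""
    # Imported here so the helper above can be used without the library.
    import wolframalpha

    client = wolframalpha.Client(app_id)
    result = client.query(question)
    # `results` yields the pods that Wolfram Alpha marks as answers.
    return first_answer(pod.text for pod in result.results)
```

For example, `ask_wolfram("what is the capital of France", "YOUR_APP_ID")` should return a short answer string, or the fallback message when Wolfram Alpha has no plain-text result.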
Text To Speech
The processed query gives us an answer in text format. What we need to do now is turn that text into audio speech. There are a few options available, such as Cepstral or Festival, but I chose Google’s speech service for its excellent quality. Here is a good introduction to the software mentioned.
First of all, to play audio we need to install mplayer:
sudo apt-get install mplayer
We have this simple bash script. It downloads the MP3 file via the URL and plays it. Copy it and call it “text2speech.sh”:
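The same idea (put the text into a URL and let mplayer stream the resulting MP3) might look like this in Python. It is a sketch under assumptions: `endpoint` is a hypothetical base URL for a text-to-speech service that you supply yourself, and the function names are made up for this example. Only the `-really-quiet` option of mplayer is used.

```python
import subprocess
from urllib.parse import quote


def build_play_command(text, endpoint):
    """Build the mplayer command line that streams the spoken text."""
    # `endpoint` is an assumption: the base URL of a text-to-speech service
    # that returns MP3 audio for the text appended to it.
    url = endpoint + quote(text)
    # -really-quiet suppresses mplayer's console output.
    return ["mplayer", "-really-quiet", url]


def speak(text, endpoint):
    """Play the text through mplayer; raises if mplayer exits with an error."""
    subprocess.run(build_play_command(text, endpoint), check=True)
```

URL-encoding the text with `quote` matters because spaces and characters such as `&` would otherwise break the request.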
Make it executable:
chmod +x text2speech.sh
To test it, you can try:
./text2speech.sh "My name is Oscar and I am testing the audio."
Google Text To Speech Text Length Limitation
Although it’s very kind of Google to share this great service, there is a limit on the length of the message. I think it’s around 100 characters.
To work around this, here is an upgraded bash script that breaks the text into multiple parts, each no longer than 100 characters, so that every part can be played successfully. I adapted it from an existing script to fit our application.
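The splitting step on its own can be illustrated in Python with the standard `textwrap` module. This is a minimal sketch rather than the bash script: the function name `split_for_tts` is made up here, and the default limit of 100 simply follows the approximate figure mentioned above.

```python
import textwrap


def split_for_tts(text, limit=100):
    """Split text into pieces of at most `limit` characters."""
    # textwrap.wrap breaks on whitespace where it can, and only cuts inside a
    # word when a single word is longer than the limit.
    return textwrap.wrap(text, width=limit)
```

Each returned piece can then be sent to the speech service one after another, so a long answer is spoken as a sequence of short clips.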
For all of these scripts to work together, we have to call them from another script. I call this “main.sh”.
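The order in which the pieces are called can be sketched in Python as below. This is an illustration, not “main.sh” itself. It assumes that “speech2text.sh” leaves the question in “stt.txt” and that “text2speech.sh” takes the text as its argument, as described earlier. It further assumes that “queryprocess.py” accepts the question on the command line and prints the answer. The `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess


def run_pipeline(run=subprocess.run, stt_file="stt.txt"):
    """Record a question, look up the answer and speak it."""
    # 1. Record and transcribe; the script writes the text to stt.txt.
    run(["./speech2text.sh"], check=True)
    with open(stt_file, encoding="utf-8") as f:
        question = f.read().strip()

    # 2. Ask Wolfram Alpha; queryprocess.py is assumed to print the answer.
    done = run(["python", "queryprocess.py", question],
               check=True, capture_output=True, text=True)
    answer = done.stdout.strip()

    # 3. Speak the answer.
    run(["./text2speech.sh", answer], check=True)
    return question, answer
```

Using `check=True` stops the chain as soon as one step fails, so the program does not try to speak an empty answer.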
I have also updated “speech2text.sh” and removed all the ‘echo’ commands from it.
Finally, make “main.sh” executable, run it and have a silly conversation with your computer.
chmod +x main.sh
./main.sh
The End
That’s the end of the Raspberry Pi Voice Recognition tutorial, but it’s just the beginning of the fun! You can now modify this project and turn it into something really cool; let me know what you come up with. In the next project, I will use the speech to text feature to make a voice control system that controls an Arduino board and, even better, a robot.
Have fun.