Speech recognition with Google Voice

Posted on 2013-9-29 16:18:44
#!/bin/bash

echo "Recording... Press Ctrl+C to Stop."
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1

echo "Processing..."
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt

echo -n "You Said: "
cat stt.txt

rm file.flac  > /dev/null 2>&1

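Save the script above as a file named speech2text.sh and plug in a webcam that has a built-in microphone.
Speak English; when luck is on your side, it really does transcribe what you said.

Most of the time, though, the request gets blocked by the firewall. A pity, for such a good free service.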

Reposted from
http://blog.oscarliang.net/raspberry-pi-voice-recognition-works-like-siri/


Raspberry Pi Speech Recognition Introduction

This tutorial demonstrates how to use voice recognition on the Raspberry Pi. By the end of it, we should have a working application that understands and answers your spoken questions.

This is going to be a simple project, because free APIs are available for every goal we want to achieve. The application converts a spoken question into text, processes the query to get an answer, and finally turns the answer from text back into speech. I will divide this demonstration into four parts:

  • Speech to text
  • Query processing
  • Text to speech
  • Putting them together


    Raspberry Pi Voice Recognition For Home Automation

    This has been a very popular topic since the Raspberry Pi came out. With the help of this tutorial, it should be quite easy to achieve. I have an idea of combining the speech recognition ability of the Raspberry Pi with its powerful digital/analog I/O hardware to build a useful voice control system, which could also be adopted in robotics and home automation. That will come in the next couple of blog posts.

    Hardware and Preparation

    You can use a USB microphone, but I don’t have one, so I am using the built-in mic on my webcam. It worked straight away without any driver installation or configuration.

    You will of course need the Raspberry Pi itself.

    You will also need an internet connection on your Raspberry Pi.

    Speech To Text

    Speech recognition can be achieved in many ways on Linux (and so on the Raspberry Pi), but personally I think the easiest way is to use Google’s voice recognition API. I have to say the accuracy is very good, even though I have a strong accent. To get recording set up, you first need to make sure ffmpeg is installed:

    sudo apt-get install ffmpeg

    To use Google’s voice recognition API, I use the following bash script. You can simply copy it and save it as ‘speech2text.sh’:

    #!/bin/bash

    echo "Recording... Press Ctrl+C to Stop."
    arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1

    echo "Processing..."
    wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt

    echo -n "You Said: "
    cat stt.txt

    rm file.flac  > /dev/null 2>&1



    The script starts recording and saves the audio in a FLAC file. You can stop the recording by pressing CTRL+C. The audio file is then sent to Google for conversion, and the returned text is saved in a file called “stt.txt”. Finally, the audio file is deleted.
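The `cut -d\" -f12` step relies on the recognized text being the twelfth quote-delimited field of the reply. As a rough illustration, the Python 3 sketch below pulls the same value out by parsing the reply as JSON instead. The sample reply and its field names (`hypotheses`, `utterance`) are assumptions chosen to be consistent with that field position, not output captured from the service, and `extract_utterance` is a hypothetical helper name.

```python
import json

# Hypothetical sample reply; the layout is an assumption, not captured output.
SAMPLE_REPLY = (
    '{"status":0,"id":"abc123","hypotheses":'
    '[{"utterance":"what is the weather today","confidence":0.92}]}'
)


def extract_utterance(reply):
    """Return the first recognized phrase, or an empty string if there is none."""
    try:
        data = json.loads(reply)
    except ValueError:
        # Empty or malformed reply (for example when the request was blocked).
        return ""
    if not isinstance(data, dict):
        return ""
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return ""
    return hypotheses[0].get("utterance", "")


if __name__ == "__main__":
    # Field 12 for `cut` is index 11 after splitting on double quotes.
    print(SAMPLE_REPLY.split('"')[11])
    print(extract_utterance(SAMPLE_REPLY))
```

Parsing the structure rather than counting quotes keeps working if the fields arrive in a different order, and it returns an empty string instead of garbage when the reply is empty.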

    Then make it executable:

    chmod +x speech2text.sh

    To run it

    ./speech2text.sh

    The screen shot shows you some tests I did.

    Query Processing

    Processing the query is just like “Google-ing” a question, but what we want is when we ask a question, only one answer is returned. Wolfram Alpha seems to be a good choice here.

    There is a Python interface library for it, which makes our life much easier, but you need to install it first.

    Installing Wolframalpha Python Library

    Download the package from https://pypi.python.org/pypi/wolframalpha and unzip it somewhere. Then install setuptools and pip, and build the package from inside the unzipped directory:

    sudo apt-get install python-setuptools
    sudo easy_install pip
    sudo python setup.py build

    Finally, run the install step:

    sudo python setup.py install

    Getting the APP_ID

    To get a unique Wolfram Alpha AppID, sign up for a Wolfram Alpha Application ID (see http://products.wolframalpha.com/api/).

    Once you are signed in to the Wolfram Alpha Developer Portal, open the My Apps tab, click the “Get an AppID” button and fill out the “Get a New AppID” form. Use any application name and description you like, then click the “Get AppID” button.

    Wolfram Alpha Python Interface

    Save this Python script as “queryprocess.py”.

    #!/usr/bin/python

    import wolframalpha
    import sys

    # Get a free API key here: http://products.wolframalpha.com/api/
    # This is a fake ID, go and get your own, instructions on my blog.
    app_id = 'HYO4TL-A9QOUALOPX'

    client = wolframalpha.Client(app_id)

    # Join all command-line arguments into a single query string.
    query = ' '.join(sys.argv[1:])
    res = client.query(query)

    # The answer is read from the second pod (index 1), so at least
    # two pods must be present before indexing.
    pods = list(res.pods)
    if len(pods) > 1:
        pod = pods[1]
        if pod.text:
            texts = pod.text
        else:
            texts = "I have no answer for that"
        # Drop non-ASCII characters to avoid encoding errors when printing.
        texts = texts.encode('ascii', 'ignore').decode('ascii')
        print(texts)
    else:
        print("Sorry, I am not sure.")



    You can test it as shown in the screenshot below.
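The selection logic in queryprocess.py can also be tried without a network connection or an AppID. The Python 3 sketch below uses a stand-in `Pod` type (hypothetical, it only mimics the `text` attribute the script reads) and a hypothetical `pick_answer` helper. It applies the same rule of answering from the second pod, with one assumption added: fewer than two pods counts as "not sure".

```python
from collections import namedtuple

# Hypothetical stand-in for a result pod; only `text` is actually used.
Pod = namedtuple("Pod", ["title", "text"])


def pick_answer(pods):
    """Choose the reply text from a sequence of pods."""
    pods = list(pods)
    if len(pods) < 2:
        return "Sorry, I am not sure."
    text = pods[1].text
    if not text:
        return "I have no answer for that"
    # Drop non-ASCII characters, as the script does before printing.
    return text.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(pick_answer([Pod("Input", "2+2"), Pod("Result", "4")]))
    print(pick_answer([]))
```

Keeping the decision in a small function like this makes it easy to check the fallback messages before wiring in the real client.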

    Text To Speech

    The processed query gives us an answer in text format. What we need to do now is turn that text into audible speech. There are a few options available, like Cepstral or Festival, but I chose Google’s speech service due to its excellent quality. Here is a good introduction to the software mentioned.

    First of all, to play audio we need to install mplayer:

    sudo apt-get install mplayer

    We have this simple bash script. It downloads the MP3 file via the URL and plays it. Copy it and save it as “text2speech.sh”:

    #!/bin/bash

    # Join the arguments with '+' and play the MP3 that the URL returns.
    say() { local IFS=+; /usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=$*"; }
    say $*



    Then make it executable:

    chmod +x text2speech.sh

    To test it, you can try

    ./text2speech.sh "My name is Oscar and I am testing the audio."

    Google Text To Speech Text Length Limitation

    Although it’s very kind of Google to share this great service, there is a limit on the length of the message. I think it’s around 100 characters.

    To work around this, here is an upgraded bash script that breaks the text into multiple parts, each shorter than 100 characters, so that every part can be played successfully. I adapted an existing script to fit our application.

    #!/bin/bash

    # Join the arguments with '+' and play the MP3 that the URL returns.
    say() { local IFS=+; /usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=$*"; }

    INPUT=$*
    STRINGNUM=0
    ary=($INPUT)

    # Pack the words into chunks shorter than 100 characters.
    for key in "${!ary[@]}"
    do
        SHORTTMP[$STRINGNUM]="${SHORTTMP[$STRINGNUM]} ${ary[$key]}"
        LENGTH=${#SHORTTMP[$STRINGNUM]}

        if [[ "$LENGTH" -lt "100" ]]; then
            SHORT[$STRINGNUM]=${SHORTTMP[$STRINGNUM]}
        else
            # The current chunk is full: start a new one with this word.
            STRINGNUM=$(($STRINGNUM+1))
            SHORTTMP[$STRINGNUM]="${ary[$key]}"
            SHORT[$STRINGNUM]="${ary[$key]}"
        fi
    done

    # Speak each chunk in order (unquoted so the words are re-joined with '+').
    for key in "${!SHORT[@]}"
    do
        say ${SHORT[$key]}
    done
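For readers who find the array bookkeeping hard to follow, here is the same splitting idea sketched in Python 3. `split_text` is a hypothetical helper, and the 100-character default is only the estimate quoted above, not a documented figure. Unlike the bash version, it accepts a chunk that is exactly at the limit rather than strictly below it.

```python
def split_text(text, limit=100):
    """Split text on word boundaries into chunks of at most `limit` characters.

    A single word longer than `limit` is kept whole in its own chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit or not current:
            # The word fits, or it is an oversized word starting a chunk.
            current = candidate
        else:
            # Adding the word would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for part in split_text("aaa bbb ccc", limit=7):
        print(part)
```

Each returned chunk could then be handed to the text-to-speech step one at a time, in order.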



    Putting It Together

    For all of these scripts to work together, we have to call them from another script. I call this “main.sh”.

    #!/bin/bash

    echo "Recording... Press Ctrl+C to Stop."

    ./speech2text.sh

    QUESTION=$(cat stt.txt)
    echo "Me: $QUESTION"

    ANSWER=$(python queryprocess.py $QUESTION)
    echo "Robot: $ANSWER"

    ./text2speech.sh $ANSWER
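The same glue can be sketched in Python 3, which makes the hand-off between the three steps explicit. This is only an illustration under assumptions: the three scripts sit in the current directory, `python` is on the PATH, and the names `run` and `converse` are hypothetical. The command runner is passed in as a parameter so the flow can be exercised without a microphone or network.

```python
import subprocess


def run(cmd):
    """Run a command and return its standard output as stripped text."""
    return subprocess.check_output(cmd).decode("utf-8", "replace").strip()


def converse(run_cmd=run, stt_path="stt.txt"):
    """Record a question, look up an answer, speak it; return both texts."""
    # Step 1: record and transcribe; the script writes its result to stt.txt.
    run_cmd(["./speech2text.sh"])
    with open(stt_path) as handle:
        question = handle.read().strip()
    # Step 2: pass the question words to the query script as arguments.
    answer = run_cmd(["python", "queryprocess.py"] + question.split())
    # Step 3: speak the answer.
    run_cmd(["./text2speech.sh"] + answer.split())
    return question, answer


if __name__ == "__main__":
    asked, replied = converse()
    print("Me: " + asked)
    print("Robot: " + replied)
```

Passing the words as a list avoids the shell re-splitting and globbing that unquoted variables are subject to in the bash version.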



    I have also updated “speech2text.sh” and removed all the ‘echo’ commands from it:

    #!/bin/bash

    arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1
    wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt
    rm file.flac  > /dev/null 2>&1



    Finally, make “main.sh” executable, run it and have a silly conversation with your computer:

    chmod +x main.sh
    ./main.sh

    The End

    That’s the end of the Raspberry Pi Voice Recognition tutorial, but it’s just the beginning of the fun! You can now modify this project and turn it into something really cool. Let me know what you come up with. In the next project, I will use the speech-to-text feature to build a voice control system for an Arduino board and, even better, a robot.

    Have fun.

