2.3 Tutorials on using the Speech Blocks

Speech-To-Text

We can program CAIT to listen for speech and convert it into text. In A.I. terms, this is called "Speech-to-Text." It is commonly used in voice assistants and robots. If you have ever used Google Home, Amazon Echo, or Apple Siri, then you have already used "Speech-to-Text" in your daily life. CAIT provides two types of Speech-to-Text capabilities: (1) a deep learning model called DeepSpeech that runs locally to transcribe your speech, and (2) a bridge to online services such as Google Cloud that transcribe your speech.

The program below illustrates speech to text in CAIT. Note that we initialize the speech module in "online" mode, using a Google Cloud account and English as the language. You can initialize it in "on device" mode instead if you prefer not to use any online services. However, "on device" mode supports only English, and because the Raspberry Pi's computational resources are very limited, its accuracy will not be as high as that of "online" mode.
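Separately from the block program, the mode restrictions described above can be written down as a small Python sketch. The names `SpeechConfig` and `choose_speech_config` are hypothetical and are not part of CAIT. The sketch only encodes the two rules from the text: "online" mode needs an online service, and "on device" mode supports English only.

```python
from dataclasses import dataclass

# Hypothetical names for illustration only; these are not CAIT identifiers.
ONLINE = "online"
ON_DEVICE = "on device"


@dataclass(frozen=True)
class SpeechConfig:
    mode: str
    language: str


def choose_speech_config(language: str, allow_online: bool) -> SpeechConfig:
    """Pick a speech mode that respects the constraints described in the text."""
    language = language.strip().lower()
    if allow_online:
        # Online mode uses a cloud service and is the more accurate option.
        return SpeechConfig(mode=ONLINE, language=language)
    if language != "english":
        # On-device mode supports English only.
        raise ValueError("on-device mode supports English only")
    return SpeechConfig(mode=ON_DEVICE, language=language)


if __name__ == "__main__":
    print(choose_speech_config("English", allow_online=False))
```

Raising an error for a non-English language without online access, rather than silently switching modes, keeps the choice about using online services in the user's hands.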

After you click the Run button, wait for the audio prompt (a "ding-ding" sound), then start speaking into the microphone. When you are done speaking, CAIT automatically detects the silence and converts your speech to text. A text box should then appear showing what you just said. Below is a screenshot of the running program.
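As an aside to the walkthrough above, automatic end-of-speech detection can be illustrated with a simple energy-based sketch in Python. This is not CAIT's actual implementation, which the text does not describe. The function names, the threshold of 0.05, and the three-frame silence window are all assumptions chosen for illustration. The idea is to split audio into short frames, measure each frame's loudness, and stop once enough consecutive quiet frames follow some speech.

```python
import math
from typing import Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_end_of_speech(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.05,      # assumed loudness cutoff, not a CAIT setting
    min_silent_frames: int = 3,   # assumed length of the closing silence
) -> Optional[int]:
    """Return the index of the first frame of the silence that ends the
    utterance, or None if speech never started or never ended."""
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            # A loud frame: speech is in progress, so reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # A quiet frame after speech: count it toward the closing silence.
            silent_run += 1
            if silent_run >= min_silent_frames:
                return i - min_silent_frames + 1
    return None


if __name__ == "__main__":
    loud = [0.5, -0.5] * 4
    quiet = [0.0] * 8
    print(find_end_of_speech([quiet, quiet, loud, loud, quiet, quiet, quiet, quiet]))
```

Quiet frames before any speech are ignored, which matches the described behavior of waiting for the user to start talking after the audio prompt.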

Text-To-Speech

CAIT can also perform "text to speech." This is the reverse of "speech to text": it converts text into synthesized audio output. This type of A.I. algorithm is also commonly used in voice assistants and robots.

Below is a program that shows how to do this in CAIT. If you initialize the speech module in online mode, it will use the Google Cloud service to generate the voice samples. If you prefer local voice generation, make sure to initialize the module with "on device" mode.

After you run the program, you should hear a female voice saying what you entered in the "say" block.
