Voicebot

This is an end-to-end voicebot that aims to answer open-domain questions. It is intended to be used as a benchmarking tool.

Design

Requirements and Setup

  • python 3.6
  • pytorch (1.1.0)
  • tensorflow (1.12)
  • wikipedia (1.14)
  • deepspeech (0.5.0)
  • spacy (2.1.5)
  • gingerit (0.8.0)
  • pytorch-pretrained-bert (0.6.2)
  • playsound (1.2.2)
  • sounddevice (0.3.13)
  • soundfile (0.10.2)
  • inflect (2.1)
  • librosa (0.7.0)
  • matplotlib (3.1.1)
  • unidecode (1.1.1)
  • numpy (1.17.0)

We recommend running the project in a virtual environment to prevent version conflicts with packages such as numpy.
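As a rough way to confirm that an environment matches the pinned versions listed above, the sketch below compares installed package versions against a dictionary of pins. The helper names (`installed_version`, `find_mismatches`) and the `SAMPLE_PINS` subset are illustrative and not part of this repository. It also assumes that "pytorch" is installed under the PyPI name `torch`.

```python
def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        # Available in the standard library from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python 3.6/3.7: fall back to setuptools' pkg_resources.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return (name, wanted, found) tuples for packages that do not match.

    A pin such as "1.12" is treated as a prefix, so "1.12.0" also matches.
    """
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append((name, wanted, "not installed"))
        elif not (found == wanted or found.startswith(wanted + ".")):
            problems.append((name, wanted, found))
    return problems


# Illustrative subset of the pins from the list above (assumed PyPI names).
SAMPLE_PINS = {
    "torch": "1.1.0",
    "tensorflow": "1.12",
    "spacy": "2.1.5",
    "librosa": "0.7.0",
    "numpy": "1.17.0",
}
```

Calling `find_mismatches(SAMPLE_PINS)` inside the virtual environment returns an empty list when every sampled package matches its pin.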

You can install whichever spaCy NER model you prefer (we used 'en_core_web_md') with:

  • python -m spacy download en_core_web_md (Note: run this in an elevated command prompt with admin permissions)

You will also require the following models

An info.txt file is located in every directory where a specific model is required. Extract each model and place its contents in the corresponding project folder: BERT, DeepSpeech/Models, and Tacotron_TTS/tacotron-models-data. WaveRNN should be extracted under the Vocoder_WaveRNN folder.
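Before starting the application, it can help to confirm that the model folders named above have been populated. The following is a minimal sketch under that assumption. The function name `missing_model_dirs` is hypothetical, and the check only looks for folders that are absent or hold nothing besides the info.txt placeholder. It does not verify that the extracted files are the right ones.

```python
from pathlib import Path

# Folder names taken from the paragraph above, relative to the project root.
MODEL_DIRS = [
    "BERT",
    "DeepSpeech/Models",
    "Tacotron_TTS/tacotron-models-data",
    "Vocoder_WaveRNN",
]


def missing_model_dirs(project_root, model_dirs=MODEL_DIRS):
    """Return the model folders that are absent or contain only info.txt."""
    root = Path(project_root)
    missing = []
    for rel in model_dirs:
        folder = root / rel
        if not folder.is_dir():
            missing.append(rel)
            continue
        # Ignore the info.txt placeholder when deciding whether a folder is empty.
        contents = [p for p in folder.iterdir() if p.name != "info.txt"]
        if not contents:
            missing.append(rel)
    return missing
```

Because Vocoder_WaveRNN may also contain other project files, a non-empty result for that folder is a weak signal. The info.txt files remain the authoritative guide to what belongs where.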

Open-domain QA also requires an internet connection to retrieve information from Wikipedia.

Running the program

Run the Voicebot file to start the application. You will be prompted to select the TTS system of your choice after the other models have loaded.

The WaveRNN + Tacotron combination is very resource-heavy and produces poor results on systems with 8 GB of RAM. The speech it produces sounds much more natural, but it often ends with garbage audio. The standalone Tacotron is much lighter and degrades less on systems with fewer resources.

Once the TTS system has loaded, you will be prompted to select the running mode: either a microphone for input audio, or a folder of audio files for testing. To add your own audio to the testing set, place the WAV file in the test-audio folder. For best results, use an American male voice with a normal or slow speed setting, from a site like this.
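When adding your own recordings, a quick format check can catch files the speech recognizer may handle poorly. The sketch below uses only Python's standard `wave` module. It assumes that the pretrained DeepSpeech English models expect 16 kHz, mono, 16-bit PCM audio, which is not stated in this README and should be verified against the DeepSpeech documentation. The function names are illustrative, and the folder path is passed in by the caller.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Read basic format information from a WAV file header."""
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def check_test_audio(folder, expected_rate=16000):
    """Map each .wav file name in `folder` to a list of format warnings.

    The expected values (16 kHz, mono, 16-bit) are an assumption about what
    the speech-to-text model was trained on, not something this project checks.
    """
    report = {}
    for path in sorted(Path(folder).glob("*.wav")):
        info = describe_wav(path)
        warnings = []
        if info["sample_rate"] != expected_rate:
            warnings.append("sample rate is %d Hz" % info["sample_rate"])
        if info["channels"] != 1:
            warnings.append("%d channels, expected mono" % info["channels"])
        if info["sample_width_bytes"] != 2:
            warnings.append("sample width is not 16-bit")
        report[path.name] = warnings
    return report
```

An empty warning list for a file means its header matches the assumed format. It says nothing about the recording quality or the speaker's voice.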

Running on Windows 10

Run the VoiceBot-windows.py file. Outputs can be accessed from the '/Vocoder_WaveRNN/WaveRNN_outputs' or '/Tacotron_TTS/Tacotron_outputs' subfolders.

Running on Ubuntu

Run the VoiceBot-linux.py file.

Note: The playsound and sounddevice libraries are not compatible with Ubuntu, so audio cannot be recorded or played from the console. On Ubuntu, VoiceBot works only with questions pre-recorded in the 'test_audio' folder. Outputs can be accessed from the '/Vocoder_WaveRNN/WaveRNN_outputs' or '/Tacotron_TTS/Tacotron_outputs' subfolders.
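Since the entry script differs by operating system, the choice can be sketched as a small lookup on `platform.system()`. This helper is hypothetical and not part of the repository. It only maps the two platforms named in this README to their script file names.

```python
import platform

# Script names as given in the two platform sections of this README.
SCRIPTS = {
    "Windows": "VoiceBot-windows.py",
    "Linux": "VoiceBot-linux.py",
}


def entry_script(system=None):
    """Return the VoiceBot script name for the given (or current) platform."""
    system = system or platform.system()
    try:
        return SCRIPTS[system]
    except KeyError:
        raise RuntimeError("No VoiceBot script is documented for %r" % system)
```

For example, `entry_script("Linux")` returns `"VoiceBot-linux.py"`, which can then be run with the Python interpreter from the project root.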

Demo video

Link to demo video here: https://drive.google.com/file/d/16pFeDjqDOCkVXW0cc09l_mkuxqgQjo8s/view?usp=drive_web