forked from vladmaraev/speechstate
-
Notifications
You must be signed in to change notification settings - Fork 35
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
0aa9f79
commit 77505bf
Showing
3 changed files
with
75 additions
and
4 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,73 @@ | ||
#+OPTIONS: num:nil | ||
#+TITLE: Lab III. Speech synthesis | ||
|
||
In this lab session you will practice styling TTS output using Speech | ||
Synthesis Markup Language (SSML). It is assumed that you have read the | ||
relevant literature on the subject before attempting to solve the | ||
assignments. | ||
|
||
For reference: | ||
- Speech Synthesis Markup Language (SSML) Version 1.0, W3C | ||
Recommendation 7 September 2004, | ||
http://www.w3.org/TR/speech-synthesis/ | ||
- Documentation for implementation in each of the platforms | ||
|
||
Available TTS platforms: | ||
- Google TTS: [[https://cloud.google.com/text-to-speech/docs/ssml][Docs]], you can test it in the Google Actions Console | ||
- Amazon Polly: [[https://developer.amazon.com/en-GB/docs/alexa/custom-skills/speech-synthesis-markup-language-ssml-reference.html][Docs]], [[https://console.aws.amazon.com/polly/home/SynthesizeSpeech][Test]] (you need to have an account) | ||
- IBM Watson: [[https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-ssml&cm_mc_uid=40249035982415833944697&cm_mc_sid_50200000=96929811583394469719&cm_mc_sid_52640000=99688681583394469725][Docs]] | ||
- Azure Text-to-Speech: [[https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/index-text-to-speech][Docs]], [[https://speech.microsoft.com/portal/7e0abdfb01164e0eaf237b7bbfad409d/audiocontentcreation][Speech Studio]] | ||
|
||
* Part A: SSML warm-up exercise | ||
The objective of this assignment is to "style" dialogue system's utterances. Here is what are going to work with: | ||
#+BEGIN_SRC xml | ||
I have your calendar open. | ||
<break strength="medium"/> | ||
For what date? | ||
<break strength="medium"/> | ||
What time would you like to start? | ||
<break strength="medium"/> | ||
How much time do you want to block out? | ||
<break strength="medium"/> | ||
What shall we call this? | ||
<break strength="medium"/> | ||
Ok. [create and style the appointment summary!] | ||
#+END_SRC | ||
|
||
Your job is to create the last utterance and to make all the system's utterances articulate better by inserting SSML tags in the dialogue. | ||
|
||
* Part B: The sound of dialogue | ||
Modify the content produced by speech syntesis for you "Appointement" | ||
application from Lab II, so that it reads everything correctly. | ||
- Create a text document in your repository where you describe the | ||
words which are mispronounced by the system (especially when reading | ||
DuckDuckGo descriptions). | ||
- Improve their pronunciations with [[https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup?tabs=csharp#use-custom-lexicon-to-improve-pronunciation][custom lexicon]] in Microsoft Speech | ||
Studio. | ||
- The lexicon should be linked by using a public link (!) because should be available for Azure TTS. In Speech studio you normally create text files which can be used to test custom lexicons. If you switch to SSML representation of text file you will find a public link to a lexicon file. In my case, it starts from https://cvoiceprodneu.blob.core.windows.net/acc-public-files/. | ||
|
||
* Part C: Speech Synthesis Poetry Slam | ||
#+BEGIN_QUOTE | ||
A poetry slam is a competition at which poets read or recite original work (or, more rarely, that of others). These performances are then judged on a numeric scale by previously selected members of the audience. (Wikipedia) | ||
#+END_QUOTE | ||
|
||
Your task in this assignment is to use SSML in order to get an artificial poet to recite the your favourite poem (just a couple of verses) with a speed and in "a style" similar to the way how it is read by an actor (or by a poet her/himself). | ||
|
||
You can refer to some poetry performance found on YouTube or | ||
elsewhere. | ||
|
||
Sources for inspiration: | ||
- [[https://www.youtube.com/watch?v=IZYoGj8D8pY][California Dreaming]] (386DX art project). | ||
- [[https://raw.githubusercontent.com/vladmaraev/rasa101/master/withoutme.m4a][Without Me]], which was made by Robert Rhys Thomas in 2019 for this course. | ||
- [[file:media/partC_badguy_voiced.mp3][Bad Guy]], which was made by Fang Yuan in 2020 for this course. | ||
|
||
* Submission | ||
In you submission mention which platform you used and provide: | ||
1) text files with your SSML code (for parts A, B and C) | ||
2) audio file for Part C | ||
3) reference for the performance for Part C | ||
|
||
These files can be placed in your Github repository. | ||
|
||
* Part D: Peer assignment 2 | ||
TBA |
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters