Skip to content

Commit

Permalink
Lab III; fix tts lexicon variable
Browse files Browse the repository at this point in the history
  • Loading branch information
vladmaraev committed Feb 9, 2023
1 parent 0aa9f79 commit 77505bf
Show file tree
Hide file tree
Showing 3 changed files with 75 additions and 4 deletions.
71 changes: 71 additions & 0 deletions labs/lab3.org
Original file line number Diff line number Diff line change
@@ -1,2 +1,73 @@
#+OPTIONS: num:nil
#+TITLE: Lab III. Speech synthesis

In this lab session you will practice styling TTS output using Speech
Synthesis Markup Language (SSML). It is assumed that you have read the
relevant literature on the subject before attempting to solve the
assignments.

For reference:
- Speech Synthesis Markup Language (SSML) Version 1.0, W3C
Recommendation 7 September 2004,
http://www.w3.org/TR/speech-synthesis/
- Documentation for implementation in each of the platforms

Available TTS platforms:
- Google TTS: [[https://cloud.google.com/text-to-speech/docs/ssml][Docs]], you can test it in the Google Actions Console
- Amazon Polly: [[https://developer.amazon.com/en-GB/docs/alexa/custom-skills/speech-synthesis-markup-language-ssml-reference.html][Docs]], [[https://console.aws.amazon.com/polly/home/SynthesizeSpeech][Test]] (you need to have an account)
- IBM Watson: [[https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-ssml&cm_mc_uid=40249035982415833944697&cm_mc_sid_50200000=96929811583394469719&cm_mc_sid_52640000=99688681583394469725][Docs]]
- Azure Text-to-Speech: [[https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/index-text-to-speech][Docs]], [[https://speech.microsoft.com/portal/7e0abdfb01164e0eaf237b7bbfad409d/audiocontentcreation][Speech Studio]]

* Part A: SSML warm-up exercise
The objective of this assignment is to "style" dialogue system's utterances. Here is what are going to work with:
#+BEGIN_SRC xml
I have your calendar open.
<break strength="medium"/>
For what date?
<break strength="medium"/>
What time would you like to start?
<break strength="medium"/>
How much time do you want to block out?
<break strength="medium"/>
What shall we call this?
<break strength="medium"/>
Ok. [create and style the appointment summary!]
#+END_SRC

Your job is to create the last utterance and to make all the system's utterances articulate better by inserting SSML tags in the dialogue.

* Part B: The sound of dialogue
Modify the content produced by speech syntesis for you "Appointement"
application from Lab II, so that it reads everything correctly.
- Create a text document in your repository where you describe the
words which are mispronounced by the system (especially when reading
DuckDuckGo descriptions).
- Improve their pronunciations with [[https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup?tabs=csharp#use-custom-lexicon-to-improve-pronunciation][custom lexicon]] in Microsoft Speech
Studio.
- The lexicon should be linked by using a public link (!) because should be available for Azure TTS. In Speech studio you normally create text files which can be used to test custom lexicons. If you switch to SSML representation of text file you will find a public link to a lexicon file. In my case, it starts from https://cvoiceprodneu.blob.core.windows.net/acc-public-files/.

* Part C: Speech Synthesis Poetry Slam
#+BEGIN_QUOTE
A poetry slam is a competition at which poets read or recite original work (or, more rarely, that of others). These performances are then judged on a numeric scale by previously selected members of the audience. (Wikipedia)
#+END_QUOTE

Your task in this assignment is to use SSML in order to get an artificial poet to recite the your favourite poem (just a couple of verses) with a speed and in "a style" similar to the way how it is read by an actor (or by a poet her/himself).

You can refer to some poetry performance found on YouTube or
elsewhere.

Sources for inspiration:
- [[https://www.youtube.com/watch?v=IZYoGj8D8pY][California Dreaming]] (386DX art project).
- [[https://raw.githubusercontent.com/vladmaraev/rasa101/master/withoutme.m4a][Without Me]], which was made by Robert Rhys Thomas in 2019 for this course.
- [[file:media/partC_badguy_voiced.mp3][Bad Guy]], which was made by Fang Yuan in 2020 for this course.

* Submission
In you submission mention which platform you used and provide:
1) text files with your SSML code (for parts A, B and C)
2) audio file for Part C
3) reference for the performance for Part C

These files can be placed in your Github repository.

* Part D: Peer assignment 2
TBA
Binary file added labs/media/partC_badguy_voiced.mp3
Binary file not shown.
8 changes: 4 additions & 4 deletions src/index.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ import * as ReactDOM from "react-dom";
import { createMachine, assign, actions, State } from "xstate";
import { useMachine } from "@xstate/react";
import { inspect } from "@xstate/inspect";
import { dmMachine } from "./dmColourChanger";
import { dmMachine } from "./dmJokes";

import createSpeechRecognitionPonyfill from "web-speech-cognitive-services/lib/SpeechServices/SpeechToText";
import createSpeechSynthesisPonyfill from "web-speech-cognitive-services/lib/SpeechServices/TextToSpeech";
Expand Down Expand Up @@ -285,8 +285,8 @@ function App({ domElement }: any) {
let content = `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US"><voice name="${context.voice.name}">`;
content =
content +
(process.env.REACT_APP_TTS_LEXICON
? `<lexicon uri="${process.env.REACT_APP_TTS_LEXICON}"/>`
(context.parameters.ttsLexicon
? `<lexicon uri="${context.parameters.ttsLexicon}"/>`
: "");
content = content + `${context.ttsAgenda}</voice></speak>`;
const utterance = new context.ttsUtterance(content);
Expand All @@ -308,7 +308,7 @@ function App({ domElement }: any) {
},
});
context.asr = new SpeechRecognition();
context.asr.lang = process.env.REACT_APP_ASR_LANGUAGE || "en-US";
context.asr.lang = context.parameters.asrLanguage || "en-US";
context.asr.continuous = true;
context.asr.interimResults = true;
context.asr.onresult = function (event: any) {
Expand Down

0 comments on commit 77505bf

Please sign in to comment.