Lab III; fix tts lexicon variable

GU-CLASP · Feb 9, 2023 · 77505bf
1 parent 0aa9f79
Showing 3 changed files with 75 additions and 4 deletions.
diff --git a/labs/lab3.org b/labs/lab3.org
@@ -1,2 +1,73 @@
 #+OPTIONS: num:nil
 #+TITLE: Lab III. Speech synthesis
+
+In this lab session you will practice styling TTS output using Speech
+Synthesis Markup Language (SSML). It is assumed that you have read the
+relevant literature on the subject before attempting to solve the
+assignments.
+
+For reference:
+- Speech Synthesis Markup Language (SSML) Version 1.0, W3C
+  Recommendation 7 September 2004,
+  http://www.w3.org/TR/speech-synthesis/
+- Documentation for the SSML implementation on each of the platforms
+
+Available TTS platforms:
+- Google TTS: [[https://cloud.google.com/text-to-speech/docs/ssml][Docs]], you can test it in the Google Actions Console
+- Amazon Polly: [[https://developer.amazon.com/en-GB/docs/alexa/custom-skills/speech-synthesis-markup-language-ssml-reference.html][Docs]], [[https://console.aws.amazon.com/polly/home/SynthesizeSpeech][Test]] (you need to have an account)
+- IBM Watson: [[https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-ssml&cm_mc_uid=40249035982415833944697&cm_mc_sid_50200000=96929811583394469719&cm_mc_sid_52640000=99688681583394469725][Docs]]
+- Azure Text-to-Speech: [[https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/index-text-to-speech][Docs]], [[https://speech.microsoft.com/portal/7e0abdfb01164e0eaf237b7bbfad409d/audiocontentcreation][Speech Studio]]
+
+* Part A: SSML warm-up exercise
+The objective of this assignment is to "style" a dialogue system's utterances. Here is what you are going to work with:
+#+BEGIN_SRC xml
+I have your calendar open. 
+<break strength="medium"/> 
+For what date? 
+<break strength="medium"/> 
+What time would you like to start? 
+<break strength="medium"/> 
+How much time do you want to block out? 
+<break strength="medium"/> 
+What shall we call this? 
+<break strength="medium"/> 
+Ok. [create and style the appointment summary!]
+#+END_SRC
+
+Your job is to create the last utterance and to make all of the system's utterances sound more articulate by inserting SSML tags into the dialogue.
+
+* Part B: The sound of dialogue
+Modify the content produced by speech synthesis for your "Appointment"
+application from Lab II, so that it reads everything correctly.
+- Create a text document in your repository where you describe the
+  words which are mispronounced by the system (especially when reading
+  DuckDuckGo descriptions).
+- Improve their pronunciations with a [[https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup?tabs=csharp#use-custom-lexicon-to-improve-pronunciation][custom lexicon]] in Microsoft Speech
+  Studio.
+- The lexicon must be linked with a public URL (!) because it has to be accessible to Azure TTS. In Speech Studio you normally create text files, which can be used to test custom lexicons. If you switch a text file to its SSML representation, you will find a public link to the lexicon file. In my case, it starts with https://cvoiceprodneu.blob.core.windows.net/acc-public-files/.
+
+* Part C: Speech Synthesis Poetry Slam
+#+BEGIN_QUOTE
+A poetry slam is a competition at which poets read or recite original work (or, more rarely, that of others). These performances are then judged on a numeric scale by previously selected members of the audience. (Wikipedia)
+#+END_QUOTE
+
+Your task in this assignment is to use SSML to get an artificial poet to recite your favourite poem (just a couple of verses) at a speed and in "a style" similar to how it is read by an actor (or by the poet her/himself).
+
+You can refer to some poetry performance found on YouTube or
+elsewhere.
+
+Sources for inspiration:
+- [[https://www.youtube.com/watch?v=IZYoGj8D8pY][California Dreaming]] (386DX art project).
+- [[https://raw.githubusercontent.com/vladmaraev/rasa101/master/withoutme.m4a][Without Me]], which was made by Robert Rhys Thomas in 2019 for this course.
+- [[file:media/partC_badguy_voiced.mp3][Bad Guy]], which was made by Fang Yuan in 2020 for this course.
+
+* Submission
+In your submission, mention which platform you used and provide:
+1) text files with your SSML code (for Parts A, B and C)
+2) an audio file for Part C
+3) a reference to the performance for Part C
+
+These files can be placed in your GitHub repository.
+
+* Part D: Peer assignment 2
+ TBA
diff --git a/labs/media/partC_badguy_voiced.mp3 b/labs/media/partC_badguy_voiced.mp3
diff --git a/src/index.tsx b/src/index.tsx
@@ -4,7 +4,7 @@ import * as ReactDOM from "react-dom";
 import { createMachine, assign, actions, State } from "xstate";
 import { useMachine } from "@xstate/react";
 import { inspect } from "@xstate/inspect";
-import { dmMachine } from "./dmColourChanger";
+import { dmMachine } from "./dmJokes";
 
 import createSpeechRecognitionPonyfill from "web-speech-cognitive-services/lib/SpeechServices/SpeechToText";
 import createSpeechSynthesisPonyfill from "web-speech-cognitive-services/lib/SpeechServices/TextToSpeech";
@@ -285,8 +285,8 @@ function App({ domElement }: any) {
         let content = `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US"><voice name="${context.voice.name}">`;
         content =
           content +
-          (process.env.REACT_APP_TTS_LEXICON
-            ? `<lexicon uri="${process.env.REACT_APP_TTS_LEXICON}"/>`
+          (context.parameters.ttsLexicon
+            ? `<lexicon uri="${context.parameters.ttsLexicon}"/>`
             : "");
         content = content + `${context.ttsAgenda}</voice></speak>`;
         const utterance = new context.ttsUtterance(content);
@@ -308,7 +308,7 @@ function App({ domElement }: any) {
           },
         });
         context.asr = new SpeechRecognition();
-        context.asr.lang = process.env.REACT_APP_ASR_LANGUAGE || "en-US";
+        context.asr.lang = context.parameters.asrLanguage || "en-US";
         context.asr.continuous = true;
         context.asr.interimResults = true;
         context.asr.onresult = function (event: any) {
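
Note on the TTS hunk in src/index.tsx: the SSML string is assembled by concatenation, and the `<lexicon>` element is included only when `context.parameters.ttsLexicon` is set. A minimal sketch of the same logic as a standalone function might look like the following. `buildSsml` and `escapeXmlAttr` are hypothetical names that do not appear in the commit, and the attribute escaping is an addition of this sketch, not something the commit does.

```typescript
// Hypothetical helpers, not part of the commit: they restate the string
// concatenation from the TTS hunk as a pure function that is easy to test.

/** Escape characters that are special inside a double-quoted XML attribute. */
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/**
 * Build the SSML document sent to the synthesiser.
 * - voiceName: value for the <voice name="..."> attribute
 * - agenda: text to speak; inserted as-is because it may already contain SSML tags
 * - lexiconUri: optional public URL of a custom lexicon; omitted when empty or undefined
 */
function buildSsml(voiceName: string, agenda: string, lexiconUri?: string): string {
  const lexicon = lexiconUri
    ? `<lexicon uri="${escapeXmlAttr(lexiconUri)}"/>`
    : "";
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" ` +
    `xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">` +
    `<voice name="${escapeXmlAttr(voiceName)}">` +
    lexicon +
    agenda +
    `</voice></speak>`
  );
}
```

Under these assumptions, calling `buildSsml(context.voice.name, context.ttsAgenda, context.parameters.ttsLexicon)` would yield the same markup as the hunk whenever the voice name and the lexicon URL contain no special XML characters. A URL with a query string containing `&` would additionally be escaped to `&amp;`, which keeps the document well-formed.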