Fix interruptions in ASR while resources are being respawned #12
Additionally:
* cover with tests
* switch to Playwright for testing
* fall back to speaking if audio is not available (streaming URL will be ignored)
This bugfix removes the respawning of the ASR and TTS machines. Instead, the parent machine (`speechstate.ts`) generates new tokens and sends them to ASR and TTS via the NEW_TOKEN event. ASR doesn’t need to be pre-initiated, so only its context is updated with the new token, which is then used for the next ASR request. TTS *needs* to be pre-initiated, so on a new token it creates new instances of the Web Speech API synthesiser and utterance, which are then used for the next TTS request. Tests are added that attempt to renew tokens while the system is recognising or speaking. Additionally, a new setting, `newTokenInterval`, configures how often new tokens are requested (mainly for testing purposes); it defaults to 30 seconds. Unfortunately, the tests are not entirely fair because the old token doesn’t expire when the new one is issued, but I tried to make sure that the new token is actually used.
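To make the token flow above concrete, here is a minimal sketch in plain TypeScript. It is not the code from `speechstate.ts`: only the `NEW_TOKEN` event name and the `newTokenInterval` setting come from the description. The context shapes, `createSynthesiser`, `fetchToken`, `send`, and the choice of milliseconds for the interval are all assumptions made for illustration.

```typescript
// Hypothetical shapes: only NEW_TOKEN and newTokenInterval appear in the PR text.
type NewTokenEvent = { type: "NEW_TOKEN"; token: string };

interface AsrContext {
  token: string;
}

interface TtsContext<S> {
  token: string;
  // Stands in for the pre-initiated Web Speech API synthesiser and utterance.
  synthesiser: S;
}

// ASR is not pre-initiated: only the context changes,
// and the next recognition request reads the new token from it.
function asrOnNewToken(ctx: AsrContext, ev: NewTokenEvent): AsrContext {
  return { ...ctx, token: ev.token };
}

// TTS is pre-initiated: a new token means building fresh instances,
// which the next TTS request will then use.
function ttsOnNewToken<S>(
  ctx: TtsContext<S>,
  ev: NewTokenEvent,
  createSynthesiser: (token: string) => S, // hypothetical factory
): TtsContext<S> {
  return { ...ctx, token: ev.token, synthesiser: createSynthesiser(ev.token) };
}

// Parent side: request a token on a timer and forward it to the children.
// The interval is assumed to be in milliseconds (30 seconds by default).
function startTokenRenewal(
  fetchToken: () => Promise<string>, // hypothetical token request
  send: (ev: NewTokenEvent) => void, // hypothetical "send to ASR and TTS"
  newTokenInterval: number = 30000,
): () => void {
  const timer = setInterval(() => {
    fetchToken()
      .then((token) => send({ type: "NEW_TOKEN", token }))
      .catch(() => {
        // On failure the children simply keep the token they already have.
      });
  }, newTokenInterval);
  return () => clearInterval(timer);
}
```

Because neither child is respawned, a recognition or synthesis request that is already running is untouched; the new token only affects the next request.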
9bb77f3 to 8f63a2f
With my limited understanding: Looks good to me.
The only thing is that I think we should coordinate what the locales look like. In the Python world they look like "xx_XX", while in the MS and JS world they seem to look like "xx-XX". TS converts them from "-" to "_", and the TTS cache then converts them back to "-". Let's talk about that.
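For the discussion, the round trip between the two locale spellings can be sketched as a pair of helpers. The function names are hypothetical and are not taken from the codebase; they only illustrate the "-" to "_" conversion and back.

```typescript
// "xx-XX" is the hyphenated form seen in the JS and MS world;
// "xx_XX" is the underscore form seen in the Python world.

// Hypothetical helper: "en_US" -> "en-US".
function toHyphenLocale(locale: string): string {
  return locale.replace(/_/g, "-");
}

// Hypothetical helper: "en-US" -> "en_US".
function toUnderscoreLocale(locale: string): string {
  return locale.replace(/-/g, "_");
}
```

Agreeing on one form at the boundary would let one of these two conversions be dropped.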
* Play audio

This change enables TTS to play audio. The audio is sent as part of the SPEAK event (`audioURL` parameter). If the resource is not available, TTS falls back to normal synthesis (`utterance` parameter). Additionally, for streamed content, a `cache` parameter containing a URL can be provided with the SPEAK event. This URL is called first to check whether a cached version exists: if it does (true), another request fetches the audio file to play; if not (false), TTS falls back to normal synthesis.

* Fix interruptions in ASR while resources are being respawned
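The fallback described under "Play audio" can be sketched as a small decision function. This is an illustration under assumptions, not the implementation: `audioURL`, `cache` and `utterance` are the parameter names from the description, while the `SpeakEvent` and `Playback` types, the `probe` callback, and the order in which `audioURL` and `cache` are checked are hypothetical.

```typescript
// Hypothetical event shape; the three field names come from the PR text.
interface SpeakEvent {
  utterance: string;
  audioURL?: string;
  cache?: string;
}

type Playback =
  | { kind: "audio"; url: string }
  | { kind: "synthesis"; text: string };

// `probe` is a hypothetical check that resolves to true when the URL is usable
// (the audio resource is available, or the cache endpoint reports a hit).
async function resolvePlayback(
  ev: SpeakEvent,
  probe: (url: string) => Promise<boolean>,
): Promise<Playback> {
  const synthesis: Playback = { kind: "synthesis", text: ev.utterance };

  // Assumption: an explicit audioURL is considered before the cache URL.
  if (ev.audioURL !== undefined) {
    if (await probe(ev.audioURL)) {
      return { kind: "audio", url: ev.audioURL };
    }
    return synthesis;
  }

  if (ev.cache !== undefined) {
    if (await probe(ev.cache)) {
      // The second request that fetches the audio file is left to the caller;
      // the description does not say which URL it uses.
      return { kind: "audio", url: ev.cache };
    }
    return synthesis;
  }

  return synthesis;
}
```

Injecting `probe` keeps the decision testable without a network or a browser audio element.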
This PR also includes changes from #11 (playing sound files and TTS caching).