Fix interruptions in ASR while resources are being respawned #12

Merged
merged 5 commits into master on Dec 9, 2024

Conversation

vladmaraev
Owner

@vladmaraev vladmaraev commented Dec 5, 2024

This bugfix removes respawning of ASR and TTS machines. Instead, the
parent machine (`speechstate.ts`) generates new tokens and
communicates them to ASR and TTS via a `NEW_TOKEN` event.
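
For context, a minimal sketch of the pattern, assuming XState v5; `fetchToken`, `asrMachine` and `ttsMachine` are hypothetical stand-ins, not the actual speechstate source:

```ts
import { setup, createMachine, sendTo, fromPromise } from "xstate";

// Hypothetical children; their NEW_TOKEN handling is sketched below.
const asrMachine = createMachine({ id: "asr" });
const ttsMachine = createMachine({ id: "tts" });

const parentMachine = setup({
  actors: {
    asrMachine,
    ttsMachine,
    // assumption: the real machine requests an Azure token here
    fetchToken: fromPromise(async () => {
      const response = await fetch(
        "https://example.invalid/sts/v1.0/issueToken",
        { method: "POST" },
      );
      return response.text();
    }),
  },
}).createMachine({
  invoke: [
    { id: "asr", src: "asrMachine" },
    { id: "tts", src: "ttsMachine" },
  ],
  initial: "GenerateToken",
  states: {
    GenerateToken: {
      invoke: {
        src: "fetchToken",
        onDone: {
          target: "Done",
          actions: [
            // broadcast the fresh token instead of respawning the children
            sendTo("asr", ({ event }) => ({ type: "NEW_TOKEN", value: event.output })),
            sendTo("tts", ({ event }) => ({ type: "NEW_TOKEN", value: event.output })),
          ],
        },
      },
    },
    Done: { type: "final" },
  },
});
```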

ASR doesn’t need to be pre-initiated, so only the context gets updated
for a new token. The new token is then used for the next ASR request.
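
A sketch of the ASR side; the context field name is my assumption:

```ts
import { setup, assign } from "xstate";

const asrMachine = setup({
  types: {
    context: {} as { azureAuthorizationToken?: string },
    events: {} as { type: "NEW_TOKEN"; value: string },
  },
}).createMachine({
  context: {},
  on: {
    // no re-initialisation needed: just remember the token; the next
    // recognition request reads it from context
    NEW_TOKEN: {
      actions: assign({
        azureAuthorizationToken: ({ event }) => event.value,
      }),
    },
  },
});
```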

TTS *needs* to be pre-initiated; therefore, on a new token it creates
new instances of the Web Speech API synthesiser and utterance. They are
then used for the next TTS request.
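
The TTS side might look roughly like this; `createSynthesizer` stands in for whatever factory the real machine uses (e.g. a web-speech-cognitive-services-style ponyfill), and the context field names are illustrative:

```ts
import { setup, assign } from "xstate";

// Hypothetical factory that pre-initiates synthesis for a given token.
declare function createSynthesizer(token: string): {
  speechSynthesis: SpeechSynthesis;
  SpeechSynthesisUtterance: typeof SpeechSynthesisUtterance;
};

const ttsMachine = setup({
  types: {
    context: {} as {
      synthesiser?: SpeechSynthesis;
      utteranceClass?: typeof SpeechSynthesisUtterance;
    },
    events: {} as { type: "NEW_TOKEN"; value: string },
  },
}).createMachine({
  context: {},
  on: {
    NEW_TOKEN: {
      // pre-initiate now; the instances are used by the *next* TTS request
      actions: assign(({ event }) => {
        const ponyfill = createSynthesizer(event.value);
        return {
          synthesiser: ponyfill.speechSynthesis,
          utteranceClass: ponyfill.SpeechSynthesisUtterance,
        };
      }),
    },
  },
});
```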

Appropriate tests are added which attempt to renew tokens while the
system is recognising or speaking. Additionally, a new setting,
`newTokenInterval`, is added to configure how often new tokens are
requested (mainly for testing purposes), defaulting to 30 seconds.
Unfortunately, the tests are not entirely fair, because the old token
doesn’t expire when the new one is released, but I tried to make sure
that the new token is actually used.
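
A sketch of how such a setting could drive renewal; only the `newTokenInterval` name and the 30-second default come from this PR, the rest is illustrative:

```ts
import { fromCallback } from "xstate";

interface Settings {
  /** How often to request a new token, in milliseconds (default 30 s). */
  newTokenInterval?: number;
}

// A callback actor that ticks on the configured interval; the parent can
// listen for TICK and re-enter its token-generation state.
const tokenTicker = fromCallback(({ sendBack, input }) => {
  const settings = input as Settings;
  const id = setInterval(
    () => sendBack({ type: "TICK" }),
    settings.newTokenInterval ?? 30_000,
  );
  return () => clearInterval(id); // cleanup when the actor stops
});
```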

This PR also includes changes from #11 (playing sound files and TTS caching).

Additionally:
* cover with tests
* switch to Playwright for testing
* fall back to synthesised speech if audio is not available (the
streaming URL will be ignored); see the sketch after this list
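
Roughly the fallback behaviour from the last bullet, assuming playback via a plain HTMLAudioElement (names are illustrative):

```ts
async function speak(params: { utterance: string; audioURL?: string }) {
  if (params.audioURL) {
    try {
      const audio = new Audio(params.audioURL);
      await audio.play(); // rejects if the resource can't be played
      return;
    } catch {
      // audio not available: ignore the streaming URL and fall through
    }
  }
  // fall back to normal synthesis
  speechSynthesis.speak(new SpeechSynthesisUtterance(params.utterance));
}
```
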
Collaborator

@fredrik-talkamatic fredrik-talkamatic left a comment

With my limited understanding: Looks good to me.

The only thing is that I think we should coordinate what the locales should look like. In the Python world they look like "xx_XX", and in the MS and JS world they seem to look like "xx-XX". TS converts them from "-" to "_", which the TTS cache converts back again to "-". Let's talk about that.
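
For reference, the two conventions differ only in the separator, so the conversion each side does is a one-liner (illustrative, not the actual code):

```ts
const toSnakeLocale = (l: string) => l.replace("-", "_"); // "en-US" -> "en_US"
const toBcp47Locale = (l: string) => l.replace("_", "-"); // "en_US" -> "en-US"
```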

@vladmaraev vladmaraev merged commit e20e486 into master Dec 9, 2024
1 check passed
vladmaraev added a commit that referenced this pull request Dec 9, 2024
* Play audio

This change enables TTS to play audio. The audio is sent as part of the
`SPEAK` event (`audioURL` parameter). If the resource is not available, it
falls back to normal synthesis (`utterance` parameter).

Additionally, for streamed content, a `cache` parameter containing a URL
can be provided with the `SPEAK` event. This URL is called first to check
whether a cached version exists; if it does (true), another request fetches
the audio file to play, otherwise (false) TTS falls back to normal synthesis.
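
A sketch of that flow, assuming the cache endpoint answers with a textual true/false (the real response contract isn't specified here):

```ts
// Hypothetical helper: decide where the audio comes from before speaking.
async function resolveAudioURL(
  cacheURL: string,
  audioURL: string,
): Promise<string | null> {
  // first request: ask whether a cached audio file exists
  const response = await fetch(cacheURL);
  const exists = (await response.text()).trim() === "true"; // assumed format
  // true: make another request for the audio file to play;
  // false: return null so the caller falls back to normal TTS
  return exists ? audioURL : null;
}
```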

* Fix interruptions in ASR while resources are being respawned

This bugfix removes respawning of ASR and TTS machines. Instead, the
parent machine (`speechstate.ts`) generates new tokens and
communicates them to ASR and TTS via a `NEW_TOKEN` event.

ASR doesn’t need to be pre-initiated, so only the context gets updated
for a new token. The new token is then used for the next ASR request.

TTS *needs* to be pre-initiated; therefore, on a new token it creates
new instances of the Web Speech API synthesiser and utterance. They are
then used for the next TTS request.

Appropriate tests are added which attempt to renew tokens while the
system is recognising or speaking. Additionally, a new setting,
`newTokenInterval`, is added to configure how often new tokens are
requested (mainly for testing purposes), defaulting to 30 seconds.
Unfortunately, the tests are not entirely fair, because the old token
doesn’t expire when the new one is released, but I tried to make sure
that the new token is actually used.