Fix interruptions in ASR while resources are being respawned #12

Merged
merged 5 commits into master on Dec 9, 2024

Conversation

vladmaraev
Owner

@vladmaraev vladmaraev commented Dec 5, 2024

This bugfix removes respawning of ASR and TTS machines. Instead, the
parent machine (`speechstate.ts`) generates new tokens and
communicates them to ASR and TTS via a `NEW_TOKEN` event.
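
For context, a minimal sketch of the pattern, assuming XState v5; `fetchToken`, `asrMachine` and `ttsMachine` are hypothetical stand-ins, not the actual speechstate source:

```ts
import { setup, createMachine, sendTo, fromPromise } from "xstate";

// Hypothetical children; their NEW_TOKEN handling is sketched below.
const asrMachine = createMachine({ id: "asr" });
const ttsMachine = createMachine({ id: "tts" });

const parentMachine = setup({
  actors: {
    asrMachine,
    ttsMachine,
    // assumption: the real machine requests an Azure token here
    fetchToken: fromPromise(async () => {
      const response = await fetch(
        "https://example.invalid/sts/v1.0/issueToken",
        { method: "POST" },
      );
      return response.text();
    }),
  },
}).createMachine({
  invoke: [
    { id: "asr", src: "asrMachine" },
    { id: "tts", src: "ttsMachine" },
  ],
  initial: "GenerateToken",
  states: {
    GenerateToken: {
      invoke: {
        src: "fetchToken",
        onDone: {
          target: "Done",
          actions: [
            // broadcast the fresh token instead of respawning the children
            sendTo("asr", ({ event }) => ({ type: "NEW_TOKEN", value: event.output })),
            sendTo("tts", ({ event }) => ({ type: "NEW_TOKEN", value: event.output })),
          ],
        },
      },
    },
    Done: { type: "final" },
  },
});
```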

ASR doesn’t need to be pre-initiated, so only the context gets updated
for a new token. The new token is then used for the next ASR request.
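
A sketch of the ASR side; the context field name is my assumption:

```ts
import { setup, assign } from "xstate";

const asrMachine = setup({
  types: {
    context: {} as { azureAuthorizationToken?: string },
    events: {} as { type: "NEW_TOKEN"; value: string },
  },
}).createMachine({
  context: {},
  on: {
    // no re-initialisation needed: just remember the token; the next
    // recognition request reads it from context
    NEW_TOKEN: {
      actions: assign({
        azureAuthorizationToken: ({ event }) => event.value,
      }),
    },
  },
});
```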

TTS *needs* to be pre-initiated; therefore, on a new token it creates
new instances of the Web Speech API synthesiser and utterance. They are
then used for the next TTS request.
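
The TTS side might look roughly like this; `createSynthesizer` stands in for whatever factory the real machine uses (e.g. a web-speech-cognitive-services-style ponyfill), and the context field names are illustrative:

```ts
import { setup, assign } from "xstate";

// Hypothetical factory that pre-initiates synthesis for a given token.
declare function createSynthesizer(token: string): {
  speechSynthesis: SpeechSynthesis;
  SpeechSynthesisUtterance: typeof SpeechSynthesisUtterance;
};

const ttsMachine = setup({
  types: {
    context: {} as {
      synthesiser?: SpeechSynthesis;
      utteranceClass?: typeof SpeechSynthesisUtterance;
    },
    events: {} as { type: "NEW_TOKEN"; value: string },
  },
}).createMachine({
  context: {},
  on: {
    NEW_TOKEN: {
      // pre-initiate now; the instances are used by the *next* TTS request
      actions: assign(({ event }) => {
        const ponyfill = createSynthesizer(event.value);
        return {
          synthesiser: ponyfill.speechSynthesis,
          utteranceClass: ponyfill.SpeechSynthesisUtterance,
        };
      }),
    },
  },
});
```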

Appropriate tests are added which attempt to renew tokens while the
system is recognising or speaking. Additionally, a new setting,
`newTokenInterval`, is added to configure how often new tokens are
requested (mainly for testing purposes), defaulting to 30 seconds.
Unfortunately, the tests are not entirely fair, because the old token
doesn’t expire when the new one is released, but I tried to make sure
that the new token is actually used.
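
A sketch of how such a setting could drive renewal; only the `newTokenInterval` name and the 30-second default come from this PR, the rest is illustrative:

```ts
import { fromCallback } from "xstate";

interface Settings {
  /** How often to request a new token, in milliseconds (default 30 s). */
  newTokenInterval?: number;
}

// A callback actor that ticks on the configured interval; the parent can
// listen for TICK and re-enter its token-generation state.
const tokenTicker = fromCallback(({ sendBack, input }) => {
  const settings = input as Settings;
  const id = setInterval(
    () => sendBack({ type: "TICK" }),
    settings.newTokenInterval ?? 30_000,
  );
  return () => clearInterval(id); // cleanup when the actor stops
});
```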

This PR also includes changes from #11 (playing sound files and TTS caching).

Additionally:
* cover with tests
* switch to Playwright for testing
* fall back to synthesised speech if audio is not available (the
streaming URL will be ignored); see the sketch after this list
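
Roughly the fallback behaviour from the last bullet, assuming playback via a plain HTMLAudioElement (names are illustrative):

```ts
async function speak(params: { utterance: string; audioURL?: string }) {
  if (params.audioURL) {
    try {
      const audio = new Audio(params.audioURL);
      await audio.play(); // rejects if the resource can't be played
      return;
    } catch {
      // audio not available: ignore the streaming URL and fall through
    }
  }
  // fall back to normal synthesis
  speechSynthesis.speak(new SpeechSynthesisUtterance(params.utterance));
}
```
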
Collaborator

@fredrik-talkamatic fredrik-talkamatic left a comment

With my limited understanding: Looks good to me.

The only thing is that I think we should coordinate what the locales should look like. In the Python world they look like "xx_XX", and in the MS and JS world they seem to look like "xx-XX". TS converts them from "-" to "_", which the TTS cache converts back again to "-". Let's talk about that.
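
For reference, the two conventions differ only in the separator, so the conversion each side does is a one-liner (illustrative, not the actual code):

```ts
const toSnakeLocale = (l: string) => l.replace("-", "_"); // "en-US" -> "en_US"
const toBcp47Locale = (l: string) => l.replace("_", "-"); // "en_US" -> "en-US"
```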

@vladmaraev vladmaraev merged commit e20e486 into master Dec 9, 2024
1 check passed
vladmaraev added a commit that referenced this pull request Dec 9, 2024
* Play audio

This change enables TTS to play audio. The audio is sent as part of the
`SPEAK` event (`audioURL` parameter). If the resource is not available, it
falls back to normal synthesis (`utterance` parameter).

Additionally, for streamed content, a `cache` parameter containing a URL
can be provided with the `SPEAK` event. This URL is called first to check
whether a cached version exists; if it does (true), another request fetches
the audio file to play, otherwise (false) TTS falls back to normal synthesis.
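
A sketch of that flow, assuming the cache endpoint answers with a textual true/false (the real response contract isn't specified here):

```ts
// Hypothetical helper: decide where the audio comes from before speaking.
async function resolveAudioURL(
  cacheURL: string,
  audioURL: string,
): Promise<string | null> {
  // first request: ask whether a cached audio file exists
  const response = await fetch(cacheURL);
  const exists = (await response.text()).trim() === "true"; // assumed format
  // true: make another request for the audio file to play;
  // false: return null so the caller falls back to normal TTS
  return exists ? audioURL : null;
}
```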

* Fix interruptions in ASR while resources are being respawned

This bugfix removes respawning of ASR and TTS machines. Instead, the
parent machine (`speechstate.ts`) generates new tokens and
communicates them to ASR and TTS via a `NEW_TOKEN` event.

ASR doesn’t need to be pre-initiated, so only the context gets updated
for a new token. The new token is then used for the next ASR request.

TTS *needs* to be pre-initiated; therefore, on a new token it creates
new instances of the Web Speech API synthesiser and utterance. They are
then used for the next TTS request.

Appropriate tests are added which attempt to renew tokens while the
system is recognising or speaking. Additionally, a new setting,
`newTokenInterval`, is added to configure how often new tokens are
requested (mainly for testing purposes), defaulting to 30 seconds.
Unfortunately, the tests are not entirely fair, because the old token
doesn’t expire when the new one is released, but I tried to make sure
that the new token is actually used.