add "FSM" satellite type #258

Open
wants to merge 1 commit into base: master

emmaconnor

Adds a new satellite type designed for multi-turn, back-and-forth conversations. The satellite uses a finite state machine (FSM) to track conversation state, and uses a combination of voice activity detection (VAD) and wake word detection to trigger state changes.

States:

  • Paused: initial state. No audio streaming, no wake word detection, no VAD. When a server connects, the satellite enters the Monitor state.
  • Monitor: listens for wake word detection events. When a wake word is detected, the satellite enters the Stream state and starts streaming audio to the server.
  • Stream: streams mic input to the server. When VAD detects that the user has finished speaking, the satellite enters the Playback state.
  • Playback: plays the TTS response from the server and waits for the AudioStop event from the server. Mic input is not processed in this state. This prevents TTS playback from triggering VAD, at the cost that the user cannot interrupt playback; interruption support could be added in the future. When TTS playback ends, the satellite enters the Followup state.
  • Followup: uses VAD to detect whether the user starts speaking again; no wake word is required. If the user starts speaking, the satellite re-enters the Stream state. Otherwise, if VAD detects no speech for 10 s, the satellite returns to the Monitor state.
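The transitions listed above can be sketched as a small lookup table. This is only an illustrative sketch, not the code in this PR: the names `State`, `Event`, `TRANSITIONS`, `next_state`, and `run` are hypothetical, and the timeout is modeled as a plain event rather than a real timer.

```python
from enum import Enum, auto


class State(Enum):
    PAUSED = auto()
    MONITOR = auto()
    STREAM = auto()
    PLAYBACK = auto()
    FOLLOWUP = auto()


class Event(Enum):
    # Hypothetical event names chosen for this sketch.
    SERVER_CONNECTED = auto()
    WAKE_WORD_DETECTED = auto()
    SPEECH_STARTED = auto()      # VAD reports that the user is speaking
    SPEECH_ENDED = auto()        # VAD reports that the user stopped speaking
    PLAYBACK_FINISHED = auto()   # TTS playback ended
    FOLLOWUP_TIMEOUT = auto()    # no speech detected for 10 s in Followup


# (current state, event) -> next state, following the list above.
TRANSITIONS = {
    (State.PAUSED, Event.SERVER_CONNECTED): State.MONITOR,
    (State.MONITOR, Event.WAKE_WORD_DETECTED): State.STREAM,
    (State.STREAM, Event.SPEECH_ENDED): State.PLAYBACK,
    (State.PLAYBACK, Event.PLAYBACK_FINISHED): State.FOLLOWUP,
    (State.FOLLOWUP, Event.SPEECH_STARTED): State.STREAM,
    (State.FOLLOWUP, Event.FOLLOWUP_TIMEOUT): State.MONITOR,
}


def next_state(state: State, event: Event) -> State:
    """Return the next state; events with no matching transition are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.PAUSED) -> State:
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

Because unknown `(state, event)` pairs leave the state unchanged, a `SPEECH_STARTED` event during Playback is simply dropped, which mirrors the description's note that mic input is not processed while the TTS response is playing.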

State diagram is roughly:

+----------------+
|     Paused     |
+----------------+
       |
 (server connects)
       |
       V
+----------------+
|    Monitor     |<-----------------------------------+
+----------------+                                    |
       |                                              |
(wake word detected)                                  |
       |                                              |
       V                                              |
+----------------+                                    |
|     Stream     |<------------------------+          |
+----------------+                         |          |
       |                                   |          |
(VAD no longer detected)                   |          |
       |                                   |          |
       V                                   |          |
+----------------+                         |          |
|    Playback    |                         |          |
+----------------+                         |          |
       |                                   |          |
(TTS playback finished)                    |          |
       |                                   |          |
       V                                   |          |
+----------------+                         |          |
|    Followup    |----(VAD detected)-------+          |
+----------------+                                    |
       |                                              |
       +----------(no VAD detected for 10s)-----------+
