
Audios cannot be longer than 12 seconds #4

Open
jordimas opened this issue Mar 5, 2021 · 1 comment

jordimas commented Mar 5, 2021

It seems that the generated audio cannot be longer than 12 seconds. You can try, for example, the text "VilaWeb fou el primer mitjà digital català en incorporar una plataforma de blogs personals fàcilment gestionable pels mateixos usuaris, el 2004 oferí als lectors i col·laboradors la possibilitat de crear els seus propis blogs, que aconseguiren cert protagonisme i activitat els anys següents."

I see a warning: "Warning! Reached max decoder steps". I do not know whether this is related.


gullabi commented Mar 7, 2021

This is a general problem with the architecture of neural TTS. The maximum length of the synthesized audio is fixed at training time: since the model is trained on 12-second segments, it can only synthesize up to 12 seconds of audio. The reason is mostly memory restrictions during training, since everything is computed in memory. Although the limit can be raised by training the models on GPUs with more memory, the gain would be marginal and would never reach audiobook lengths.
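For intuition, here is a minimal sketch of why the warning appears, assuming a Tacotron-style autoregressive decoder; the `decoder.initial_frame` / `decoder.step` / stop-token interface is a hypothetical stand-in, not the actual API of this repo. The decoder emits one spectrogram frame per step, and inference is hard-capped at `max_decoder_steps`, so any utterance that needs more steps gets truncated:

```python
import torch

def decode(decoder, encoder_outputs, max_decoder_steps=1000):
    """Autoregressive inference loop, hard-capped at max_decoder_steps.

    `decoder` and its .initial_frame/.step methods are hypothetical
    stand-ins for a Tacotron-style model's real interface; the point
    is the step cap, which bounds the length of the output audio.
    """
    frames = []
    frame = decoder.initial_frame(encoder_outputs)
    for _ in range(max_decoder_steps):
        frame, stop_prob = decoder.step(frame, encoder_outputs)
        frames.append(frame)
        if stop_prob > 0.5:  # model predicted the end of speech
            break
    else:
        # The stop token was never predicted within the cap, so the
        # audio is cut off here; this is the condition the
        # "Warning! Reached max decoder steps" message reports.
        print("Warning! Reached max decoder steps")
    return torch.stack(frames)  # (n_frames, n_mel) mel spectrogram
```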

There are currently better alternatives to this architecture, which are able to synthesize longer text/audio with better performance.

Having said that, even these architectures do not solve the problem of very long synthesis; at best they reach a few minutes of audio. For a more thorough discussion of how to handle this architectural variety and evolution, see the "future of the repo" issue.

But for now the solution would be to use a text parser and synthesize the audio sequentially in chunks, as is done in the mycroft catotron plugin; a minimal sketch of this approach follows below. In fact, one positive outcome of this would be the possibility of parallelization, which would address the other problem, latency.
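As a rough illustration of the chunking approach, here is a minimal sketch. It assumes a `synthesize(text)` inference call that returns a 1-D numpy array of samples, plus a fixed sample rate; both the `my_tts` module and the sample rate are placeholder assumptions, not the actual catotron API:

```python
import re
import numpy as np

from my_tts import synthesize  # hypothetical inference call: str -> np.ndarray

SAMPLE_RATE = 22050  # assumed output sample rate of the model
PAUSE = np.zeros(int(0.3 * SAMPLE_RATE), dtype=np.float32)  # 300 ms of silence

def split_sentences(text):
    """Naive sentence splitter: break on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def synthesize_long(text):
    """Synthesize arbitrarily long text by running the length-limited
    model on each sentence and concatenating the resulting chunks
    with a short pause in between."""
    chunks = [synthesize(s) for s in split_sentences(text)]
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    audio = []
    for i, chunk in enumerate(chunks):
        audio.append(chunk)
        if i < len(chunks) - 1:
            audio.append(PAUSE)
    return np.concatenate(audio)
```

Since the sentences are independent, the list comprehension could also be replaced with something like `concurrent.futures.ProcessPoolExecutor().map(synthesize, split_sentences(text))` to synthesize the chunks in parallel, which is the latency gain mentioned above.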
