Retrain the models with more data (from festcat) #8

Open
gullabi opened this issue Apr 1, 2021 · 1 comment

gullabi commented Apr 1, 2021

The training data of Catotron comes from festcat, but not all of the data has been used: the very long segments in the festcat data have simply been omitted. This might be causing the following problems:

  • Occasional failures of the attention, resulting in segments that are not synthesized at all or only partly synthesized.
  • A lack of prosodic difference between questions and declarative sentences.

With some smart parsing it should be possible to augment the data by approximately 4 hours per speaker (up to 10 hours per speaker); a possible re-segmentation approach is sketched below.
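
The issue does not specify how the "smart parsing" would work; as a purely illustrative sketch (the function name, paths and thresholds below are assumptions, not part of the Catotron pipeline), the long festcat recordings could be split at silences and only chunks of a usable duration kept:

```python
# Hypothetical sketch: split long festcat recordings at silences so the
# resulting chunks fit typical TTS segment-length limits.
# Thresholds (top_db, min_dur, max_dur) are illustrative assumptions.
import librosa
import soundfile as sf

def resegment(wav_path, out_prefix, min_dur=1.0, max_dur=10.0, top_db=40):
    y, sr = librosa.load(wav_path, sr=None)
    # Non-silent intervals, as (start, end) sample indices.
    intervals = librosa.effects.split(y, top_db=top_db)
    kept = 0
    for start, end in intervals:
        dur = (end - start) / sr
        if min_dur <= dur <= max_dur:
            sf.write(f"{out_prefix}_{kept:04d}.wav", y[start:end], sr)
            kept += 1
    return kept
```

The harder part, re-aligning each chunk with its portion of the transcript (e.g. with a forced aligner), is left out of the sketch.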

This task was already mentioned in the larger roadmap issue; I am opening this specific issue to follow its development.

gullabi commented Dec 13, 2021

A first batch of training is finished, using Coqui TTS.

output_test_pau_vocoder.mp4

However, I couldn't get the Ona model to work correctly with the vocoder, so here is a segment generated with Griffin-Lim (GL):

output_test_ona_gl.mp4
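
For reference, below is roughly what the two inference setups look like with Coqui TTS's Synthesizer API: a neural vocoder for Pau, and a fallback to Griffin-Lim when no vocoder is passed, as for Ona. The checkpoint and config paths and the sample sentence are placeholders, and exact argument names may differ between Coqui TTS versions.

```python
# Sketch of Coqui TTS inference; paths are placeholders, not the actual
# Catotron checkpoints, and the keyword arguments follow Coqui TTS ~0.5.
from TTS.utils.synthesizer import Synthesizer

# Pau: acoustic model plus a trained neural vocoder.
pau = Synthesizer(
    tts_checkpoint="pau/best_model.pth",
    tts_config_path="pau/config.json",
    vocoder_checkpoint="vocoder/best_model.pth",
    vocoder_config="vocoder/config.json",
)

# Ona: no vocoder given, so synthesis falls back to Griffin-Lim (GL).
ona = Synthesizer(
    tts_checkpoint="ona/best_model.pth",
    tts_config_path="ona/config.json",
)

for name, synth in [("pau_vocoder", pau), ("ona_gl", ona)]:
    wav = synth.tts("Bon dia, com estàs?")
    synth.save_wav(wav, f"output_test_{name}.wav")
```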

(more updates incoming via edit)
