Retrain the models with more data (from festcat) #8

Open
gullabi opened this issue Apr 1, 2021 · 1 comment

gullabi commented Apr 1, 2021

The training data of Catotron comes from festcat, but not all of the data has been used: the very long segments in the festcat data have simply been omitted. This might be causing the following problems:

  • Occasional failures of the attention, resulting in segments that are not synthesized at all or only partly synthesized.
  • A lack of prosodic difference between questions and declarative sentences.

With some smart parsing it should be possible to augment the data by approximately 4 hours per speaker (up to 10 hours per speaker); a possible re-segmentation approach is sketched below.
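
The issue does not specify how the "smart parsing" would work; as a purely illustrative sketch (the function name, paths and thresholds below are assumptions, not part of the Catotron pipeline), the long festcat recordings could be split at silences and only chunks of a usable duration kept:

```python
# Hypothetical sketch: split long festcat recordings at silences so the
# resulting chunks fit typical TTS segment-length limits.
# Thresholds (top_db, min_dur, max_dur) are illustrative assumptions.
import librosa
import soundfile as sf

def resegment(wav_path, out_prefix, min_dur=1.0, max_dur=10.0, top_db=40):
    y, sr = librosa.load(wav_path, sr=None)
    # Non-silent intervals, as (start, end) sample indices.
    intervals = librosa.effects.split(y, top_db=top_db)
    kept = 0
    for start, end in intervals:
        dur = (end - start) / sr
        if min_dur <= dur <= max_dur:
            sf.write(f"{out_prefix}_{kept:04d}.wav", y[start:end], sr)
            kept += 1
    return kept
```

The harder part, re-aligning each chunk with its portion of the transcript (e.g. with a forced aligner), is left out of the sketch.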

This task was already mentioned in the larger roadmap issue; I am opening this specific issue to follow its development.

gullabi commented Dec 13, 2021

A first batch of training is finished, using Coqui TTS.

output_test_pau_vocoder.mp4

However, I couldn't get the Ona model to work correctly with the vocoder, so here is a segment generated with Griffin-Lim (GL):

output_test_ona_gl.mp4
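
For reference, below is roughly what the two inference setups look like with Coqui TTS's Synthesizer API: a neural vocoder for Pau, and a fallback to Griffin-Lim when no vocoder is passed, as for Ona. The checkpoint and config paths and the sample sentence are placeholders, and exact argument names may differ between Coqui TTS versions.

```python
# Sketch of Coqui TTS inference; paths are placeholders, not the actual
# Catotron checkpoints, and the keyword arguments follow Coqui TTS ~0.5.
from TTS.utils.synthesizer import Synthesizer

# Pau: acoustic model plus a trained neural vocoder.
pau = Synthesizer(
    tts_checkpoint="pau/best_model.pth",
    tts_config_path="pau/config.json",
    vocoder_checkpoint="vocoder/best_model.pth",
    vocoder_config="vocoder/config.json",
)

# Ona: no vocoder given, so synthesis falls back to Griffin-Lim (GL).
ona = Synthesizer(
    tts_checkpoint="ona/best_model.pth",
    tts_config_path="ona/config.json",
)

for name, synth in [("pau_vocoder", pau), ("ona_gl", ona)]:
    wav = synth.tts("Bon dia, com estàs?")
    synth.save_wav(wav, f"output_test_{name}.wav")
```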

(more updates incoming via edit)
