The training data of Catotron comes from festcat, but not all of the data has been used: the very long segments in the festcat data were simply omitted. This might be causing the following problems:
Occasional attention failures, which leave segments unsynthesized or only partly synthesized.
The lack of prosodic difference between questions and normal sentences.
With some smart parsing, it should be possible to augment the data by approximately 4 hours per speaker (up to 10 hours per speaker).
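As a rough illustration of what the text side of such parsing could look like, here is a minimal sketch that splits a long transcript into shorter chunks at sentence boundaries. The function name `split_long_segment` and the `MAX_CHARS` limit are hypothetical and not taken from the Catotron code. The sketch only handles text; cutting the matching audio would still need some form of alignment, which is not shown here.

```python
import re

# Hypothetical limit; the actual cutoff used for Catotron is not stated in this issue.
MAX_CHARS = 150


def split_long_segment(text, max_chars=MAX_CHARS):
    """Split a transcript into chunks of whole sentences, each at most max_chars long.

    A single sentence longer than max_chars is kept as its own (oversized) chunk.
    """
    # Split after sentence-final punctuation, keeping the punctuation with its sentence.
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because the split keeps the final punctuation with each sentence, questions stay marked with their `?`, which may matter for the prosody problem listed above.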
This task was already mentioned in the large roadmap issue; I am opening this specific issue to follow the developments.