Finetune Tortoise on General Speech to Improve Quality #672
fakerybakery started this conversation in Ideas
Hi,
Are there any plans (including by the community) to finetune the model on general speech to improve quality, especially on short-form audio? Since the model is primarily trained on audiobooks, it doesn't do well on short phrases such as "hi" or "hello."
Additionally, the general speech quality is better than that of any other solution but still isn't optimal. Is anyone planning to take this further by training the model on a general speech dataset and releasing it to the community, especially given the recent improvements with the fast API?
It doesn't seem too hard to do; it just needs compute. Support for "prompting"/emotive speech would be harder.
If anyone has compute and is willing to release it to the community: