Finetune Tortoise on General Speech to Improve Quality #672

fakerybakery · 2023-11-08T01:50:18Z

Hi,
Are there any plans (including from the community) to finetune the model on general speech to improve quality, especially for short-form audio? Because the model was trained primarily on audiobooks, it performs poorly on short phrases such as "hi" or "hello."
Additionally, although the overall speech quality is better than any other solution, it still isn't optimal. Is anyone planning to take this further by training the model on a general speech dataset and releasing it to the community, especially given the recent improvements with the fast API?
It doesn't seem too hard to do; it mostly just needs compute. Support for "prompting"/emotive speech would be harder.
If anyone has compute and is willing to release the finetuned model to the community:

1. Clone the MRQ repo.
2. Acquire a general multi-speaker voice dataset (maybe LibriSpeech/LibriVox plus some shorter phrases and single words).
3. Finetune the model using the MRQ GUI.
4. Release it to the community!
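As a rough illustration of step 2, here is a sketch of turning a mixed dataset into transcript manifests. It assumes the finetuning setup reads an LJSpeech-style manifest with one `path|transcript` line per clip; check the format your version of the MRQ repo actually expects. `Clip`, `build_manifest`, and the threshold values are hypothetical names and numbers chosen for illustration. The check on short clips reflects the concern above: if almost every clip is long audiobook narration, finetuning is unlikely to help with "hi" or "hello."

```python
import random
from dataclasses import dataclass


@dataclass
class Clip:
    path: str        # audio file path, relative to the dataset folder (hypothetical layout)
    text: str        # transcript of the clip
    duration: float  # length in seconds


def build_manifest(clips, short_threshold=1.5, min_short_fraction=0.1,
                   val_fraction=0.05, seed=0):
    """Return (train_lines, val_lines) of 'path|transcript' strings.

    Assumption: the trainer reads LJSpeech-style manifests, one clip per line,
    with '|' as the field separator.
    """
    # '|' separates fields, so a transcript containing it would corrupt the line;
    # empty transcripts are useless for training.
    usable = [c for c in clips if c.text.strip() and "|" not in c.text]
    if not usable:
        raise ValueError("no usable clips")

    # Short utterances are the weak spot, so insist on a minimum share of them.
    short = sum(1 for c in usable if c.duration < short_threshold)
    if short / len(usable) < min_short_fraction:
        raise ValueError(
            f"only {short}/{len(usable)} clips are shorter than {short_threshold}s"
        )

    # Shuffle deterministically so the train/validation split is reproducible.
    shuffled = list(usable)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    lines = [f"{c.path}|{c.text.strip()}" for c in shuffled]
    return lines[n_val:], lines[:n_val]
```

Writing each returned list to a file with `"\n".join(...)` gives manifests in the assumed format. The 1.5 s threshold and 10% minimum are arbitrary starting points, not tested values.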

Replies: 0 comments

Select a reply

Finetune Tortoise on General Speech to Improve Quality #672

fakerybakery Nov 8, 2023

Replies: 0 comments

fakerybakery
Nov 8, 2023