Replies: 8 comments
-
train with …
-
where is it?
-
Thanks, I get slightly better results I think, but the model still trains at lightning speed and reproduces the samples perfectly. It can generate new speech in SVC quite well, but for singers it's still a struggle to get good results when pitches change. I use this code repo, version 3.11.0. My dataset has 132 samples, from a few seconds to 10 seconds each. Here are the loss metrics I get: can I assume my model is trained? Except for lf0 and mel, none of the other losses will reduce much more; loss/g/total stagnates and loss/g/fm even tends to increase. How long does it take you to train a model? How do you read the indicators to know your model is trained? Thank you.
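If it helps, those indicators are easiest to read in TensorBoard; a minimal sketch, assuming the fork's default log directory logs/44k (adjust the path to wherever your run actually writes):

tensorboard --logdir logs/44k
# then open http://localhost:6006 and compare loss/g/mel, loss/g/lf0 and loss/g/total over steps

As a rough rule of thumb (not specific to this repo), adversarial terms like loss/g/fm are not expected to decrease monotonically in GAN-style training, so the validation audio is usually a better stopping signal than waiting for every curve to flatten.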
-
@Pimax1 Just a thought: when you were running the inference command, did you run it with the output of …?
-
Thanks, that helped a lot! Somehow some songs work much better with -na and others with -a.
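For reference, the two variants being compared would look roughly like this with the fork's CLI; the model and config paths below are placeholders for your own files:

svc infer input.wav -m logs/44k/G_10000.pth -c configs/44k/config.json -na   # keep the pitch of the source audio
svc infer input.wav -m logs/44k/G_10000.pth -c configs/44k/config.json -a    # let the model auto-predict F0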
-
@Pimax1 I've been tinkering with the pitch parts of the code, so if you happen to find a good example wav where auto-predict works better than -na (and where it's not just because the vocals aren't separated well), it might help if you don't mind sharing.
-
Well, my dataset is a female voice, and if you try -na on a male singer the output will be terribly bad, whereas leaving -na off gives much better results. How long do you guys train? I don't see the point in going further than 500 epochs. Also, I have huge issues with consonants: all the "s" or "sh" sounds are very robotic. Do you have this problem too? Or maybe it's inherited from my dataset...
-
So I thought training would take very long, but it converges very quickly. At around 1,000 steps loss/g/mel starts to stagnate and the validation audio sounds perfect.
It trains so fast that I am surprised to see plenty of models on Hugging Face with 40k steps or epochs.
Anyway, my main issue is that inferring on a singer gives me quite bad results any time the singer changes pitch.
Is there any updated guide somewhere I could follow to make a regular voice able to SVC onto most singers?
I train like this:
svc pre-resample                      # resample the dataset audio to the target sample rate
svc pre-config -t so-vits-svc-4.0v1   # generate a config for the 4.0v1 model type
svc pre-hubert -fm crepe              # extract content features, using crepe for F0
svc train -t                          # train, with -t launching TensorBoard
Then I tried my models at various epochs, but the output is always the same: very bad at singing (sometimes completely off, noisy, unintelligible), while regular speech is OK.
Auto-tune just works better, so I guess I must be doing something wrong. Am I supposed to train it much longer? I am worried about overfitting and just getting worse inference results.
To infer I use F0 -> Crepe.
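(The CLI equivalent would presumably be something like the line below, with placeholder model/config paths; -fm selects the F0 estimator, as in pre-hubert above.)

svc infer song.wav -m logs/44k/G_10000.pth -c configs/44k/config.json -fm crepe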
My dataset does not sing; maybe that's the problem?
Thanks a lot for your help :)