Replies: 8 comments
-
train with …
-
where is it?
-
Thanks, I get slightly better results I think, but the model still trains at lightning speed and reproduces the samples perfectly. It can generate new speech in SVC quite well, but for singers it's still a struggle to get good results when pitches change. I use this code repo, version 3.11.0. My dataset has 132 samples, from a few seconds to 10 seconds each. Here are the loss metrics I get: can I assume my model is trained? Except for lf0 and mel, none of the other losses will reduce much more; loss/g/total stagnates and loss/g/fm even tends to increase. How long does it take you to train a model? How do you read the indicators to know your model is trained? Thank you.
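If it helps, those indicators are easiest to read in TensorBoard; a minimal sketch, assuming the fork's default log directory logs/44k (adjust the path to wherever your run actually writes):

tensorboard --logdir logs/44k
# then open http://localhost:6006 and compare loss/g/mel, loss/g/lf0 and loss/g/total over steps

As a rough rule of thumb (not specific to this repo), adversarial terms like loss/g/fm are not expected to decrease monotonically in GAN-style training, so the validation audio is usually a better stopping signal than waiting for every curve to flatten.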
-
@Pimax1 Just a thought: when you were running the inference command, did you run it with the output of …?
-
Thanks, that helped a lot! Somehow some songs work much better with -na and others with -a.
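For reference, the two variants being compared would look roughly like this with the fork's CLI; the model and config paths below are placeholders for your own files:

svc infer input.wav -m logs/44k/G_10000.pth -c configs/44k/config.json -na   # keep the pitch of the source audio
svc infer input.wav -m logs/44k/G_10000.pth -c configs/44k/config.json -a    # let the model auto-predict F0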
-
@Pimax1 I've been tinkering with the pitch parts of the code, so if you happen to find a good example wav where auto-predict works better than -na (and where it's not just because the vocals aren't separated well), it might help if you don't mind sharing.
-
Well, my dataset is a female voice, and if you try -na on a male singer the output will be terribly bad, whereas leaving -na off gives much better results. How long do you guys train? I don't see the point in going further than 500 epochs. Also, I have huge issues with consonants: all the "s" or "sh" sounds are very robotic. Do you have this problem too? Or maybe it's inherited from my dataset...
-
So I thought training would take very long, but it converges very quickly. At around 1,000 steps loss/g/mel starts to stagnate and the validation audio sounds perfect.
It trains so fast that I am surprised to see plenty of models on Hugging Face with 40k steps or epochs.
Anyway, my main issue is that inferring on a singer gives me quite bad results any time the singer changes pitch.
Is there any updated guide somewhere I could follow to make a regular voice able to SVC onto most singers?
I train like this:
svc pre-resample                      # resample the dataset audio to the target sample rate
svc pre-config -t so-vits-svc-4.0v1   # generate a config for the 4.0v1 model type
svc pre-hubert -fm crepe              # extract content features, using crepe for F0
svc train -t                          # train, with -t launching TensorBoard
Then I tried my models at various epochs, but the output is always the same: very bad at singing (sometimes completely off, noisy, unintelligible), while regular speech is OK.
Auto-tune just works better, so I guess I must be doing something wrong. Am I supposed to train it much longer? I am worried about overfitting and just getting worse inference results.
To infer I use F0 -> Crepe.
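(The CLI equivalent would presumably be something like the line below, with placeholder model/config paths; -fm selects the F0 estimator, as in pre-hubert above.)

svc infer song.wav -m logs/44k/G_10000.pth -c configs/44k/config.json -fm crepe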
My dataset does not sing; maybe that's the problem?
Thanks a lot for your help :)