Slow GPU training Laptop 4060 8 Gb VRAM #703

Open
kivrus opened this issue Jan 18, 2025 · 1 comment
Comments


kivrus commented Jan 18, 2025

@lpscr thanks a lot!
I managed to run it on my Lenovo Legion laptop with a 4060 (8 GB VRAM).

Context: I'm very new to this whole thing.

But now I have issues with the training process. It is very slow for a GPU... the best I managed was about 1 epoch every 10 minutes.
I'll give more details below.

My commands:

For preprocessing (yes, I'm using the ru language):
python3 -m piper_train.preprocess --language ru --input-dir ~/piper/my-dataset --output-dir ~/piper/my-training --dataset-format ljspeech --single-speaker --sample-rate 22050 --max-workers 1 --debug
I checked the .jsonl file in my-training and it looked fine afaik. It had the text and links to the .wav files.
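For reference, a quick way to spot-check the preprocessed data from the shell (assuming the output file is named dataset.jsonl, which is what a default piper_train.preprocess run produces as far as I know):

head -n 2 ~/piper/my-training/dataset.jsonl    # look at the first couple of entries
wc -l ~/piper/my-training/dataset.jsonl        # count how many utterances survived preprocessing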

For training:

python3 -m piper_train \
  --dataset-dir ~/piper/my-training \
  --devices 1 \
  --batch-size 32 \
  --validation-split 0.0 \
  --num-test-examples 0 \
  --max_epochs 2200 \
  --resume_from_checkpoint ~/piper/epoch=2164-step=1355540.ckpt \
  --accelerator 'gpu' \
  --checkpoint-epochs 1 \
  --precision 16
The GPU is at 100% utilization, confirmed with several tools, and VRAM usage always sits around 7.3-7.8 GB.
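One such tool, for anyone who wants to reproduce the measurement (nvidia-smi ships with the NVIDIA driver):

watch -n 1 nvidia-smi    # refreshes GPU utilization and VRAM usage every second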
I've tried different datasets: 10K .wav files, 800 .wav files... They are really short, good studio quality, 6-7 words at most. The total duration of the 10K set is about 8-9 hours. The result is the same.
I tested different batch sizes, all the way from 128 down to 8. It runs out of memory with anything higher than 32, so I stopped at 32. The logs said:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 14.07 GiB is allocated by PyTorch, and 493.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
I think I should be able to run training much more smoothly. But the only setup that still works is very slow, on both GPU and CPU...
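As an aside, the OOM message above suggests its own mitigation for fragmentation; following that hint would mean setting the allocator option before launching training, roughly like this (untested here):

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True    # per the hint in the CUDA OOM message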

P.S. When I followed the original guide, I could only run training on the CPU, but at least I could see progress: timers, epochs, etc.
With this fix from @lpscr, I don't see anything. I've installed TensorBoard, but it shows nothing; the charts are empty (I used the latest "version" directory in lightning_logs, of course).
That's the only output I get:
DEBUG:fsspec.local:open file: /home/kosov/piper/my-training/lightning_logs/version_39/hparams.yaml
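For anyone hitting the same empty charts, one thing worth trying is pointing TensorBoard at the whole lightning_logs directory (the path here is taken from the DEBUG line above; adjust it to your own setup):

tensorboard --logdir ~/piper/my-training/lightning_logs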


kivrus commented Jan 18, 2025

I lowered the batch size:
--batch-size 16

and added a phoneme limit:
--max-phoneme-ids 300

Now my speed is ~140 epochs/hour.
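For completeness, the full training invocation with those two changes folded in would look something like this (all other flags exactly as in the original command above):

python3 -m piper_train \
  --dataset-dir ~/piper/my-training \
  --devices 1 \
  --batch-size 16 \
  --max-phoneme-ids 300 \
  --validation-split 0.0 \
  --num-test-examples 0 \
  --max_epochs 2200 \
  --resume_from_checkpoint ~/piper/epoch=2164-step=1355540.ckpt \
  --accelerator 'gpu' \
  --checkpoint-epochs 1 \
  --precision 16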
