Fix multi-gpu SDXL training #1000
Conversation
Are you sure the last layer of text_encoder1 is not trained? Because I don't want single-GPU training to be broken.
SDXL uses the output of the penultimate layer of Text Encoder 1 instead of the last layer. As a result, the last layer doesn't participate in the loss calculation, but it raises a RuntimeError during the backward pass under DDP because it received no grad. The grad of text_encoder1's last layer should be None in both single-GPU and multi-GPU training; that's what I saw when I reproduced the RuntimeError. Can you check the grad of text_encoder1's last layer, or compare the grads between devices? It seems that the grads between GPUs are not synced correctly.
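For context, here is a minimal sketch (not the PR's code) of why the last layer ends up without a gradient; it assumes Text Encoder 1 is the standard CLIP ViT-L text model from transformers, as in SDXL:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

# SDXL's Text Encoder 1 is CLIP ViT-L/14; SDXL conditions on its penultimate hidden state.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder1 = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer("a photo of a cat", return_tensors="pt")
out = text_encoder1(**tokens, output_hidden_states=True)

# hidden_states[-2] is the penultimate layer's output; the last layer's output is never used.
cond = out.hidden_states[-2]

# A dummy loss built only from the penultimate state leaves the last encoder layer
# (and final_layer_norm) with grad=None after backward.
cond.mean().backward()
print([k for k, v in text_encoder1.named_parameters() if v.grad is None])
```

Since DDP expects every parameter with requires_grad=True to receive a gradient, these None grads are what trigger the "parameter indices which did not receive grad" RuntimeError unless the layer is frozen or find_unused_parameters is enabled.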
How can I check the last layer difference? I have a trained model right now that I can compare.
You can just add a print after the backward pass: print([k for k, v in text_encoder1.named_parameters() if v.grad is None]). It should output the last layer's parameters in both single-GPU and multi-GPU training.
Or you can print and compare between devices (this will print different weight/grad values per device if you are using the main branch): print(accelerator.device, THE WEIGHT/GRAD YOU WANT TO COMPARE)
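For example, the two checks could sit right after the backward call in the training loop; the names accelerator, loss, and text_encoder1 follow sdxl_train.py, the rest is a hypothetical sketch:

```python
accelerator.backward(loss)

# 1) Parameters that received no gradient; the last layer of text_encoder1 is expected
#    to show up here in both single-GPU and multi-GPU runs.
print([k for k, v in text_encoder1.named_parameters() if v.grad is None])

# 2) Compare a concrete gradient across devices; if DDP is not syncing correctly,
#    each rank prints a different value.
param = next(p for p in text_encoder1.parameters() if p.grad is not None)
print(accelerator.device, param.grad.abs().sum().item())
```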
Thank you so much! I hope this will finally solve the DDP training issue. I've changed to set
Thank you very much! I'm still running the SDXL training script, but the output images so far are very promising. Great improvements in detail and texture.
Multi-GPU Kaggle training is broken. Does anyone have a guess how to fix it?
Fix : dev SDXL:multi-GPUs train #994
Fix: "Parameter indices which did not receive grad for rank x", Multi-GPU SDXL Training (unet + both text encoders) #997
- Add --gradient_as_bucket_view to reduce VRAM usage in DDP training.
- Add --static_graph to resolve the conflict between DDP and gradient checkpointing.
- Freeze the last layer of text_encoder1 in sdxl_train.py, since it doesn't participate in the loss calculation; this prevents the RuntimeError in DDP training (see the sketch after this list).

These changes are related to text encoder training, so I tested them by training the text encoders on 2 GPUs due to limited VRAM.
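A minimal sketch of what the three changes amount to, assuming the new flags are forwarded to DDP through accelerate's DistributedDataParallelKwargs and that text_encoder1 is the usual transformers CLIPTextModel; the actual wiring in sdxl_train.py may differ, and static_graph needs a reasonably recent accelerate/PyTorch:

```python
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs
from transformers import CLIPTextModel

# --gradient_as_bucket_view / --static_graph map onto DDP options via accelerate.
ddp_kwargs = DistributedDataParallelKwargs(
    gradient_as_bucket_view=True,  # gradients share storage with DDP buckets -> less VRAM
    static_graph=True,             # lets DDP cooperate with gradient checkpointing
)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

# Text Encoder 1; in sdxl_train.py it is loaded from the SDXL checkpoint instead.
text_encoder1 = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Only the penultimate hidden state feeds the SDXL loss, so the last transformer layer
# and final_layer_norm are frozen to stop DDP from waiting for gradients that never arrive.
text_encoder1.text_model.encoder.layers[-1].requires_grad_(False)
text_encoder1.text_model.final_layer_norm.requires_grad_(False)
```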