Multi GPU finetuning on SD1.5 with CLIP Skip 2 fails #1099
Comments
Some findings so far. I have attempted to use
It seems to be angry about the last layer, which makes sense with clip_skip=2. I am currently trying to figure out how to have the script skip gradient calculation on that layer (similar to SDXL), but the code below does not seem to be working, so I am stuck.

```python
if args.clip_skip == 2:
    print("freezing last layer")
    text_encoder.text_model.encoder.layers[-1].requires_grad_(False)
    text_encoder.text_model.final_layer_norm.requires_grad_(False)
```

Am I heading in the right direction?
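For context, here is a minimal sketch of why the last encoder layer ends up with no gradient when clip_skip=2. It assumes a stock Hugging Face CLIPTextModel rather than the exact wiring in finetune.py, and the model/variable names are illustrative only:

```python
from transformers import CLIPTextModel, CLIPTokenizer

# Illustrative only: the model name and variable names are assumptions, not the script's own.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer("a photo of a cat", return_tensors="pt")
out = text_encoder(tokens.input_ids, output_hidden_states=True)

clip_skip = 2
# Take the output of the second-to-last encoder layer instead of the last one...
hidden_states = out.hidden_states[-clip_skip]
# ...then apply the final layer norm, as SD1.5-style pipelines usually do.
hidden_states = text_encoder.text_model.final_layer_norm(hidden_states)
# Nothing from encoder.layers[-1] feeds into hidden_states, so its parameters
# never receive gradients, which is what DDP then complains about.
```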
I think I finally got it working. Here were my changes. Line numbers are probably off a bit, and my changes only work for my specific use case (happy path). The fix was unwrapping the text encoder with accelerator.unwrap_model.

Edit: I realized that my local changes might make the line numbers off by 30 lines at most, so I added some more details on where to place the changes.
```python
# Add this code to freeze the last CLIP layer so gradients are not computed for those
# layers (similar to SDXL_train.py). Place it just before 'if not cache_latents:'
if args.clip_skip == 2:
    print("freezing last layer")
    text_encoder.text_model.encoder.layers[-1].requires_grad_(False)
    text_encoder.text_model.final_layer_norm.requires_grad_(False)

# for m in training_models:
#     m.requires_grad_(True)
# Replace the two lines above with the line below so the text encoder's
# requires_grad setting is not overridden.
training_models[0].requires_grad_(True)

# Unwrap the text encoder when clip skip is 2. Add "accelerator" as a parameter to the parent method.
# Replace this line:
#     encoder_hidden_states = text_encoder.text_model.final_layer_norm(encoder_hidden_states)
# with the line below:
encoder_hidden_states = accelerator.unwrap_model(text_encoder).text_model.final_layer_norm(encoder_hidden_states) if accelerator else text_encoder.text_model.final_layer_norm(encoder_hidden_states)

# Also add an `accelerator` parameter to the get_hidden_states method, and to any calls to it.
```
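To make that last point more concrete, here is a rough sketch of how a get_hidden_states-style helper could look once it takes accelerator. The parameter list is guessed from the snippet above and the body is simplified to the clip_skip branch only; the actual function in the repo may differ:

```python
def get_hidden_states(args, input_ids, tokenizer, text_encoder, weight_dtype=None, accelerator=None):
    # Forward pass with all hidden states so clip_skip can pick an earlier layer.
    enc_out = text_encoder(input_ids, output_hidden_states=True, return_dict=True)
    encoder_hidden_states = enc_out["hidden_states"][-args.clip_skip]

    # Under multi-GPU training, accelerator.prepare() wraps the text encoder (e.g. in DDP),
    # so its submodules are only reachable through the unwrapped model.
    if accelerator is not None:
        unwrapped = accelerator.unwrap_model(text_encoder)
        encoder_hidden_states = unwrapped.text_model.final_layer_norm(encoder_hidden_states)
    else:
        encoder_hidden_states = text_encoder.text_model.final_layer_norm(encoder_hidden_states)

    return encoder_hidden_states
```

Callers would then pass the accelerator through, e.g. get_hidden_states(args, input_ids, tokenizer, text_encoder, weight_dtype, accelerator=accelerator).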
Same problem with a dual 4090 system. Training failed unless I set clip skip = 1.
How do I add this? The specified location cannot be found, and the accelerator is missing.
Hey, sorry, I don't log in very often. I've updated my fix above with more details on what changes go where.
I had the same problem as you. May I ask how you eventually solved it?
A lot of Google searching on how to handle distributed models in Accelerate. I think someone had a similar problem in another repo that I used as a reference, but it's been too long now to remember.
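For anyone hitting this later, the general pattern is that accelerator.prepare() wraps each model (in DistributedDataParallel under multi-GPU), so direct attribute access to submodules of the wrapped object can fail, and you go through accelerator.unwrap_model() instead. A minimal illustration, with the model name as a placeholder:

```python
from accelerate import Accelerator
from transformers import CLIPTextModel

accelerator = Accelerator()
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# After prepare(), text_encoder may be a DDP wrapper, so .text_model is no longer
# a direct attribute of the object we hold.
text_encoder = accelerator.prepare(text_encoder)

# unwrap_model() returns the underlying module and works for both single- and multi-GPU runs.
final_norm = accelerator.unwrap_model(text_encoder).text_model.final_layer_norm
```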
Edit: I solved it 2 posts down.
Multi GPU training fails with the below error when using CLIP skip 2 with finetune.py (SD1.5). It fails here in the code:
It seems to be similar to these tickets, but I'm not sure which objects need to be unwrapped or where to do it.
#1000
#1019
Any guidance or things to try would be helpful.
Thanks!
Edit: minimum reproduction script using latest repo version