
Multi GPU finetuning on SD1.5 with CLIP Skip 2 fails #1099

Open
thojmr opened this issue Feb 2, 2024 · 7 comments
Labels
bug Something isn't working

Comments


thojmr commented Feb 2, 2024

Edit: I solved it 2 posts down.

Multi GPU training fails with the below error when using CLIP skip 2 with finetune.py (SD1.5).

File "/Desktop/code/kohya-sd-scripts/fine_tune.py", line 344, in train
    encoder_hidden_states = train_util.get_hidden_states(
File "/Desktop/code/kohya-sd-scripts/library/train_util.py", line 4139, in get_hidden_states
    encoder_hidden_states = text_encoder.text_model.final_layer_norm(encoder_hidden_states)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'text_model'

It fails here in the code:

    if args.clip_skip is None:
        encoder_hidden_states = text_encoder(input_ids)[0]
    else:
        enc_out = text_encoder(input_ids, output_hidden_states=True, return_dict=True)
        encoder_hidden_states = enc_out["hidden_states"][-args.clip_skip]
        encoder_hidden_states = text_encoder.text_model.final_layer_norm(encoder_hidden_states)   #<-  Fails here
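
For context, here is a minimal, self-contained illustration (hypothetical stand-in classes, not kohya's actual code) of why the attribute lookup fails once Accelerate wraps the text encoder in DistributedDataParallel:

    # Hypothetical stand-ins for the CLIP text encoder, just to show the wrapping issue.
    import torch.nn as nn

    class TextModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.final_layer_norm = nn.LayerNorm(768)

        def forward(self, x):
            return self.final_layer_norm(x)

    class TextEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.text_model = TextModel()

        def forward(self, x):
            return self.text_model(x)

    text_encoder = TextEncoder()
    # After accelerator.prepare(text_encoder) on multiple GPUs, text_encoder becomes a
    # DistributedDataParallel wrapper, and attribute lookup on the wrapper cannot find
    # `text_model`:
    #   text_encoder.text_model                            -> AttributeError
    #   text_encoder.module.text_model                     -> works
    #   accelerator.unwrap_model(text_encoder).text_model  -> works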

This seems similar to these tickets, but I'm not sure which objects need to be unwrapped or where to do it.
#1000
#1019

Any guidance or things to try would be helpful.
Thanks!

Edit: a minimal reproduction script using the latest repo version:

accelerate launch --num_cpu_threads_per_process=4 fine_tune.py \
    --pretrained_model_name_or_path="${pretrained_model_name_or_path}" \
    --in_json $metadata_dir"/meta_cap.json" \
    --train_data_dir="../../datasets/${raw_img_folder}" \
    --output_dir="./output/${short_name}" \
    --resolution="512,512" \
    --train_batch_size=1 \
    --learning_rate=2e-6 \
    --learning_rate_te=1e-6 \
    --lr_scheduler="cosine" \
    --max_train_epochs 10 \
    --mixed_precision="bf16" \
    --save_precision="fp16" \
    --save_every_n_steps=10000000 \
    --enable_bucket \
    --clip_skip=2 \
    --logging_dir=logs \
    --save_model_as="safetensors" \
    --output_name="test" \
    --caption_extension=".txt" \
    --train_text_encoder

thojmr commented Feb 3, 2024

Some findings so far.

I attempted to call accelerator.unwrap_model(text_encoder).text_model.final_layer_norm(encoder_hidden_states) to bypass the error above. However, doing so causes part of the gradient to be skipped, specifically for the layers under text_model.encoder.layers.11, as shown in the error below.

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by 
making sure all `forward` function outputs participate in calculating loss. 
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
Parameters which did not receive grad for rank 1: text_model.encoder.layers.11.layer_norm2.bias, text_model.encoder.layers.11.layer_norm2.weight, text_model.encoder.layers.11.mlp.fc2.bias, text_model.encoder.layers.11.mlp.fc2.weight, text_model.encoder.layers.11.mlp.fc1.bias, text_model.encoder.layers.11.mlp.fc1.weight, text_model.encoder.layers.11.layer_norm1.bias, text_model.encoder.layers.11.layer_norm1.weight, text_model.encoder.layers.11.self_attn.out_proj.bias, text_model.encoder.layers.11.self_attn.out_proj.weight, text_model.encoder.layers.11.self_attn.q_proj.bias, text_model.encoder.layers.11.self_attn.q_proj.weight, text_model.encoder.layers.11.self_attn.v_proj.bias, text_model.encoder.layers.11.self_attn.v_proj.weight, text_model.encoder.layers.11.self_attn.k_proj.bias, text_model.encoder.layers.11.self_attn.k_proj.weight
Parameter indices which did not receive grad for rank 1: 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193

It's complaining about the last layer, which makes sense with clip_skip=2. I'm currently trying to figure out how to have the script skip gradient computation for that layer (similar to SDXL), but the snippet below does not seem to be working, so I'm stuck.

    if args.clip_skip == 2:
        print("freezing last layer")
        text_encoder.text_model.encoder.layers[-1].requires_grad_(False)
        text_encoder.text_model.final_layer_norm.requires_grad_(False)

Am I heading in the right direction?
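
As an aside, the DDP error above also hints at a different, untested workaround: letting DDP tolerate unused parameters instead of freezing them. With Accelerate that would look roughly like this (a sketch only, not the fix adopted later in this thread):

    # Sketch: build the Accelerator with find_unused_parameters=True so DDP does not
    # error out when clip_skip leaves the last text-encoder layer out of the loss.
    # Note this adds some per-step overhead.
    from accelerate import Accelerator
    from accelerate.utils import DistributedDataParallelKwargs

    ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
    accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])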


thojmr commented Feb 3, 2024

I think I finally got it working. Here are my changes. The line numbers are probably off a bit, and the changes only cover my specific use case (the happy path).

The fix was unwrapping the text encoder with accelerator when using clip_skip=2 during distributed training.

Edit: I realized that my local changes might make the line numbers off by up to 30 lines, so I added more details on where to place the changes.

fine_tune.py:~176

    # Add this code to freeze the last CLIP layer so gradient is not computed for those layers (similar to SDXL_train.py)
    # Place just before 'if not cache_latents:'
    if args.clip_skip == 2:
        print("freezing last layer")
        text_encoder.text_model.encoder.layers[-1].requires_grad_(False)
        text_encoder.text_model.final_layer_norm.requires_grad_(False)

fine_tune.py:~195

    # for m in training_models:
    #     m.requires_grad_(True)
    # We replace the lines above with the line below so the text encoder's requires_grad setting is not overridden
    training_models[0].requires_grad_(True)

train_util.py:~4139

    # Unwrap the text encoder when clip skip is 2.  Add "accelerator" as a param to the parent method
    # Replace this line with below: encoder_hidden_states = text_encoder.text_model.final_layer_norm(encoder_hidden_states)
    encoder_hidden_states = accelerator.unwrap_model(text_encoder).text_model.final_layer_norm(encoder_hidden_states) if accelerator else text_encoder.text_model.final_layer_norm(encoder_hidden_states)
    
    # also add `accelerator` param to the get_hidden_states method, and any calls to this method
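
Put together, the relevant part of get_hidden_states might end up looking roughly like this (a sketch with a simplified signature; the real function in train_util.py takes more arguments and also handles pad tokens and SD2.x text encoders):

    # Sketch only: simplified signature, assuming `accelerator` is passed down from
    # the training loop (None for single-GPU runs).
    def get_hidden_states(args, input_ids, text_encoder, accelerator=None):
        if args.clip_skip is None:
            encoder_hidden_states = text_encoder(input_ids)[0]
        else:
            enc_out = text_encoder(input_ids, output_hidden_states=True, return_dict=True)
            encoder_hidden_states = enc_out["hidden_states"][-args.clip_skip]
            # Under multi-GPU training text_encoder is a DistributedDataParallel wrapper,
            # so unwrap it before reaching into .text_model.
            unwrapped = accelerator.unwrap_model(text_encoder) if accelerator is not None else text_encoder
            encoder_hidden_states = unwrapped.text_model.final_layer_norm(encoder_hidden_states)
        return encoder_hidden_states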

I'll try training with it soon, that was enough adventure for one day. (It worked for me.)

@kohya-ss kohya-ss added the bug Something isn't working label Feb 29, 2024

fschiro commented Apr 17, 2024

Same problem on a dual 4090 system. Training failed unless I set clip skip = 1.

zdoek001 commented

# unwrap the text encoder when clip skip is 2.  Add "accelerator" as a param to the parent method
encoder_hidden_states = accelerator.unwrap_model(text_encoder).text_model.final_layer_norm(encoder_hidden_states) if accelerator else text_encoder.text_model.final_layer_norm(encoder_hidden_states)

How do I add this? I can't find the specified location, and `accelerator` is not available there.


thojmr commented Jun 4, 2024

How do I add this? I can't find the specified location, and `accelerator` is not available there.

Hey, sorry I don't log in very often. I've updated my fix above with more details on what changes go where.

Nice-Zhang66 commented

I had the same problem as you; may I ask how you eventually solved it?
I followed your instructions and made the changes in fine_tune.py and train_util.py, but it didn't work.


thojmr commented Sep 6, 2024

how you eventually solved it.

A lot of Google searching about handling distributed models in Accelerate. I think someone had a similar problem in another repo that I used as a reference, but it's been too long now to remember.
