flux_train.py when deepspeed enable RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16 #1871

mobilejammer · 2025-01-10T07:17:17Z

My run script is:

accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 --main_process_port 8080 flux_train.py --deepspeed --pretrained_model_name_or_path /home/ubuntu/FLUX.1-dev/flux1-dev.safetensors --clip_l /home/ubuntu/FLUX.1-dev/text_encoder/model.safetensors --t5xxl /home/ubuntu/FLUX.1-dev/text_encoder_2/merged_text_encoder.safetensors --ae /home/ubuntu/FLUX.1-dev/ae.safetensors --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --dataset_config ./data.toml --output_dir ./ --output_name output-name --learning_rate 5e-5 --max_train_epochs 20 --sdpa --highvram --cache_text_encoder_outputs_to_disk --cache_latents_to_disk --save_every_n_epochs 5 --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --lr_scheduler constant_with_warmup --max_grad_norm 0.0 --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1.0 --fused_backward_pass --full_bf16

[rank7]: Traceback (most recent call last):
[rank7]: File "/home/ubuntu/sd-scripts/flux_train.py", line 849, in <module>
[rank7]: train(args)
[rank7]: File "/home/ubuntu/sd-scripts/flux_train.py", line 648, in train
[rank7]: model_pred = flux(
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: File "/home/ubuntu/sd-scripts/library/flux_models.py", line 1031, in forward
[rank7]: vec = vec + self.guidance_in(timestep_embedding(guidance, 256))
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: File "/home/ubuntu/sd-scripts/library/flux_models.py", line 566, in forward
[rank7]: return checkpoint(self._forward, *args, use_reentrant=False, **kwargs)
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
[rank7]: return disable_fn(*args, **kwargs)
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank7]: return fn(*args, **kwargs)
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
[rank7]: ret = function(*args, **kwargs)
[rank7]: File "/home/ubuntu/sd-scripts/library/flux_models.py", line 562, in _forward
[rank7]: return self.out_layer(self.silu(self.in_layer(x)))
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
[rank7]: return F.linear(input, self.weight, self.bias)
[rank7]: RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16

When DeepSpeed is not enabled, the same command runs without this error.
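
For anyone trying to narrow this down, the error in the last frame of the traceback can be reproduced outside the training script. The sketch below is not the sd-scripts code: `layer` is a hypothetical stand-in for a bf16 linear layer such as `in_layer`, and `x` stands in for the float32 output of `timestep_embedding`. It shows that a float32 input to a bf16 `nn.Linear` raises the dtype error, and that either casting the input to the weight dtype or running under `torch.autocast` avoids it. Whether missing autocast is what actually differs on the DeepSpeed path is an assumption, not something the traceback confirms.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a bf16 layer like in_layer (weights in bfloat16).
layer = nn.Linear(256, 8).to(torch.bfloat16)

# Hypothetical stand-in for the embedding output, which is float32 here.
x = torch.randn(2, 256)

# A float32 input against bf16 weights raises a dtype-mismatch RuntimeError.
try:
    layer(x)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(err)

# Option 1: cast the input to the weight dtype before the call.
y = layer(x.to(layer.weight.dtype))

# Option 2: run the call under autocast, which casts linear inputs itself.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y_autocast = layer(x)

print(y.dtype, y_autocast.dtype)
```

If the explicit cast is the right fix for this issue, the equivalent change would presumably go where the embedding is passed to `guidance_in` in `flux_models.py`, but that is a guess based only on the traceback above.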
