
Text Encoders & Inference - SD3.5L Dreambooth 💫 #1860

Open
4 tasks
deman311 opened this issue Dec 30, 2024 · 0 comments

We are training the Stable Diffusion 3.5 Large model via Dreambooth method, using the sd3 branch.
Our system is an A100 Azure Remote Server.

Currently, the only way we have found to run inference on a trained model is via sd3_minimal_inference.py. We noticed that there are multiple manual operations in this script, including tokenizing and unpacking the .safetensors files of the model and text encoders.

Issues ⚠️

  • The only supported output format seems to be .safetensors. We compared the sd3_utils.py file to sdxl_utils.py, and it appears there is no support yet for the Diffusers format, as many of the conversion methods are absent.

  • Consequently, we found no compatible, straightforward way to load the three .safetensors files (model, clip_l, clip_g) into a pipeline so that we can run inference ourselves, add SD3.5-compatible LoRAs on top, etc. We tried StableDiffusionPipeline, StableDiffusion3Pipeline and DiffusionPipeline, all without luck; we could not produce the formats, dicts and metadata these libraries require when loading the trained model.

  • Specifically regarding the text encoders, no metadata appears to be packed into the .safetensors file when saving. The config.json and similar files that normally accompany a model and define its size and parameters seem to be deliberately missing, we assume because sd3_minimal_inference.py does not need them and handles these things manually. As a result, we are unable to load the encoders using libraries such as CLIPModel from transformers, and we run into incompatible-format errors.

  • We are unable to use LoRAs on our trained model, even via the sd3_minimal_inference.py script with the --lora_weights parameter (using a LoRA we had tested elsewhere). Is this related to the fact that 'Merging LoRAs from checkpoint' is listed in the README as 'not yet supported'?
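To illustrate the missing-metadata point: the safetensors format is an 8-byte little-endian length followed by a JSON header, and training metadata (if any) lives under an optional "__metadata__" key. A minimal stdlib-only sketch of how one can check this (the demo file below is fabricated for illustration, not an actual checkpoint) is:

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file.

    Layout: 8-byte little-endian header length, then that many bytes
    of JSON mapping tensor names to dtype/shape/offsets, plus an
    optional "__metadata__" string-to-string map.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

# Build a minimal demo file: one fp16 2x2 tensor, no "__metadata__" key.
header = {"weight": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]}}
blob = json.dumps(header).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob + b"\x00" * 8)

h = read_safetensors_header("demo.safetensors")
print("__metadata__" in h)  # False: nothing tells transformers what model this is
```

Running the same check against the saved encoder checkpoints is how we confirmed there is no embedded config to reconstruct the model from.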

Thank you in advance to anyone replying and I apologize if anything aforementioned is trivial 🙏🏻
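One workaround we are exploring is rebuilding a transformers-loadable encoder by pairing the raw state dict with a hand-written config and normalizing key prefixes first. The prefix used below ("clip_l.") is purely illustrative; we have not confirmed the actual key layout of the sd3 branch checkpoints:

```python
def strip_prefix(state_dict, prefix):
    """Remove `prefix` from every matching key, leaving other keys untouched.

    A combined checkpoint often nests each text encoder's weights under a
    sub-prefix that a plain from_pretrained-style loader does not expect.
    """
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Toy state dict standing in for tensors; real values would be torch tensors.
sd = {
    "clip_l.text_model.embeddings.token_embedding.weight": "tensor_a",
    "t5xxl.encoder.block.0.layer.0.weight": "tensor_b",
}
cleaned = strip_prefix(sd, "clip_l.")
print(sorted(cleaned))
```

Whether this is enough to satisfy the loaders, or whether deeper key renaming is needed, is exactly what we are unsure about.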

Our Relevant Config 👨‍💻

```toml
# Models
pretrained_model_name_or_path = "/kohya_ss/models/sd3.5_large.safetensors"

# Captioning
cache_latents = true
caption_dropout_every_n_epochs = 0
caption_dropout_rate = 0
caption_extension = ".txt"
clip_skip = 1
keep_tokens = 0

# Text Encoder Training
use_t5xxl_cache_only = true
t5xxl_dtype = "fp16"
train_text_encoder = true

# Learning Rates
learning_rate = 5e-6
learning_rate_te1 = 1e-5
learning_rate_te2 = 1e-5
loss_type = "l2"
lr_scheduler = "cosine"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 0.5
lr_warmup_steps = 0
optimizer_type = "AdamW8bit"

# Batch Sizes
text_encoder_batch_size = 1
train_batch_size = 1
epoch = 1
persistent_data_loader_workers = 0
max_data_loader_n_workers = 0

# Buckets, Noise & SNR
max_bucket_reso = 2048
min_bucket_reso = 256
bucket_no_upscale = true
bucket_reso_steps = 64
huber_c = 0.1
huber_schedule = "snr"
min_snr_gamma = 5
prior_loss_weight = 1
max_timestep = 1000
multires_noise_discount = 0.3
multires_noise_iterations = 0
noise_offset = 0
noise_offset_type = "Original"
adaptive_noise_scale = 0

# SD3 Logits
mode_scale = 1.29
weighting_scheme = "logit_normal"
logit_mean = 0
logit_std = 1

# VRAM Optimization
resolution = "512,512"
max_token_length = 75
max_train_steps = 800
mem_eff_attn = true
mixed_precision = "fp16"
full_fp16 = true
gradient_accumulation_steps = 1
gradient_checkpointing = true
xformers = true
dynamo_backend = "no"

# Sampling
sample_every_n_epochs = 50
sample_sampler = "euler"

# Model Saving
save_every_n_steps = 200
save_model_as = "diffusers"
save_precision = "fp16"

# General
output_name = "last"
log_with = "tensorboard"
```