
Text Encoders & Inference - SD3.5L Dreambooth 💫 #1860

Open
4 tasks
deman311 opened this issue Dec 30, 2024 · 0 comments

We are training the Stable Diffusion 3.5 Large model via Dreambooth method, using the sd3 branch.
Our system is an A100 Azure Remote Server.

Currently, the only way we have found to run inference on a trained model is via sd3_minimal_inference.py. We noticed that there are multiple manual operations in this script, including tokenizing and unpacking the .safetensors files of the model and text encoders.

Issues ⚠️

  • The only supported output format seems to be .safetensors. We compared the sd3_utils.py file to sdxl_utils.py, and it appears there is no support yet for the Diffusers format, as many of the conversion methods are absent.

  • Consequently, we found no compatible, straightforward way to load the three .safetensors files (model, clip_l, clip_g) into a pipeline so that we can run inference ourselves, add SD3.5-compatible LoRAs on top, etc. We tried StableDiffusionPipeline, StableDiffusion3Pipeline and DiffusionPipeline, all without luck; we could not produce the formats, dicts and metadata these libraries require when loading the trained model.

  • Specifically regarding the text encoders, no metadata appears to be packed into the .safetensors file when saving. The config.json and similar files that normally accompany a model and define its size and parameters seem to be deliberately missing, we assume because sd3_minimal_inference.py does not need them and handles these things manually. As a result, we are unable to load the encoders using libraries such as CLIPModel from transformers, and we run into incompatible-format errors.

  • We are unable to use LoRAs on our trained model, even via the sd3_minimal_inference.py script with the --lora_weights parameter (using a LoRA we had tested elsewhere). Is this related to the fact that 'Merging LoRAs from checkpoint' is listed in the README as 'not yet supported'?
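To illustrate the missing-metadata point: the safetensors format is an 8-byte little-endian length followed by a JSON header, and training metadata (if any) lives under an optional "__metadata__" key. A minimal stdlib-only sketch of how one can check this (the demo file below is fabricated for illustration, not an actual checkpoint) is:

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file.

    Layout: 8-byte little-endian header length, then that many bytes
    of JSON mapping tensor names to dtype/shape/offsets, plus an
    optional "__metadata__" string-to-string map.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

# Build a minimal demo file: one fp16 2x2 tensor, no "__metadata__" key.
header = {"weight": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]}}
blob = json.dumps(header).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob + b"\x00" * 8)

h = read_safetensors_header("demo.safetensors")
print("__metadata__" in h)  # False: nothing tells transformers what model this is
```

Running the same check against the saved encoder checkpoints is how we confirmed there is no embedded config to reconstruct the model from.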

Thank you in advance to anyone replying and I apologize if anything aforementioned is trivial 🙏🏻
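One workaround we are exploring is rebuilding a transformers-loadable encoder by pairing the raw state dict with a hand-written config and normalizing key prefixes first. The prefix used below ("clip_l.") is purely illustrative; we have not confirmed the actual key layout of the sd3 branch checkpoints:

```python
def strip_prefix(state_dict, prefix):
    """Remove `prefix` from every matching key, leaving other keys untouched.

    A combined checkpoint often nests each text encoder's weights under a
    sub-prefix that a plain from_pretrained-style loader does not expect.
    """
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Toy state dict standing in for tensors; real values would be torch tensors.
sd = {
    "clip_l.text_model.embeddings.token_embedding.weight": "tensor_a",
    "t5xxl.encoder.block.0.layer.0.weight": "tensor_b",
}
cleaned = strip_prefix(sd, "clip_l.")
print(sorted(cleaned))
```

Whether this is enough to satisfy the loaders, or whether deeper key renaming is needed, is exactly what we are unsure about.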

Our Relevant Config 👨‍💻

```toml
# Models
pretrained_model_name_or_path = "/kohya_ss/models/sd3.5_large.safetensors"

# Captioning
cache_latents = true
caption_dropout_every_n_epochs = 0
caption_dropout_rate = 0
caption_extension = ".txt"
clip_skip = 1
keep_tokens = 0

# Text Encoder Training
use_t5xxl_cache_only = true
t5xxl_dtype = "fp16"
train_text_encoder = true

# Learning Rates
learning_rate = 5e-6
learning_rate_te1 = 1e-5
learning_rate_te2 = 1e-5
loss_type = "l2"
lr_scheduler = "cosine"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 0.5
lr_warmup_steps = 0
optimizer_type = "AdamW8bit"

# Batch Sizes
text_encoder_batch_size = 1
train_batch_size = 1
epoch = 1
persistent_data_loader_workers = 0
max_data_loader_n_workers = 0

# Buckets, Noise & SNR
max_bucket_reso = 2048
min_bucket_reso = 256
bucket_no_upscale = true
bucket_reso_steps = 64
huber_c = 0.1
huber_schedule = "snr"
min_snr_gamma = 5
prior_loss_weight = 1
max_timestep = 1000
multires_noise_discount = 0.3
multires_noise_iterations = 0
noise_offset = 0
noise_offset_type = "Original"
adaptive_noise_scale = 0

# SD3 Logits
mode_scale = 1.29
weighting_scheme = "logit_normal"
logit_mean = 0
logit_std = 1

# VRAM Optimization
resolution = "512,512"
max_token_length = 75
max_train_steps = 800
mem_eff_attn = true
mixed_precision = "fp16"
full_fp16 = true
gradient_accumulation_steps = 1
gradient_checkpointing = true
xformers = true
dynamo_backend = "no"

# Sampling
sample_every_n_epochs = 50
sample_sampler = "euler"

# Model Saving
save_every_n_steps = 200
save_model_as = "diffusers"
save_precision = "fp16"

# General
output_name = "last"
log_with = "tensorboard"
```