Text encoder of SD 1.5 model is not trained, which is not supposed to happen #855
I manually set train text encoder to true and added --stop_text_encoder_training 999999, but the LoRA extractor still says the text encoder is the same. |
I could reproduce the issue with the same and some other settings. I also trained with the previous version, tag v0.6.6, and the Text Encoder is trained there. train_db.py is almost identical in both versions, so I think the most likely cause is one or more of the dependent libraries. |
Thank you so much, looking forward to a solution. I am pretty sure one of transformers, diffusers, or accelerate is broken. You are doing an incredible job. |
I hope so too, but if there is something wrong with my script, I apologize. |
SDXL text encoder is also not trained |
Sadly, no version of SDXL training is training the text encoder :( I couldn't find a working version with bmaltais/kohya_ss. Edit: the 3-month-old sdxl branch is working for some reason |
Just add --train_text_encoder as an extra parameter and it will train the TE. I think this is the intended behavior. As for extracting the LoRA from the DreamBooth model, if the TE has been trained enough to be different it will be extracted, but you can force the extraction by changing the value here (Kohya GUI, but you can specify it in the command line, no worries). You'll then get this and it will be extracted as expected |
I tested with 0.01 and 0.004, both with the same learning rate 1e-5, and it is still the same. When I use 0.0001 it shows a very tiny difference, but this seems wrong to me |
I will also test with the --train_text_encoder parameter added, ty |
By the way, the difference for Stable Diffusion 1.5 is also very small, any ideas? It is 0.0009 after 4160 steps at 1e-6 LR; I am using Adafactor |
I am testing Realistic Vision 2 on the ShivamShrirao DreamBooth colab. I wonder how much text encoder difference it will have with a very low LR of 4e-7 over 2080 steps |
I have tested with my dataset, the AdamW 8bit optimizer, and various learning rates. I found:
So I believe the scripts and the libraries are fine. However, I don't know why the same settings as before would produce different training results for the Text Encoder. I wrote another script to compare Text Encoder weights. You will find that embeddings.token_embedding and some norm weights and biases have a larger difference than the attention layers. The LoRA extraction script only takes care of the attn layers, so it determines the two Text Encoders are the same.
|
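For illustration, here is a minimal sketch of the kind of weight comparison described above. The function name and loading via transformers' CLIPTextModel are assumptions for the sketch; it is not the author's actual comparison script.

```python
# Sketch: compare two Text Encoders module by module (illustrative only).
# Per the comment above, embeddings.token_embedding and some norm weights
# can differ a lot even when the attention weights (the only ones the LoRA
# extractor inspects) barely move.
import torch
from transformers import CLIPTextModel

def compare_text_encoders(te_a: CLIPTextModel, te_b: CLIPTextModel) -> None:
    sd_a, sd_b = te_a.state_dict(), te_b.state_dict()
    for key in sd_a:
        max_diff = (sd_a[key].float() - sd_b[key].float()).abs().max().item()
        print(f"{key}: max abs diff = {max_diff:.6e}")
```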
@kohya-ss thank you so much. Can we say that setting a higher text encoder learning rate can be more beneficial in this case? Can we already give a different LR for the text encoder when doing SD 1.5 or SDXL training? |
AFAIK it doesn't have a way to specify an LR for the TE. |
I may have found the problem, which can be divided into two parts:
1. The initial loss values of SD1.5 training are different, which is related to line 1047 in library\model_util.py. If we change
back to
, the initial values will be the same.
2. The training process of SD1.5 is different, which is related to line 228 in train_network.py. If we delete the following two lines, the training process will be the same:
|
I had to use a 0.000015 LR for it to show differences in about 8k steps, so it's very slow, but the extracted LoRA had a working TE and behaved as expected. |
Can you provide the commit hash for the working branch? |
I think I was mistaken, but I'm not sure. I will do more research. This is the branch: https://github.com/bmaltais/kohya_ss/tree/sdxl-dev |
I don't think so. I think the learning rate for the Text Encoder should be lower than the learning rate for the U-Net in general.
Unfortunately, it is impossible for SD 1.5. For SDXL, we can use
So if we set this option, the default learning rate is used for the Text Encoder. |
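As background on what a separate Text Encoder learning rate would mean, here is a generic PyTorch sketch using optimizer parameter groups. It is only an illustration of the concept; as stated above, train_db.py has no such option for SD 1.5.

```python
# Generic illustration (not a train_db.py feature): give the Text Encoder a
# lower learning rate than the U-Net via separate optimizer parameter groups.
import torch

def build_optimizer(unet, text_encoder, unet_lr=1e-5, te_lr=5e-6):
    param_groups = [
        {"params": unet.parameters(), "lr": unet_lr},
        {"params": text_encoder.parameters(), "lr": te_lr},  # lower LR for TE
    ]
    return torch.optim.AdamW(param_groups)
```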
@kohya-ss ty, my text-encoder-enabled training for SDXL with --train_text_encoder is about to be completed. With this command it is using exactly the same VRAM, is this expected? But it is about 32% slower. One more question: the DreamBooth extension of Automatic1111 had a "use EMA during training" option; it significantly increased VRAM usage but also quality. You don't have that feature? |
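For readers unfamiliar with the EMA option mentioned here, a minimal generic sketch of an exponential moving average of model weights follows. It is not a feature of these scripts (see the reply below); the decay value is illustrative.

```python
# Generic EMA sketch: keep a shadow copy of the weights and blend it toward
# the live weights every step. The extra copy is why EMA increases memory use.
import copy
import torch

class EMA:
    def __init__(self, model: torch.nn.Module, decay: float = 0.9999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            ema_p.lerp_(p, 1.0 - self.decay)  # ema = decay*ema + (1-decay)*p
```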
It sounds like you'd enjoy this repo for training more, it has adjustable lr for text/unet, EMA, masked training, etc. https://github.com/Nerogar/OneTrainer |
Thanks, I should experiment and compare |
@kohya-ss any way to set the LR for the text encoder? It gets cooked super fast :D https://twitter.com/GozukaraFurkan/status/1710416135747748150 |
@kohya-ss Therefore I think the most likely issue lies simply with extract_lora_from_models.py erroneously thinking the two models are the same. Edit:
The resulting lora works way better than before: https://i.imgur.com/VChzcw6.jpeg |
Unfortunately, there is no EMA feature currently. I would like to support it, but I think other tasks have higher priority. Of course you can use another trainer :) |
As I mentioned on X, we can use |
I modified it to increase MIN_DIFF before, but it seems to be too large. I will add an option to set MIN_DIFF soon. |
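To make the MIN_DIFF discussion concrete, here is a hedged sketch of how such a threshold gate can decide whether the Text Encoder gets extracted at all. The threshold value and function name are assumptions; see extract_lora_from_models.py for the real logic.

```python
# Illustrative MIN_DIFF-style gate: if no Text Encoder weight differs from
# the original by more than the threshold, the extractor would report
# "Text encoder is same. Extract U-Net only."
import torch

MIN_DIFF = 1e-2  # assumed value, for illustration only

def text_encoder_changed(te_org, te_tuned, min_diff: float = MIN_DIFF) -> bool:
    sd_org, sd_tuned = te_org.state_dict(), te_tuned.state_dict()
    for key in sd_org:
        if (sd_tuned[key].float() - sd_org[key].float()).abs().max() > min_diff:
            return True
    return False
```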
I used --block_lr and it works. The text encoder is not cooked anymore. Here are some comparisons: https://twitter.com/GozukaraFurkan/status/1710580153665925179 https://twitter.com/GozukaraFurkan/status/1710582243742142532 https://twitter.com/GozukaraFurkan/status/1710609957626810825 |
That's nice! I don't know the prompt for the images, but I feel the right image might represent the prompt well, for example the style and the background. |
I found it difficult to follow the dialogue, because several other things are being discussed as well. Has the Text Encoder problem been fixed for SD 1.5 or not? |
The issue I tested still exists in the new version, and the trained LoRA cannot be used. You can try the modifications I mentioned earlier; they may be useful to you. |
I compared the config, and there was only one line of difference: I'm now using version 21.8.4 of the GUI, which @FurkanGozukara claims still had good training for SD 1.5 (and I did make good LoRAs with it), and it already had the parameters you describe, so it's more likely that the bug is elsewhere. |
I'm not sure where the problem lies, but you might be right. For me, the so-called correctness is to reproduce the SD1.5 training results from before SDXL was introduced. I found that when the author no longer references "openai/clip-vit-large-patch14", the initial training loss will be different. And when the author later introduces
the trained SD1.5 LoRA will be completely damaged. As for what you said about torch_dtype="float32", at this point we have already abandoned the reference to "openai/clip-vit-large-patch14", and the training results are already different from before. |
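To make the two loading paths being discussed concrete, here is a hedged sketch of the general difference between referencing "openai/clip-vit-large-patch14" and building the Text Encoder from an explicit config. Whether this is exactly what changed at line 1047 of model_util.py is not shown in this thread, so treat the calls below as an illustration only.

```python
# Two ways to obtain a CLIP Text Encoder before the checkpoint's converted
# weights are loaded into it (illustrative; not necessarily the repo's code).
from transformers import CLIPTextModel, CLIPTextConfig

# Path A: pull the reference config/weights from the Hub.
text_encoder_a = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Path B: build the model locally from an explicit config (no Hub download).
# Per the discussion above, a mismatch between this config and the reference
# one can change the initial training loss.
config = CLIPTextConfig(
    vocab_size=49408,
    hidden_size=768,
    intermediate_size=3072,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=77,
)
text_encoder_b = CLIPTextModel(config)
```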
I am not sure, but SDXL training is far superior atm. Here you can see my pictures, I shared 180+: https://civitai.com/user/SECourses best config: https://www.patreon.com/posts/89213064 quick tutorial: https://www.youtube.com/watch?v=EEV8RPohsbw |
@AIEXAAA I looked at this link and matched the parameters to the ones in the .py file and only that one line is different. Since torch 2.x has been made the default in the new kohya versions, I assume there is a correlation. @FurkanGozukara thanks, but I want to train SD 1.5, not SDXL. |
For SD 1.5 I am still researching. My older tutorial is still working great though, since it has EMA support too |
I think I roughly understand what you're saying, and when
is displayed, the value is consistent with the author's default cfg, but the problem still results in different outcomes. As for the PyTorch issue, even if I update to 2.0 or 2.0.1, or even update this training program to the latest version, as long as I modify it in the way I mentioned earlier, the results of SD1.5 LoRA training will be consistent with before SDXL was introduced. Therefore, it's hard to assert that it is related to PyTorch 2.0. |
I'm glad you found the source of the problem. Looking forward to the fix! :) |
Hello everyone, |
I am also doing training for companies. So far I am only using U-Net training. Results are great, but with text encoder training I am hoping we will get even better results |
It's great to meet you Furkan. I've always found the research you do and the dedication you have towards stable diffusion, nothing short of outstanding. You are a wonderful content maker and I fully support and recommend your work. |
I have been struggling with the faulty text encoder for the last few weeks, and was hoping that it would be fixed with the November 11 [v21.1.1] update, but that does not seem to be the case. I am still getting "Text encoder is same. Extract U-Net only." when extracting LoRAs. Is anyone else having this problem? Found workarounds? Know when it will be fixed? |
I am using the bmaltais GUI dev2 branch and SDXL training is working great |
SD 1.5 TE is still not good for LoRA training. Yesterday I tried the same training under the 21.8.4 GUI and 22.1.1 (with the updated kohya script) and got completely different results: in the latest version it was overcooked by the third epoch, while in 21.8.4 I got a perfect LoRA. |
Are you using Dreambooth or Finetune in the dev2 branch? |
I trained a LoHa on SDXL with the last two updates, tried various parameters, and always had a hard time getting satisfactory results; it could be due to a couple of things.
|
error: unrecognized arguments: --train_text_encoder. Apparently Kohya has removed this for 1.5 training, and when the DreamBooth model is only 2GB you know it does not have the TE, when the model it was trained from is 4.7GB. |
SD 1.5 trains the TE by default |
It didn't for me, but I haven't used 1.5 since 2.0 was released; I just had to use it to help LyCORIS test something. |
@DarkAlchy If you do not want to train the Text Encoder, please add an option
=-1? Alright. |
Here is the executed command:
accelerate launch --num_cpu_threads_per_process=2 "./train_db.py" --pretrained_model_name_or_path="/workspace/stable-diffusion-webui/models/Stable-diffusion/Realistic_Vision_V5.1.safetensors" --train_data_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/img" --reg_data_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/reg" --resolution="768,768" --output_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/model" --logging_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/log" --save_model_as=safetensors --full_bf16 --output_name="me_1e7" --lr_scheduler_num_cycles="4" --max_data_loader_n_workers="0" --learning_rate="1e-07" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="4160" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0
When the text encoder is not trained, it is supposed to print
Text Encoder is not trained.
This message is not printed either.
So how do I know the text encoder was not trained? Because I extracted a LoRA and it says the text encoder is the same.
I did 30 trainings and so many of them are wasted because of this bug :/
@kohya-ss