fused_backward_pass in prodigy-plus-schedule-free [SOLVED VIA EXTERNAL SOLUTION ✅] #1834
Comments
I tried both the LoRA algorithm and the GLoRA+DoRA algorithm on SDXL: no noticeable decrease in VRAM usage, and speeds are the same with and without fused_backward_pass as well. Example settings: accelerate launch --num_cpu_threads_per_process 8 sdxl_train_network.py ^
Try it without my proposed changes; apparently it was already working if you set --fused_back_pass as an optimizer arg.
Tested it earlier:
The speeds and other metrics are about 30% faster than usual during training. I don't remember the exact VRAM usage, but there was no significant decrease. I also used --fused_backward_pass together with the optimizer argument, but the resulting LoRA does not work, just like in the cases above. By "LoRA not working" I mean that it trains without errors or NaNs and TensorBoard shows normal graphs, but applying the resulting LoRA to the model changes nothing in the image, even at a weight of 1000.
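For anyone reading along, the combination being discussed looks roughly like the command below. This is a sketch, not a verified recipe: --optimizer_type and --optimizer_args are the standard sd-scripts flags and fused_back_pass=True is the optimizer-level switch mentioned above, but the optimizer_type value assumes the class is importable as prodigyplus.ProdigyPlusScheduleFree (check the package README for the exact path).

```
accelerate launch --num_cpu_threads_per_process 8 sdxl_train_network.py ^
  --optimizer_type "prodigyplus.ProdigyPlusScheduleFree" ^
  --optimizer_args "fused_back_pass=True" ^
  [remaining training arguments as before]
```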
Problem solved in the new version of https://github.com/LoganBooker/prodigy-plus-schedule-free
Hi, Kohya. I know you hardcoded fused_backward_pass to Adafactor, but prodigy-plus-schedule-free (https://github.com/LoganBooker/prodigy-plus-schedule-free) already has that feature built in, and we can't use it. More precisely, we can pass the built-in argument itself, but it breaks the training process. Can you add more flexibility here, please?
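For context on why this matters: a "fused" backward pass applies the optimizer update per parameter while backward() is still running, so each gradient can be freed immediately instead of all gradients being held until a single optimizer.step(). Below is a minimal, generic PyTorch sketch of that technique; it is not the code of sd-scripts or prodigy-plus-schedule-free, and step_param is a hypothetical per-parameter step method used only for illustration.

```python
import torch

def attach_fused_backward_pass(model: torch.nn.Module, optimizer) -> None:
    """Run the optimizer update for each parameter as soon as its gradient
    has been accumulated during backward(), then drop the gradient so that
    all .grad tensors never need to be resident at the same time.

    Generic sketch only; real implementations keep this logic inside the
    optimizer (or the trainer) and also handle grad scaling, clipping,
    and gradient accumulation.
    """
    for param in model.parameters():
        if not param.requires_grad:
            continue

        def hook(p: torch.Tensor) -> None:
            # Called by autograd once p.grad is fully accumulated.
            optimizer.step_param(p)  # hypothetical per-parameter step
            p.grad = None            # free the gradient right away

        # Available in PyTorch >= 2.1 for leaf tensors such as parameters.
        param.register_post_accumulate_grad_hook(hook)

# Usage sketch: with the hooks attached, the training loop only calls
# loss.backward(); there is no separate optimizer.step() / zero_grad().
# Mixing this with a loop that still performs a regular optimizer.step()
# is a typical source of silent breakage.
```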