Feature Request: Save N epoch and N steps, comma separated #1727

Open
FurkanGozukara opened this issue Oct 26, 2024 · 3 comments

Comments

@FurkanGozukara

FurkanGozukara commented Oct 26, 2024

We need to be able to save only specific epochs and steps.

For example, save epochs 30, 35, 40, 45 and no others,
or
save steps 300, 400, 500 and no others.

Can you please add this option? Thank you @kohya-ss

This has become very important for FLUX training, since each checkpoint is 24 GB.

This request is for saving checkpoints, but the same option for saving state would be nice as well.
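
To make the request concrete, here is a minimal sketch of how a comma-separated list of save points could be parsed and checked inside a training loop. The names `parse_save_points` and `should_save` are hypothetical and are not part of sd-scripts; no such option exists there as far as this thread shows.

```python
def parse_save_points(spec: str) -> set[int]:
    """Parse a comma-separated list such as "30,35,40,45" into a set of ints.

    Hypothetical helper for illustration only.
    """
    points = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and stray spaces
        value = int(token)  # raises ValueError on non-numeric input
        if value <= 0:
            raise ValueError(f"save point must be positive, got {value}")
        points.add(value)
    return points


def should_save(current: int, save_points: set[int]) -> bool:
    """Return True only when the current epoch (or step) was requested."""
    return current in save_points


# Simulate a 50-epoch run that should keep only epochs 30, 35, 40, 45.
save_epochs = parse_save_points("30,35,40,45")
saved = [epoch for epoch in range(1, 51) if should_save(epoch, save_epochs)]
print(saved)
```

The same two helpers would work for steps by passing the global step counter instead of the epoch number.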

@kohya-ss
Owner

I think the functionality is sufficient if we combine the options --save_every_n_epochs and --save_last_n_epochs. Saving checkpoints does take time, but if there is a problem and training ends midway, it would be more of a problem if the checkpoints had not been saved.
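
As a rough illustration of this suggestion, the sketch below models which checkpoints would remain at the end of a run when a checkpoint is written every N epochs and only the most recent K are kept. It assumes that "keep last" counts checkpoints rather than epochs; that is an assumption of this model, not a statement about how sd-scripts implements the options. The function name `surviving_checkpoints` is hypothetical.

```python
def surviving_checkpoints(total_epochs: int, save_every_n: int, keep_last_n: int) -> list[int]:
    """Model a save-every-N / keep-last-K policy and return the epochs left on disk.

    Assumption: keep_last_n is a number of checkpoints, not a span of epochs.
    """
    kept: list[int] = []
    for epoch in range(1, total_epochs + 1):
        if epoch % save_every_n == 0:
            kept.append(epoch)          # write the new checkpoint
            if len(kept) > keep_last_n:
                kept.pop(0)             # then drop the oldest one
    return kept


# Saving every 5 epochs and keeping the last 4 over a 45-epoch run.
print(surviving_checkpoints(45, 5, 4))
```

Under this model the run ends with epochs 30, 35, 40 and 45 on disk, but the earlier checkpoints (5 through 25) are still written and later deleted, so the time cost of saving them remains.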

@FurkanGozukara
Author

@kohya-ss it is still not exactly the same.

Let's say I want to save only epochs 30, 50, and 55; currently this is not possible.

Also, the last time I tested --save_last_n_epochs it didn't work :D I had it set to 3, but it tried to write the 4th checkpoint before deleting the oldest one, so I got an out-of-space error.

But I am going to test it again. I think it should delete the oldest one first and then save the next one, so the disk space is fully utilized.
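
The difference between the two orderings can be sketched with a small simulation that counts the peak number of checkpoints on disk. This is a simplified model with a hypothetical function name, `peak_checkpoints`; it does not touch the filesystem and does not claim to reproduce the actual sd-scripts code path.

```python
def peak_checkpoints(num_saves: int, keep_last_n: int, delete_first: bool) -> int:
    """Return the largest number of checkpoints on disk at any moment.

    delete_first=False: write the new checkpoint, then remove the oldest.
    delete_first=True:  remove the oldest, then write the new checkpoint.
    """
    on_disk = 0
    peak = 0
    for _ in range(num_saves):
        if delete_first and on_disk >= keep_last_n:
            on_disk -= 1                # free space before writing
        on_disk += 1                    # write the new checkpoint
        peak = max(peak, on_disk)
        if not delete_first and on_disk > keep_last_n:
            on_disk -= 1                # clean up after writing
    return peak


CHECKPOINT_GB = 24  # size quoted earlier in this thread for FLUX checkpoints

for delete_first in (False, True):
    peak = peak_checkpoints(num_saves=10, keep_last_n=3, delete_first=delete_first)
    print(f"delete_first={delete_first}: peak {peak} checkpoints, {peak * CHECKPOINT_GB} GB")
```

With a limit of 3, saving before deleting needs room for 4 checkpoints at once, while deleting first needs room for only 3. The trade-off is that deleting first leaves one fewer checkpoint on disk if the new save fails partway through.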

@dsienra

dsienra commented Oct 26, 2024

I set "Save last N epochs state" to 2. My intention was to keep only the last 2 or 3 safetensors checkpoints because of disk space restrictions. I am saving every 25 epochs. Should I set "Save last N epochs state" to 50 if I want to keep the last 2, or 75 to keep the last 3, or doesn't it work that way?
