Add more rigorous non-slow grad accum tests #35668
Conversation
To check the GA loss for the condition that loss_kwargs was not enabled, could we simply hack the trainer by setting
@hiyouga if you notice,
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@muellerzr Yeah, I get it. The existing test cases only ensure the "diff broken" is very off, but could we have a test case to ensure the models do not accept
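One way to make that distinction testable (a sketch only; the helper name is hypothetical and this is not the Trainer's actual implementation) is to inspect whether a model's `forward` accepts variable keyword arguments, which is how models receive extra loss kwargs such as `num_items_in_batch`:

```python
import inspect


def accepts_loss_kwargs(model_cls):
    """Hypothetical helper: a model whose forward signature includes
    **kwargs can receive extra loss kwargs like num_items_in_batch;
    one without it cannot."""
    sig = inspect.signature(model_cls.forward)
    return any(
        p.kind is inspect.Parameter.VAR_KEYWORD
        for p in sig.parameters.values()
    )
```

A test could then assert this returns `False` for the model chosen to exercise the no-loss-kwargs path.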
```diff
 def test_gradient_accumulation_loss_alignment_with_loss_func(self):
     set_seed(42)
     import datasets

-    model_name = "roneneldan/TinyStories-33M"
+    model_name = "nickypro/tinyllama-15M"
```
When testing compute_loss_func, I think it's better to use a model that doesn't accept loss kwargs. Maybe TinyStories-33M is ok?
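The normalization contract these tests exercise can be sketched in plain Python (a hypothetical illustration of the idea, not Trainer code): with gradient accumulation, a loss function that receives `num_items_in_batch` should sum per-item losses and divide by the total item count across all accumulation steps, rather than taking a per-micro-batch mean.

```python
def grad_accum_loss(per_item_losses, num_items_in_batch=None):
    """Hypothetical sketch of the num_items_in_batch contract:
    sum item losses, then normalize by the total count across all
    accumulation steps when it is provided; otherwise fall back to
    a per-batch mean (the behavior that causes grad-accum drift)."""
    total = sum(per_item_losses)
    if num_items_in_batch is not None:
        return total / num_items_in_batch
    return total / len(per_item_losses)
```

With `num_items_in_batch` covering the full effective batch, two micro-batches of this loss sum to the same value as one big batch, which is exactly the alignment the tests check.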
What does this PR do?
Should be merged after #35651 (tests will fail until I rebase).
This PR tweaks the Trainer grad_accum tests to:
- swap in a smaller model (nickypro/tinyllama-15M)
- shorten sequences (max_length=16)
- shrink the dataset (first 40 items)

This takes us on CPU from ~76s per test to ~6s per test, so we can run it as part of our PR CI and squash grad accum issues before they get merged.
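The slimming strategy above can be sketched as a small standalone helper (a hypothetical illustration; the test itself does this inline with `datasets` rather than through a helper like this):

```python
def slim_dataset(examples, n_items=40, max_length=16):
    """Hypothetical sketch of the test-speedup strategy: keep only the
    first n_items examples and truncate each token sequence to
    max_length, trading coverage for a fast CI-friendly runtime."""
    return [example[:max_length] for example in examples[:n_items]]
```

The design choice is that grad-accum bugs show up as loss misalignment on any data, so a tiny slice is enough to catch them while keeping the test fast enough for PR CI.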
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@Rocketknight1 @ArthurZucker