
Add more rigorous non-slow grad accum tests #35668

Open

muellerzr wants to merge 3 commits into main
Conversation

muellerzr (Contributor) commented on Jan 13, 2025

What does this PR do?

Should be merged after #35651 (tests will fail until I rebase).

This PR tweaks the Trainer grad_accum tests to:

  • Use a much smaller CLM (nickypro/tinyllama-15M)
  • Use much smaller data, taking only a chunk of the dataset (max_length=16, first 40 items)

On CPU this takes us from ~76s per test to ~6s per test, so we can run it as part of our PR CI and squash grad accum issues before they get merged (a rough sketch of the slimmed-down setup is below).
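For reference, a minimal sketch of what the slimmed-down setup looks like; the dataset, output directory, and exact hyperparameters below are placeholders, not the values used in the actual test suite:

```python
import datasets
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "nickypro/tinyllama-15M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # tiny checkpoints often ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Keep the data tiny: short sequences (max_length=16) and only the first 40 usable rows.
raw = datasets.load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:200]")
raw = raw.filter(lambda row: len(row["text"]) > 0).select(range(40))
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=16),
    batched=True,
    remove_columns=raw.column_names,
)

args = TrainingArguments(
    output_dir="/tmp/grad_accum_smoke_test",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    max_steps=10,
    report_to="none",
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```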

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Rocketknight1 @ArthurZucker

hiyouga (Contributor) commented on Jan 13, 2025

To check the GA loss in the case where loss_kwargs is not enabled, could we simply hack the trainer by setting trainer.model_accepts_loss_kwargs = False before calling trainer.train()?
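Something along these lines, as a sketch (assuming trainer is an already-constructed Trainer):

```python
# Force the code path where loss kwargs (e.g. num_items_in_batch) are not
# forwarded to the model, mimicking models whose forward() doesn't accept them.
trainer.model_accepts_loss_kwargs = False
trainer.train()
```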

muellerzr (Contributor, Author) commented

@hiyouga If you look at the tests, diff_broken already does this in both of them.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hiyouga (Contributor) commented on Jan 13, 2025

@muellerzr Ah, I get it. The existing test cases only ensure that the "diff broken" value is far off, but could we add a test case ensuring that models which do not accept loss_kwargs (like Qwen2-VL) are trained correctly when GA is enabled?

```diff
 def test_gradient_accumulation_loss_alignment_with_loss_func(self):
     set_seed(42)
     import datasets

-    model_name = "roneneldan/TinyStories-33M"
+    model_name = "nickypro/tinyllama-15M"
```
Review comment (Contributor):
When testing compute_loss_func, I think it's better to use a model that doesn't accept loss kwargs. Maybe TinyStories-33M is ok?
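A sketch of what such a test could look like, assuming a hypothetical build_trainer helper that wraps the dataset and TrainingArguments setup used by the existing grad-accum tests (the tolerance is illustrative only):

```python
def test_gradient_accumulation_without_loss_kwargs(self):
    # A model that, per the discussion above, does not accept loss kwargs.
    model_name = "roneneldan/TinyStories-33M"

    # build_trainer is a hypothetical helper wrapping the small dataset and
    # TrainingArguments used by the existing grad-accum tests.
    base_trainer = build_trainer(
        model_name, per_device_train_batch_size=8, gradient_accumulation_steps=1
    )
    ga_trainer = build_trainer(
        model_name, per_device_train_batch_size=2, gradient_accumulation_steps=4
    )

    base_loss = base_trainer.train().training_loss
    ga_loss = ga_trainer.train().training_loss

    # With the GA loss fix in place, accumulated and non-accumulated runs
    # should land on approximately the same training loss.
    self.assertAlmostEqual(base_loss, ga_loss, delta=0.1)
```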
