About GA loss in the latest transformers version #35663

Open
4 tasks
hiyouga opened this issue Jan 13, 2025 · 1 comment
@hiyouga
Contributor

hiyouga commented Jan 13, 2025

System Info

transformers 4.48.0

Who can help?

@ArthurZucker and @muellerzr

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I sincerely apologize to everyone: in my previous PR #35438, I accidentally inherited the typo from #34915 that was supposed to have been fixed in #35113 and #35121. This caused the training loss of models that accept loss_kwargs to become very large once gradient accumulation is enabled. I think we should merge #35651 ASAP and release a stable version with the correct implementation.

Expected behavior

The loss under gradient accumulation (GA) should be correctly scaled.
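
As a rough illustration of what "correctly scaled" can mean here, the sketch below compares three ways of combining per-token losses across micro-batches. It is plain Python with made-up numbers, not the Trainer code; the function names are hypothetical, and `num_items_in_batch` is used only as an illustrative name for the total token count over all accumulated micro-batches. It does not reproduce the exact typo discussed above, only the general effect of missing or mismatched normalization.

```python
# Hypothetical per-token losses for three micro-batches of unequal length.
micro_batches = [
    [2.0, 4.0],
    [1.0, 3.0, 5.0, 3.0],
    [6.0],
]


def full_batch_mean(batches):
    """Reference value: mean loss over every token, as if there were no accumulation."""
    tokens = [loss for batch in batches for loss in batch]
    return sum(tokens) / len(tokens)


def accumulate_with_token_count(batches):
    """Normalize each micro-batch sum by the total token count across all micro-batches.

    No extra division by the number of accumulation steps is applied.
    """
    num_items_in_batch = sum(len(batch) for batch in batches)
    return sum(sum(batch) / num_items_in_batch for batch in batches)


def accumulate_with_step_average(batches):
    """Average each micro-batch separately, then divide by the number of steps.

    This weights short and long micro-batches equally, so it can differ
    from the full-batch mean when lengths are unequal.
    """
    steps = len(batches)
    return sum((sum(batch) / len(batch)) / steps for batch in batches)


def accumulate_unscaled(batches):
    """Sum per-micro-batch means with no division at all.

    The result grows with the number of accumulation steps.
    """
    return sum(sum(batch) / len(batch) for batch in batches)


if __name__ == "__main__":
    print("full-batch mean:        ", full_batch_mean(micro_batches))
    print("token-count normalized: ", accumulate_with_token_count(micro_batches))
    print("per-step average:       ", accumulate_with_step_average(micro_batches))
    print("unscaled sum of means:  ", accumulate_unscaled(micro_batches))
```

With these numbers, only the token-count normalization matches the full-batch mean; the unscaled variant is inflated roughly in proportion to the number of accumulation steps, which is the kind of "very large" loss a scaling mistake can produce.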

hiyouga added the bug label Jan 13, 2025
hiyouga changed the title from "A" to "About GA loss in the latest transformers version" Jan 13, 2025
@muellerzr
Contributor

muellerzr commented Jan 13, 2025

These things happen, and getting back into the swing of things post-holidays takes a minute, so I overlooked running the slow tests locally as well.

I'll add some new tests today that run as part of the daily CI + PR CI so we get flagged about this sooner. 🤗

We appreciate all you've done with your investigations into it immensely!

We'll merge it soon and make it part of the patch this week.
