An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
I sincerely apologize to everyone: in my previous PR #35438, I accidentally carried over the typo from #34915 that should have been fixed by #35113 and #35121. As a result, the training loss of models that accept loss_kwargs becomes very large once gradient accumulation is enabled. I think we should merge #35651 ASAP and release a stable version with the correct implementation.
Expected behavior
The GA loss should be correctly scaled.
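To make "correctly scaled" concrete, here is a minimal sketch in plain Python. It is not the Trainer code and does not reproduce the exact typo. All function names, the `total_tokens` variable, and the per-token loss values are made up for illustration. It assumes each micro-batch loss is already normalized by the token count of the whole accumulated batch, so the partial losses should simply add up to the full-batch mean.

```python
# Illustrative arithmetic only; names and numbers are hypothetical,
# and this is not the transformers Trainer implementation.

def full_batch_loss(micro_batches):
    """Mean per-token loss over all tokens, as if there were no accumulation."""
    tokens = [loss for batch in micro_batches for loss in batch]
    return sum(tokens) / len(tokens)


def accumulated_loss(micro_batches):
    """Add up micro-batch losses that are each divided by the total token count.

    Every micro-batch is normalized by the token count of the whole
    accumulated batch, so no extra scaling by the number of steps is needed.
    """
    total_tokens = sum(len(batch) for batch in micro_batches)
    return sum(sum(batch) / total_tokens for batch in micro_batches)


def misscaled_loss(micro_batches):
    """Hypothetical mistake: rescale an already-normalized loss by the step count."""
    steps = len(micro_batches)
    return accumulated_loss(micro_batches) * steps


# Three accumulation steps with uneven token counts (made-up values).
micro_batches = [[2.0, 4.0], [1.0, 3.0, 5.0], [3.0]]

print(full_batch_loss(micro_batches))
print(accumulated_loss(micro_batches))
print(misscaled_loss(micro_batches))
```

With correct scaling, the accumulated loss matches the full-batch mean. In the mis-scaled variant the reported loss grows with the number of accumulation steps, which is the kind of symptom described above.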
System Info
transformers 4.48.0
Who can help?
@ArthurZucker and @muellerzr