About GA loss in the latest transformers version #35663

hiyouga · 2025-01-13T15:13:54Z

System Info

transformers 4.48.0

Who can help?

@ArthurZucker and @muellerzr

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

I sincerely apologize to everyone: in my previous PR #35438, I accidentally carried over the typo from #34915 that was supposed to have been fixed in #35113 and #35121. As a result, the training loss of models that accept loss_kwargs becomes very large whenever gradient accumulation is enabled. I think we should merge #35651 as soon as possible and release a stable version with the correct implementation.

Expected behavior

The GA loss should be correctly scaled.
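To make "correctly scaled" concrete, here is a minimal, framework-free sketch. It assumes per-token losses have already been computed for each micro-batch. All function names are hypothetical, and this is not the Trainer's actual code. The variable `num_items_in_batch` borrows its name from the loss kwarg that transformers passes around, but here it is only a local count of tokens across the whole accumulation window. The `ga_loss_unscaled` variant is an illustrative scaling mistake, not necessarily the exact typo referenced above.

```python
from typing import List

# Each inner list holds the per-token losses of one micro-batch (hypothetical layout).
MicroBatches = List[List[float]]


def reference_loss(micro_batches: MicroBatches) -> float:
    """Mean per-token loss over every token in the window, as if there were no GA."""
    tokens = [loss for mb in micro_batches for loss in mb]
    return sum(tokens) / len(tokens)


def ga_loss_scaled(micro_batches: MicroBatches) -> float:
    """Correct GA scaling: sum each micro-batch, divide by the global token count."""
    # Total number of tokens across all accumulated micro-batches.
    num_items_in_batch = sum(len(mb) for mb in micro_batches)
    accumulated = 0.0
    for mb in micro_batches:
        # No extra division by the number of accumulation steps is needed here.
        accumulated += sum(mb) / num_items_in_batch
    return accumulated


def ga_loss_mean_of_means(micro_batches: MicroBatches) -> float:
    """Average of per-micro-batch means; biased when token counts differ."""
    steps = len(micro_batches)
    return sum(sum(mb) / len(mb) / steps for mb in micro_batches)


def ga_loss_unscaled(micro_batches: MicroBatches) -> float:
    """Hypothetical mistake: per-micro-batch means summed without dividing by
    the number of accumulation steps, inflating the loss roughly by that factor."""
    return sum(sum(mb) / len(mb) for mb in micro_batches)


if __name__ == "__main__":
    batches = [[1.0, 2.0, 3.0], [4.0]]  # 3 tokens, then 1 token
    print(reference_loss(batches))         # 2.5
    print(ga_loss_scaled(batches))         # 2.5
    print(ga_loss_mean_of_means(batches))  # 3.0
    print(ga_loss_unscaled(batches))       # 6.0
```

With two micro-batches of unequal length, only the `num_items_in_batch`-style normalization reproduces the reference loss exactly. The unscaled variant reports 6.0 against a reference of 2.5, which is the kind of inflation a misplaced scaling step produces.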


muellerzr · 2025-01-13T15:30:39Z

These things happen. Getting back into the swing of things after the holidays takes a minute, and I also overlooked running the slow tests locally.

I'll add some new tests today that run as part of both the daily CI and the PR CI, so we get flagged sooner if this happens again. 🤗

We immensely appreciate all the investigation you've done into this!

We'll merge it soon and include it in this week's patch release.

hiyouga added the bug label Jan 13, 2025

hiyouga changed the title from "A" to "About GA loss in the latest transformers version" Jan 13, 2025
