System Info
transformers version: 4.48.0
Who can help?
@ArthurZucker @muellerz @SunMarc
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
I've identified an issue regarding unnecessary KV cache updates during training, which affects all current LLM models in the library and impacts both memory efficiency and torch.compile compatibility.
Taking src/transformers/models/llama/modeling_llama.py as an example: past_key_values is set to a DynamicCache object even during training (lines 547 to 548 in 15bd3e6).
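The embedded snippet from that permalink is not captured above; the pattern being pointed at looks roughly like this (my paraphrase of the 4.48-era LlamaModel.forward, not a verbatim copy of the linked lines):

```python
# Paraphrase of the referenced check in LlamaModel.forward (exact code may differ):
# a DynamicCache is created whenever use_cache is True and no cache was passed in,
# with no check for self.training.
if use_cache and past_key_values is None:
    past_key_values = DynamicCache()
```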
This leads to unnecessary memory allocation for storing the KV cache during the forward pass (lines 273 to 276 in 15bd3e6).
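The second permalink's snippet is also missing; the allocation it refers to happens in the attention layer roughly as follows (again a paraphrase, not the exact linked lines):

```python
# Paraphrase of the referenced cache update in LlamaAttention.forward (exact code may differ):
# every layer appends its key/value states to the cache, allocating memory that a
# pure training step never reads back.
if past_key_value is not None:
    cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
    key_states, value_states = past_key_value.update(
        key_states, value_states, self.layer_idx, cache_kwargs
    )
```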
The cache update also interferes with torch.compile, resulting in multiple recompilations due to layer_idx guards (my internal test).
Workaround
A temporary solution is to explicitly set use_cache=False during training, for example:
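A minimal sketch of that workaround; the tiny test checkpoint and the single training step below are illustrative choices, not taken from the issue:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any causal LM in the library shows the same behavior.
checkpoint = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model.train()

batch = tokenizer("KV caches are not needed for a training step.", return_tensors="pt")

# Explicitly disabling the cache keeps the forward pass from building a DynamicCache.
outputs = model(**batch, labels=batch["input_ids"], use_cache=False)
outputs.loss.backward()
```

Setting model.config.use_cache = False before training achieves the same thing without changing every call site.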
Expected behavior
No KV cache should be created or updated during training.
@Rocketknight1 and I chatted offline and we can't think of a reason why the cache should stay active while training. I'm going to update to the pattern you wrote: if the user doesn't manually specify the cache, only use the config value (which is True by default) when not training.
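For reference, a minimal sketch of that default, assuming it sits next to the existing cache-initialization check in LlamaModel.forward; the actual patch may differ:

```python
# Sketch of the described behavior, not the merged change: fall back to the config
# value only outside training, so a plain training forward never builds a DynamicCache
# unless the caller explicitly asks for one.
if use_cache is None:
    use_cache = self.config.use_cache and not self.training

if use_cache and past_key_values is None:
    past_key_values = DynamicCache()
```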