System Info
I found that enabling use_liger_kernel=True does reduce GPU memory during training. However, evaluation requires much more GPU memory than training, even though per_device_eval_batch_size is smaller than per_device_train_batch_size and the sequence lengths are similar.
Architecture/Model:
AutoModelForSequenceClassification - Qwen/Qwen2.5-1.5B (this happens on all Qwen2.5 models, from 0.5B to 32B);
Specific Setting:
```
--per_device_train_batch_size 4 --gradient_accumulation_steps 4 --per_device_eval_batch_size 1 \
--bf16 --max_length 4096 --gradient_checkpointing True --group_by_length True \
--use_liger_kernel True --attn_implementation flash_attention_2
```
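For clarity, here is roughly what those flags correspond to in Python (a sketch only; the argument names are assumed to map onto trl.RewardConfig / transformers.TrainingArguments, and output_dir is a placeholder):

```python
# Minimal sketch of the configuration above, assuming TRL's RewardConfig
# (which subclasses transformers.TrainingArguments and adds max_length).
import torch
from transformers import AutoModelForSequenceClassification
from trl import RewardConfig

model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    num_labels=1,                             # single-score reward head
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn installed
)

args = RewardConfig(
    output_dir="qwen2.5-1.5b-reward",         # placeholder
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=1,
    bf16=True,
    max_length=4096,
    gradient_checkpointing=True,
    group_by_length=True,
    use_liger_kernel=True,
)
```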
Not strictly necessary, but possibly relevant:
I use DeepSpeed ZeRO (stages 1/2/3), and the issue also occurs when running with plain DDP.
Who can help?
@muellerzr @SunMarc
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
To reproduce:
Simply follow this trl reward modeling example.
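For reference, this is roughly how I observe the peak-memory gap (a sketch; `trainer` is assumed to be the RewardTrainer built by that example):

```python
# Compare peak GPU memory for training vs. evaluation on the same device.
import torch

torch.cuda.reset_peak_memory_stats()
trainer.train()
print(f"train peak: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")

torch.cuda.reset_peak_memory_stats()
trainer.evaluate()
print(f"eval peak:  {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```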
Expected behavior
I expect that with use_liger_kernel=True enabled, evaluation does not occupy much more GPU memory than training.