
use_liger_kernel requires much more GPU memory during evaluation than training #35689

Open
Smu-Tan opened this issue Jan 14, 2025 · 1 comment

Smu-Tan commented Jan 14, 2025

System Info

I found that enabling use_liger_kernel=True does reduce GPU memory during training. However, during evaluation it requires much more GPU memory than training, even though per_device_eval_batch_size is smaller than per_device_train_batch_size and the sequence lengths are similar.

[Image: screenshot of GPU memory usage during training vs. evaluation]

Architecture/Model:
AutoModelForSequenceClassification with Qwen/Qwen2.5-1.5B (this happens on all Qwen2.5 models, from 0.5B to 32B).

Specific Setting:
--per_device_train_batch_size 4 --gradient_accumulation_steps 4 --per_device_eval_batch_size 1 --bf16 --max_length 4096 --gradient_checkpointing True --group_by_length True --use_liger_kernel True --attn_implementation flash_attention_2
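
For reference, here are the same settings expressed in Python (a minimal sketch: the output_dir is a placeholder, max_length is the trl RewardConfig field rather than a plain TrainingArguments flag, and attn_implementation is passed at model load time instead):

```python
from trl import RewardConfig  # subclass of transformers.TrainingArguments

# Sketch of the CLI flags above; output_dir is a placeholder.
training_args = RewardConfig(
    output_dir="./qwen2.5-1.5b-reward",  # placeholder path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=1,
    bf16=True,
    max_length=4096,                     # RewardConfig field, not a TrainingArguments flag
    gradient_checkpointing=True,
    group_by_length=True,
    use_liger_kernel=True,
)
```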

Not required to reproduce, but possibly relevant:
I use DeepSpeed ZeRO (stage 1/2/3), but I found the issue also exists when running with plain DDP.

Who can help?

@muellerzr @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

To reproduce, simply follow this trl reward modeling example.
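
For convenience, a minimal sketch of that setup (the dataset here is an illustrative choice, not necessarily the one from the example, and `processing_class` may be `tokenizer` on older trl versions):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardTrainer

model_name = "Qwen/Qwen2.5-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=1,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
model.config.pad_token_id = tokenizer.pad_token_id  # needed for batched classification

# Any preference dataset with "chosen"/"rejected" columns works here;
# this particular dataset is an illustrative choice.
dataset = load_dataset("trl-lib/ultrafeedback_binarized")

trainer = RewardTrainer(
    model=model,
    args=training_args,          # the RewardConfig from the sketch above
    processing_class=tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
trainer.evaluate()  # the memory spike shows up here
```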

Expected behavior

I expect that enabling use_liger_kernel=True should not consume much more GPU memory during evaluation than during training.
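
To make the comparison concrete, the discrepancy can be measured with PyTorch's built-in peak-memory counters (a sketch assuming the trainer object from the reproduction above):

```python
import torch

torch.cuda.reset_peak_memory_stats()
trainer.train()
print(f"peak GPU memory during training:   {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")

torch.cuda.reset_peak_memory_stats()
trainer.evaluate()
print(f"peak GPU memory during evaluation: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```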

@Smu-Tan Smu-Tan added the bug label Jan 14, 2025
SunMarc (Member) commented Jan 14, 2025

Thanks for the report! Did you also see this in your experiments, @ByronHsu?
