System Info
I found that enabling use_liger_kernel=True does reduce GPU memory during training. However, evaluation requires much more GPU memory than training, even though per_device_eval_batch_size is smaller than per_device_train_batch_size and the sequence lengths are similar.
Architecture/Model:
AutoModelForSequenceClassification - Qwen/Qwen2.5-1.5B (this happens on all Qwen2.5 models, from 0.5B to 32B);
Specific Setting:
```
--per_device_train_batch_size 4 --gradient_accumulation_steps 4 --per_device_eval_batch_size 1 \
--bf16 --max_length 4096 --gradient_checkpointing True --group_by_length True \
--use_liger_kernel True --attn_implementation flash_attention_2
```
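For clarity, here is roughly what those flags correspond to in Python (a sketch only; the argument names are assumed to map onto trl.RewardConfig / transformers.TrainingArguments, and output_dir is a placeholder):

```python
# Minimal sketch of the configuration above, assuming TRL's RewardConfig
# (which subclasses transformers.TrainingArguments and adds max_length).
import torch
from transformers import AutoModelForSequenceClassification
from trl import RewardConfig

model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    num_labels=1,                             # single-score reward head
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn installed
)

args = RewardConfig(
    output_dir="qwen2.5-1.5b-reward",         # placeholder
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=1,
    bf16=True,
    max_length=4096,
    gradient_checkpointing=True,
    group_by_length=True,
    use_liger_kernel=True,
)
```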
Not strictly necessary, but possibly relevant:
I use DeepSpeed ZeRO (stages 1/2/3), and the issue also occurs when running with plain DDP.
Who can help?
@muellerzr @SunMarc
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
To reproduce:
Simply follow this trl reward modeling example.
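For reference, this is roughly how I observe the peak-memory gap (a sketch; `trainer` is assumed to be the RewardTrainer built by that example):

```python
# Compare peak GPU memory for training vs. evaluation on the same device.
import torch

torch.cuda.reset_peak_memory_stats()
trainer.train()
print(f"train peak: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")

torch.cuda.reset_peak_memory_stats()
trainer.evaluate()
print(f"eval peak:  {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```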
Expected behavior
I expect that with use_liger_kernel=True enabled, evaluation does not occupy much more GPU memory than training.