"Actual (not effective) batch size must be > 1. KTO will not work properly because the KL term will be equivalent to the implied reward."
)
This check was introduced in #2153.
However, the KL logits are calculated after unlinking `prompt_input_ids` from `answer_input_ids`, so the KL term is not equivalent to the reward term.
Accordingly, `KTOTrainer` should work even when the actual batch size is 1.
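To illustrate the argument, here is a minimal sketch in plain Python. It is not TRL's code: the function names and the log-probability values are made up, and it only assumes the usual KTO form, where the KL term is the batch mean of the policy/reference log-ratio over the KL pairs, clamped at zero. With a single example, the loss degenerates only if the KL pair is the same as the matched pair.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def kl_estimate(policy_kl_logps, reference_kl_logps):
    """Batch mean of the log-ratio over the KL pairs, clamped at zero."""
    diffs = [p - r for p, r in zip(policy_kl_logps, reference_kl_logps)]
    return max(sum(diffs) / len(diffs), 0.0)


def chosen_loss(policy_logp, reference_logp, kl, beta=0.1):
    """KTO-style loss for a desirable example, given a KL estimate."""
    logratio = policy_logp - reference_logp
    return 1.0 - sigmoid(beta * (logratio - kl))


# Hypothetical sequence log-probabilities for a batch of size 1.
matched_policy, matched_ref = -10.0, -12.0        # log-ratio = 2.0
mismatched_policy, mismatched_ref = -30.0, -30.5  # log-ratio = 0.5

# Case 1: the KL pair is the same (prompt, completion) as the matched pair.
kl_same = kl_estimate([matched_policy], [matched_ref])
loss_same = chosen_loss(matched_policy, matched_ref, kl_same)

# Case 2: the KL pair uses a completion that is not linked to the prompt.
kl_diff = kl_estimate([mismatched_policy], [mismatched_ref])
loss_diff = chosen_loss(matched_policy, matched_ref, kl_diff)

print(loss_same, loss_diff)
```

In case 1 the KL term cancels the log-ratio and the loss is stuck at `1 - sigmoid(0)`, which is the failure the error message describes. In case 2 the two terms differ, so the loss still depends on the example.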
Thank you!
`trl/trl/trainer/kto_trainer.py`, lines 662 to 665 in edabe0a