Issues: huggingface/trl
[Tracking issue] Integrate native liger-kernel losses
#2495
opened Dec 17, 2024 by
qgallouedec
Issues list
RuntimeError: Function 'Log1PBackward0' returned nan values in its 0th output.
🐛 bug
Something isn't working
🏋 ORPO
Related to ORPO
#2564
opened Jan 13, 2025 by
zhaoxjmail
7 of 9 tasks
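For the NaN in `Log1PBackward0` reported in #2564, the failure mode can be reproduced with a minimal, hedged sketch of an ORPO-style odds-ratio term (not necessarily TRL's exact implementation): when the chosen log-probability is numerically zero, `log1p(-exp(logp))` becomes `-inf` and its backward pass produces NaN.

```python
# Minimal sketch (assumed ORPO-style odds-ratio term, not TRL's exact code):
# log1p(-exp(logp)) is -inf when exp(logp) rounds to 1.0, and its backward
# pass then yields NaN, matching the Log1PBackward0 error.
import torch
import torch.nn.functional as F

chosen_logps = torch.tensor([-1e-9], requires_grad=True)    # ~0 log-prob in float32
rejected_logps = torch.tensor([-2.0], requires_grad=True)

log_odds = (chosen_logps - rejected_logps) - (
    torch.log1p(-torch.exp(chosen_logps)) - torch.log1p(-torch.exp(rejected_logps))
)
loss = -F.logsigmoid(log_odds).mean()
loss.backward()
print(chosen_logps.grad)  # tensor([nan]); clamping logps strictly below 0 is one common workaround
```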
How is Token-Level KL different from Sequence-Level KL?
❓ question
Seeking clarification or more information
🏋 RLOO
Related to RLOO
#2562
opened Jan 11, 2025 by
mnoukhov
5 of 9 tasks
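As a rough illustration of the token-level vs sequence-level KL question in #2562 (a hedged sketch with made-up numbers, not RLOO internals): the token-level variant penalizes each generated token with its own `log p_policy - log p_ref` term, while the sequence-level variant applies the sum of those terms once to the whole completion.

```python
# Hedged sketch with hypothetical per-token log-probs (not RLOO's actual code).
import torch

policy_logprobs = torch.tensor([-1.2, -0.8, -2.0])  # log p_policy per generated token
ref_logprobs = torch.tensor([-1.0, -1.1, -1.5])      # log p_ref per generated token

# Token-level KL: one penalty per token, typically subtracted from per-token rewards.
token_kl = policy_logprobs - ref_logprobs             # shape (seq_len,)

# Sequence-level KL: the same terms summed over the completion, applied once
# to the final, sequence-level reward.
sequence_kl = token_kl.sum()                          # scalar

print(token_kl, sequence_kl)
```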
Fine-tuning a very small 0.5B Qwen2.5 model with the PiSSA method on 2×A800 (80 GB each, 120 GB available) strangely runs into an OOM error
🐛 bug
Something isn't working
🏋 SFT
Related to SFT
#2559
opened Jan 10, 2025 by
chuangzhidan
8 of 9 tasks
Problem with accelerate>=1.0.0 when running official PPO/RLOO examples
⚡ accelerate
Related to accelerate
🏋 PPO
Related to PPO
🏋 RLOO
Related to RLOO
#2555
opened Jan 10, 2025 by
dawidm
7 of 9 tasks
KTOTrainer should work when actual batch size==1
✨ enhancement
New feature or request
🏋 KTO
Related to KTO
#2554
opened Jan 10, 2025 by
starmpcc
DPO loss constant, logits chosen/rejected identical, and rewards nan
🐛 bug
Something isn't working
🏋 DPO
Related to DPO
#2553
opened Jan 9, 2025 by
solume
7 of 9 tasks
Finetuning on the last turn of multi-turn conversations
❓ question
Seeking clarification or more information
🏋 SFT
Related to SFT
#2545
opened Jan 6, 2025 by
okhat
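One common approach to the last-turn question in #2545 is to keep the language-modeling loss only on the tokens of the final assistant turn. A minimal sketch, independent of SFTTrainer's own masking utilities; the token ids and the turn boundary below are hypothetical:

```python
# Hedged sketch: mask everything before the last assistant turn with -100,
# the ignore index of PyTorch's cross-entropy, so only that turn is trained on.
import torch

input_ids = torch.tensor([101, 7, 8, 9, 102, 10, 11, 103, 12, 13, 14])  # toy ids
last_turn_start = 8  # hypothetical index where the final assistant turn begins

labels = input_ids.clone()
labels[:last_turn_start] = -100  # ignored by the loss

# `input_ids` and `labels` can then be passed to the model together; earlier
# turns still provide context but contribute no gradient.
```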
Is truncation_mode used in DPOTrainer?
❓ question
Seeking clarification or more information
🏋 DPO
Related to DPO
#2538
opened Jan 2, 2025 by
anakin87
SFTTrainer explicitly skips prepare_model_for_kbit_training if using PEFT + FSDP/Deepspeed3 whereas DPOTrainer calls this
🚀 deepspeed
Related to deepspeed
⚡ PEFT
Related to PEFT
🏋 SFT
Related to SFT
#2537
opened Jan 2, 2025 by
alexdauenhauer
7 of 9 tasks
Different fine-tuning speed in a DPO task between PEFT and ms-swift (600 s/iter vs 30 s/iter)
🏋 DPO
Related to DPO
🙋 help from community wanted
Open invitation for community members to contribute
⚡ PEFT
Related to PEFT
#2536
opened Jan 2, 2025 by
maoulee
7 of 9 tasks
(Willing to PR) Would speed-ups for algorithms like PPO, plus code refactoring/cleanup, be welcomed?
🏋 PPO
Related to PPO
❓ question
Seeking clarification or more information
🏋 RLOO
Related to RLOO
#2535
opened Dec 31, 2024 by
fzyzcjy
Using "beam search" strategy while generating the responses
🙋 help from community wanted
Open invitation for community members to contribute
🏋 PPO
Related to PPO
#2534
opened Dec 31, 2024 by
SachinVashisth
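For #2534, beam search itself is enabled through standard `transformers` generation arguments; whether a given PPO training loop forwards these kwargs to generation is version-dependent. A minimal, hedged sketch outside any trainer (the `gpt2` checkpoint is just a stand-in):

```python
# Hedged sketch: plain transformers beam-search generation, outside any trainer.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=4,        # beam search instead of sampling
    do_sample=False,
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```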
Online DPO error when using DeepSpeed ZeRO-3
🐛 bug
Something isn't working
🚀 deepspeed
Related to deepspeed
⏳ needs more info
Additional information or clarification is required to proceed
🏋 Online DPO
Related to Online DPO
#2532
opened Dec 30, 2024 by
yiyepiaoling0715
5 of 9 tasks
PPOTrainer: num_mini_batches setting affects training progress bar in an unexpected way
🐛 bug
Something isn't working
🏋 PPO
Related to PPO
#2530
opened Dec 29, 2024 by
dawidm
6 of 9 tasks
Option to disable unwrapping model for generation in PPO/RLOO/OnlineDPO
✨ enhancement
New feature or request
🏋 Online DPO
Related to Online DPO
🏋 PPO
Related to PPO
🏋 RLOO
Related to RLOO
#2529
opened Dec 28, 2024 by
dawidm
Direct Q-Function Optimization
✨ enhancement
New feature or request
#2526
opened Dec 28, 2024 by
catherinelee274
Integrate OREO into TRL and HF
✨ enhancement
New feature or request
#2525
opened Dec 28, 2024 by
August-murr
3 tasks done
How to handle the case where the reward model's tokenizer is inconsistent with the actor model's tokenizer?
❓ question
Seeking clarification or more information
#2523
opened Dec 27, 2024 by
stephen-nju
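One workaround often suggested for the mismatch in #2523 is to decode completions with the actor's tokenizer and re-tokenize the raw text with the reward model's tokenizer before scoring. A hedged sketch; the model ids below are placeholders, not recommendations:

```python
# Hedged sketch: bridge mismatched tokenizers by round-tripping through text.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

actor_tokenizer = AutoTokenizer.from_pretrained("your-actor-model")     # placeholder id
reward_tokenizer = AutoTokenizer.from_pretrained("your-reward-model")   # placeholder id
reward_model = AutoModelForSequenceClassification.from_pretrained("your-reward-model")

def score(completion_ids: torch.Tensor) -> torch.Tensor:
    # Decode with the actor's vocabulary, re-encode with the reward model's.
    text = actor_tokenizer.decode(completion_ids, skip_special_tokens=True)
    reward_inputs = reward_tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return reward_model(**reward_inputs).logits.squeeze(-1)
```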
[question] Best way to use my own rule-based reward model
🏋 PPO
Related to PPO
❓ question
Seeking clarification or more information
#2518
opened Dec 24, 2024 by
yananchen1989
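For #2518, a reward "model" backed by rules can simply be a Python function that scores decoded completions and returns a tensor of per-sample rewards; the rules below are made-up examples, not a recommendation:

```python
# Hedged sketch: purely rule-based rewards, no learned reward model involved.
import torch

def rule_based_reward(texts: list[str]) -> torch.Tensor:
    rewards = []
    for text in texts:
        score = 0.0
        if len(text.split()) <= 50:        # example rule: reward brevity
            score += 1.0
        if text.strip().endswith("."):     # example rule: reward complete sentences
            score += 0.5
        rewards.append(score)
    return torch.tensor(rewards)

print(rule_based_reward(["A short, complete answer.", "an unfinished ramble that"]))
```

Such a function can be called on decoded completions wherever the training loop expects per-sample rewards, in place of a learned reward model's scores.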
Soft Actor-Critic (SAC) Trainer
✨ enhancement
New feature or request
#2517
opened Dec 23, 2024 by
AMindToThink
3 tasks
RLOO trainer epochs/steps/episodes calculations seem not to be working properly
🐛 bug
Something isn't working
🏋 RLOO
Related to RLOO
#2515
opened Dec 23, 2024 by
dawidm
7 of 9 tasks
Checkpointing is failing with SFTTrainer PEFT LoRA on DeepSpeed Zero-3
🐛 bug
Something isn't working
⚡ PEFT
Related to PEFT
🏋 SFT
Related to SFT
#2514
opened Dec 21, 2024 by
SwayamInSync
7 of 9 tasks
Absence of ref_model_name in the file located at docs/source/best_of_n.mdx
#2508
opened Dec 20, 2024 by
aivolcano
7 of 9 tasks
DDPO checkpoint
🐛 bug
Something isn't working
🏋 DDPO
Related to DDPO
🙋 help from community wanted
Open invitation for community members to contribute
⏳ needs more info
Additional information or clarification is required to proceed
#2505
opened Dec 20, 2024 by
nguyenhoa-uit
5 of 9 tasks