Issues: huggingface/trl


Issues list

RuntimeError: Function 'Log1PBackward0' returned nan values in its 0th output. 🐛 bug Something isn't working 🏋 ORPO Related to ORPO
#2564 opened Jan 13, 2025 by zhaoxjmail
7 of 9 tasks
dpo_vlm.py 🐛 bug Something isn't working 🏋 DPO Related to DPO 👁️ VLM Related to Visual Language Models
#2563 opened Jan 12, 2025 by liuchaohu
5 of 9 tasks
How is Token-Level KL different from Sequence-Level KL? ❓ question Seeking clarification or more information 🏋 RLOO Related to RLOO
#2562 opened Jan 11, 2025 by mnoukhov
5 of 9 tasks
Problem with accelerate>=1.0.0 when running official PPO/RLOO examples ⚡accelerate Related to accelerate 🏋 PPO Related to PPO 🏋 RLOO Related to RLOO
#2555 opened Jan 10, 2025 by dawidm
7 of 9 tasks
KTOTrainer should work when actual batch size==1 ✨ enhancement New feature or request 🏋 KTO Related to KTO
#2554 opened Jan 10, 2025 by starmpcc
DPO loss constant, logits chosen/rejected identical, and rewards nan 🐛 bug Something isn't working 🏋 DPO Related to DPO
#2553 opened Jan 9, 2025 by solume
7 of 9 tasks
Finetuning on the last turn of multi-turn conversations ❓ question Seeking clarification or more information 🏋 SFT Related to SFT
#2545 opened Jan 6, 2025 by okhat
Is truncation_mode used in DPOTrainer? 🏋 DPO Related to DPO ❓ question Seeking clarification or more information
#2538 opened Jan 2, 2025 by anakin87
Different finetune speed in DPO task of peft and ms-swift (600/S iter vs 30/s iter) 🏋 DPO Related to DPO 🙋 help from community wanted Open invitation for community members to contribute ⚡ PEFT Related to PEFT
#2536 opened Jan 2, 2025 by maoulee
7 of 9 tasks
(Willing to PR) Will it be welcomed if speeding up algorithms like PPO and code refactor/cleanup? 🏋 PPO Related to PPO ❓ question Seeking clarification or more information 🏋 RLOO Related to RLOO
#2535 opened Dec 31, 2024 by fzyzcjy
Using "beam search" strategy while generating the responses 🙋 help from community wanted Open invitation for community members to contribute 🏋 PPO Related to PPO
#2534 opened Dec 31, 2024 by SachinVashisth
onlinedpo error when use deepspeed zero3 🐛 bug Something isn't working 🚀 deepspeed Related to deepspeed ⏳ needs more info Additional information or clarification is required to proceed 🏋 Online DPO Related to Online DPO
#2532 opened Dec 30, 2024 by yiyepiaoling0715
5 of 9 tasks
PPOTrainer: num_mini_batches setting affects training progress bar in an unexpected way 🐛 bug Something isn't working 🏋 PPO Related to PPO
#2530 opened Dec 29, 2024 by dawidm
6 of 9 tasks
Option to disable unwrapping model for generation in PPO/RLOO/OnlineDPO ✨ enhancement New feature or request 🏋 Online DPO Related to Online DPO 🏋 PPO Related to PPO 🏋 RLOO Related to RLOO
#2529 opened Dec 28, 2024 by dawidm
Direct Q-Function Optimization ✨ enhancement New feature or request
#2526 opened Dec 28, 2024 by catherinelee274
Integrate OREO into TRL and HF ✨ enhancement New feature or request
#2525 opened Dec 28, 2024 by August-murr
3 tasks done
[question] best way to have my own reward model which is backed by rules 🏋 PPO Related to PPO ❓ question Seeking clarification or more information
#2518 opened Dec 24, 2024 by yananchen1989
Soft Actor-Critic (SAC) Trainer ✨ enhancement New feature or request
#2517 opened Dec 23, 2024 by AMindToThink
3 tasks
RLOO trainer epochs/steps/episodes calculations seems not to be working properly 🐛 bug Something isn't working 🏋 RLOO Related to RLOO
#2515 opened Dec 23, 2024 by dawidm
7 of 9 tasks
Checkpointing is failing with SFTTrainer PEFT LoRA on DeepSpeed Zero-3 🐛 bug Something isn't working ⚡ PEFT Related to PEFT 🏋 SFT Related to SFT
#2514 opened Dec 21, 2024 by SwayamInSync
7 of 9 tasks
DDPO checkpoint 🐛 bug Something isn't working 🏋 DPPO Related to DDPO 🙋 help from community wanted Open invitation for community members to contribute ⏳ needs more info Additional information or clarification is required to proceed
#2505 opened Dec 20, 2024 by nguyenhoa-uit
5 of 9 tasks