
Issues: huggingface/trl


Issues list

KTOTrainer should work when actual batch size==1 ✨ enhancement New feature or request 🏋 KTO Related to KTO
#2554 opened Jan 10, 2025 by starmpcc
Option to disable unwrapping model for generation in PPO/RLOO/OnlineDPO ✨ enhancement New feature or request 🏋 Online DPO Related to Online DPO 🏋 PPO Related to PPO 🏋 RLOO Related to RLOO
#2529 opened Dec 28, 2024 by dawidm
Direct Q-Function Optimization ✨ enhancement New feature or request
#2526 opened Dec 28, 2024 by catherinelee274
Integrate OREO into TRL and HF ✨ enhancement New feature or request
#2525 opened Dec 28, 2024 by August-murr
3 tasks done
Soft Actor-Critic (SAC) Trainer ✨ enhancement New feature or request
#2517 opened Dec 23, 2024 by AMindToThink
3 tasks
Spectrum training support ✨ enhancement New feature or request 🏋 SFT Related to SFT
#2504 opened Dec 19, 2024 by ggbetz
[Tracking issue] Integrate native liger-kernel losses ✨ enhancement New feature or request 🧒 good second issue Good for contributors with basic project familiarity
#2495 opened Dec 17, 2024 by qgallouedec
5 tasks
Packing in DPOTrainer 🏋 DPO Related to DPO ✨ enhancement New feature or request
#2469 opened Dec 13, 2024 by zhc7
Probably a more reasonable method of packing ✨ enhancement New feature or request 🧒 good second issue Good for contributors with basic project familiarity 🙋 help from community wanted Open invitation for community members to contribute 🏋 SFT Related to SFT
#2466 opened Dec 12, 2024 by AIR-hl
Add gen_text Argument for Custom Text Generation During Fine-tuning ✨ enhancement New feature or request ⏳ needs more info Additional information or clarification is required to proceed
#2415 opened Nov 29, 2024 by dame-cell
RLOO Trainer do not support peft lora ✨ enhancement New feature or request 🙋 help from community wanted Open invitation for community members to contribute ⚡ PEFT Related to PEFT 🏋 RLOO Related to RLOO
#2404 opened Nov 28, 2024 by harvinyou
7 of 9 tasks
eos_token config in PPOTrainer ✨ enhancement New feature or request 👶 good first issue Good for newcomers 🙋 help from community wanted Open invitation for community members to contribute 🏋 PPO Related to PPO
#2387 opened Nov 23, 2024 by kechunFIVE
adding DRO trainer ✨ enhancement New feature or request 🙋 help from community wanted Open invitation for community members to contribute
#2383 opened Nov 22, 2024 by morLev
2 of 3 tasks
DPO Training DataLoader is not shuffled 🏋 DPO Related to DPO ✨ enhancement New feature or request
#2337 opened Nov 7, 2024 by kaiwenw
4 tasks
Using a different ref_model from model leads to incorrect results ✨ enhancement New feature or request ❓ question Seeking clarification or more information
#2307 opened Nov 1, 2024 by DarshanDeshpande
2 of 4 tasks
Helper function for getting reward model and judge ✨ enhancement New feature or request
#2271 opened Oct 24, 2024 by qgallouedec
Add VAS to TRL ✨ enhancement New feature or request
#2195 opened Oct 7, 2024 by idanshen
[CGPO] CGPO Trainer (single task single objective) ✨ enhancement New feature or request
#2190 opened Oct 6, 2024 by gaetanlop Draft
9 of 10 tasks
[Reward Modelling] Add support for process / stepwise supervision ✨ enhancement New feature or request 🙋 help from community wanted Open invitation for community members to contribute 🏋 Reward Related to Reward modelling
#2110 opened Sep 24, 2024 by lewtun