Open
20 of 26 issues completed
Labels
👶 good first issue: Good for newcomers
🧒 good second issue: Good for contributors with basic project familiarity
Description
The purpose of this issue is to list the tasks that need to be completed before we reach v1. This list is evolving and is modified based on recent discussions and progress.
Documentation
- Remove `how_to_train.md` (Remove how_to_train.md: outdated training FAQ #4267)
- Remove `using_llama_models.md` (Remove using_llama_models.md: outdated Llama2-specific documentation #4268)
- Remove `logging.md` (Remove logging.md: trainer-specific metrics documentation #4269)
- Rewrite `peft_integration.md` #4376
- Remove guidance about converting conversational to standard. #4375
- Move every section of `conceptual_guides/experimental` into its own section in `experimental` #4377
- Extend basic usage example to all supported CLIs #4378
- Remove or populate "Training customization" #4379
- Remove outdated warning about batch contamination #4381
- Populate "Speeding Up Training" #4382
- Add PEFT subsection to "Reducing Memory Usage" #4383
- Write the subsection "Multi-Node Training" #4384
- Use a common `trl-lib` namespace for the models/datasets/spaces #4385
- Reference supported trainers in Liger Kernel integration guide #4386
- Remove Sentiment Tuning Examples #4396
- Remove or move Multi Adapter RL #4397
- Complete paper index #4407
Examples
Tests
Main codebase
- Add accuracy reward to the `trl.rewards` module (Add accuracy reward #4270)
- Add an option (default to True) to use `RichProgressCallback` in scripts (`trl.scripts`).
- Add kernels to Docker images #4398
- Remove `log_example_reports.py` (Remove unused log_example_reports.py script #4241)
- Remove `commands` directory (Remove unused commands directory #4258)
- Remove `examples/research_projects` (Remove unused commands directory #4258)
- Remove `trl.extra.dataset_formatting` (Deprecate unused dataset_formatting module #4242)
- Remove support for FSDP1 #4387
- Remove `BestOfNSampler`. (Deprecate `BestOfNSampler` #4291)
- Fully transition from `flash-attn` to `kernels` #4380
- Move `masked_mean`, `masked_var` and `masked_whiten` to `ppo.py` #4403
- Refactor DPO to align implementation with SFT (WIP in [DRAFT] Refactor DPO #3906)
- Tool calling for GRPO/RLOO (WIP in 🕵️♂️ Agent training #4300)
- Async generation for Online methods
- Bump transformers to v5
- Make vLLM server OpenAI-compatible #4402
Moving experimental features to experimental submodule
Discussed in #4223 for trainers
- Move BCO to experimental submodule (🚚 Move BCO to `trl.experimental` #4312)
- Move KTO to experimental submodule
- Move Nash-MD to experimental submodule (Move NashMDTrainer to experimental module #4477)
- Move ORPO to experimental submodule ([ORPO] Move ORPOTrainer to experimental #4480)
- Move Online DPO to experimental submodule #4472
- Move PPO to experimental submodule (Move PPOTrainer to trl.experimental.ppo #4482)
- Move PRM to experimental submodule (Move PRMTrainer to trl.experimental.prm #4483)
- Move RLOO to experimental submodule
- Move XPO to experimental submodule (Move XPOTrainer to trl.experimental.xpo #4485)
- Move everything related to Mergekit to the experimental submodule #4395
- Move everything related to Judges to `trl.experimental` #4400
- Move Winrate callback to experimental (Move `WinRateCallback` to experimental #4558)
sergiopaniego, maziyarpanahi, kashif and ucyang