v0.9.6 release
We are excited to introduce the v0.9.6 release, which brings many new features and algorithms. The highlights are as follows:
- Support for SimPO by @fe1ixxu, a reference-free preference-optimization method that also regularizes output length. To use this loss, pass `loss_type="simpo"` and `cpo_alpha=0` in the `CPOConfig` and use it with the `CPOTrainer`.
- Added AlignProp by @mihirp1998, a method for finetuning Stable Diffusion models using reward gradients.
- Added Efficient Exact Optimization (EXO) by @haozheji.
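The SimPO highlight above amounts to a small configuration change. A minimal sketch of the relevant config (the `output_dir` value is a placeholder; only `loss_type` and `cpo_alpha` come from this release):

```python
from trl import CPOConfig, CPOTrainer

# Configuration fragment only: output_dir is a placeholder, while the
# loss_type and cpo_alpha values are those described in this release.
config = CPOConfig(
    output_dir="cpo-simpo",  # placeholder path
    loss_type="simpo",       # select the reference-free SimPO loss
    cpo_alpha=0,             # disable the CPO term so the loss is pure SimPO
)

# The config is then passed to CPOTrainer as usual, e.g.:
# trainer = CPOTrainer(model=model, args=config, tokenizer=tokenizer, train_dataset=dataset)
# trainer.train()
```

Setting `cpo_alpha` to a nonzero value instead yields the CPO-SimPO combination also added in this release.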
We also included many important fixes and improvements, such as fixing prints in the CLI with GCP containers by @alvarobartt. Enjoy the release!
What's Changed
- set dev version by @younesbelkada in #1710
- Add a variant of CPO, SimPO by @fe1ixxu in #1703
- [RPO] fix nll loss by @kashif in #1705
- fix yaml parser for derived config classes by @mnoukhov in #1713
- Fix default padding_value in dpo_config.py by @mnoukhov in #1692
- feat(ci): add trufflehog secrets detection by @McPatate in #1721
- ktotrainer: Refuse datasets which contain only one class of labels by @jetlime in #1724
- adds AOT by @imelnyk in #1701
- Workflow: Notify tests results on slack channel by @younesbelkada in #1744
- better trl parser with yaml config by @mnoukhov in #1739
- CI / core: Pin `numpy` to `!=2.0.0` for CI and to users by @younesbelkada in #1747
- `TrlParser`: Add ignore extra args option by @younesbelkada in #1748
- small KTO fixes by @kawine in #1734
- CPO / DPO: Fix red CI by @younesbelkada in #1749
- prepare deepspeed accommodate fp16 and bf16 by @mnoukhov in #1728
- CI / `KTOTrainer`: Remove old tests by @younesbelkada in #1750
- change the `process` function in the example of DPO by @AIR-hl in #1753
- Integrate f-divergence to DPO (Follow up) by @1485840691 in #1610
- Support for returning past_key_values from the model by @idanshen in #1742
- Fix masking of response tokens by @mertsayar8 in #1718
- Support num_train_epochs by @vwxyzjn in #1743
- Fix: Add dataset_text_field in examples/scripts/sft.py by @scottsuk0306 in #1758
- New sentiment and descriptiveness dataset by @vwxyzjn in #1757
- Add CPO-SimPO method by @fe1ixxu in #1760
- Added Reward Backpropagation Support by @mihirp1998 in #1585
- MoE Models: option to add load balancing loss by @claralp in #1765
- `evaluation_strategy` to `eval_strategy` by @qgallouedec in #1771
- add Efficient Exact Optimization (EXO) by @haozheji in #1735
- Remove the leading space in the tldr preference dataset by @vwxyzjn in #1773
- Fix Documentation Overflow Issues for Long URLs in SFTConfig by @Mubin17 in #1774
- Visual DPO by @qgallouedec in #1647
- [DOCS] fix docs and cli example script by @kashif in #1780
- Fixed typo in SFT trainer docs by @detsutut in #1788
- [SFT] add model_init_kwargs to training_args by @kashif in #1787
- Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig by @noahlt in #1794
- Clean examples by @qgallouedec in #1791
- Remove extra print in reward_trainer.py by @mnoukhov in #1799
- Fix `torch_dtype` handling in `{DPO,SFT}Trainer` when provided via CLI by @alvarobartt in #1807
- Fix `TRL_USE_RICH` environment variable handling by @alvarobartt in #1808
- 0.9.6 release by @vwxyzjn in #1816
New Contributors
- @McPatate made their first contribution in #1721
- @jetlime made their first contribution in #1724
- @imelnyk made their first contribution in #1701
- @AIR-hl made their first contribution in #1753
- @1485840691 made their first contribution in #1610
- @idanshen made their first contribution in #1742
- @mertsayar8 made their first contribution in #1718
- @scottsuk0306 made their first contribution in #1758
- @mihirp1998 made their first contribution in #1585
- @haozheji made their first contribution in #1735
- @Mubin17 made their first contribution in #1774
- @detsutut made their first contribution in #1788
- @noahlt made their first contribution in #1794
Full Changelog: v0.9.4...v0.9.6