v0.9.6 release
We are excited to introduce the v0.9.6 release, which brings many new features and algorithms. The highlights are as follows:
- Support for SimPO by @fe1ixxu, a reference-free preference-optimization method that also regularizes output length. To use this loss, pass `loss_type="simpo"` and `cpo_alpha=0` in the `CPOConfig` and use it with the `CPOTrainer`.
- Added AlignProp by @mihirp1998, a method for finetuning Stable Diffusion models using reward gradients.
- Added Efficient Exact Optimization (EXO) by @haozheji.
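The SimPO highlight above amounts to a small configuration change. A minimal sketch of the relevant config (the `output_dir` value is a placeholder; only `loss_type` and `cpo_alpha` come from this release):

```python
from trl import CPOConfig, CPOTrainer

# Configuration fragment only: output_dir is a placeholder, while the
# loss_type and cpo_alpha values are those described in this release.
config = CPOConfig(
    output_dir="cpo-simpo",  # placeholder path
    loss_type="simpo",       # select the reference-free SimPO loss
    cpo_alpha=0,             # disable the CPO term so the loss is pure SimPO
)

# The config is then passed to CPOTrainer as usual, e.g.:
# trainer = CPOTrainer(model=model, args=config, tokenizer=tokenizer, train_dataset=dataset)
# trainer.train()
```

Setting `cpo_alpha` to a nonzero value instead yields the CPO-SimPO combination also added in this release.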
We also included many important fixes and improvements, such as fixing prints in the CLI with GCP containers by @alvarobartt. Enjoy the release!
What's Changed
- set dev version by @younesbelkada in #1710
- Add a variant of CPO, SimPO by @fe1ixxu in #1703
- [RPO] fix nll loss by @kashif in #1705
- fix yaml parser for derived config classes by @mnoukhov in #1713
- Fix default padding_value in dpo_config.py by @mnoukhov in #1692
- feat(ci): add trufflehog secrets detection by @McPatate in #1721
- ktotrainer: Refuse datasets which contain only one class of labels by @jetlime in #1724
- adds AOT by @imelnyk in #1701
- Workflow: Notify tests results on slack channel by @younesbelkada in #1744
- better trl parser with yaml config by @mnoukhov in #1739
- CI / core: Pin `numpy` to `!=2.0.0` for CI and to users by @younesbelkada in #1747
- `TrlParser`: Add ignore extra args option by @younesbelkada in #1748
- small KTO fixes by @kawine in #1734
- CPO / DPO: Fix red CI by @younesbelkada in #1749
- prepare deepspeed accommodate fp16 and bf16 by @mnoukhov in #1728
- CI / `KTOTrainer`: Remove old tests by @younesbelkada in #1750
- change the `process` function in the example of DPO by @AIR-hl in #1753
- Integrate f-divergence to DPO (Follow up) by @1485840691 in #1610
- Support for returning past_key_values from the model by @idanshen in #1742
- Fix masking of response tokens by @mertsayar8 in #1718
- Support num_train_epochs by @vwxyzjn in #1743
- Fix: Add dataset_text_field in examples/scripts/sft.py by @scottsuk0306 in #1758
- New sentiment and descriptiveness dataset by @vwxyzjn in #1757
- Add CPO-SimPO method by @fe1ixxu in #1760
- Added Reward Backpropagation Support by @mihirp1998 in #1585
- MoE Models: option to add load balancing loss by @claralp in #1765
- `evaluation_strategy` to `eval_strategy` by @qgallouedec in #1771
- add Efficient Exact Optimization (EXO) by @haozheji in #1735
- Remove the leading space in the tldr preference dataset by @vwxyzjn in #1773
- Fix Documentation Overflow Issues for Long URLs in SFTConfig by @Mubin17 in #1774
- Visual DPO by @qgallouedec in #1647
- [DOCS] fix docs and cli example script by @kashif in #1780
- Fixed typo in SFT trainer docs by @detsutut in #1788
- [SFT] add model_init_kwargs to training_args by @kashif in #1787
- Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig by @noahlt in #1794
- Clean examples by @qgallouedec in #1791
- Remove extra print in reward_trainer.py by @mnoukhov in #1799
- Fix `torch_dtype` handling in `{DPO,SFT}Trainer` when provided via CLI by @alvarobartt in #1807
- Fix `TRL_USE_RICH` environment variable handling by @alvarobartt in #1808
- 0.9.6 release by @vwxyzjn in #1816
New Contributors
- @McPatate made their first contribution in #1721
- @jetlime made their first contribution in #1724
- @imelnyk made their first contribution in #1701
- @AIR-hl made their first contribution in #1753
- @1485840691 made their first contribution in #1610
- @idanshen made their first contribution in #1742
- @mertsayar8 made their first contribution in #1718
- @scottsuk0306 made their first contribution in #1758
- @mihirp1998 made their first contribution in #1585
- @haozheji made their first contribution in #1735
- @Mubin17 made their first contribution in #1774
- @detsutut made their first contribution in #1788
- @noahlt made their first contribution in #1794
Full Changelog: v0.9.4...v0.9.6