
Deprecate PPOTrainer #2016

Merged
merged 5 commits into main from deprecate-ppo
Sep 10, 2024
Conversation

Member

@qgallouedec qgallouedec commented Sep 4, 2024

# v0.11 <- we're here
from trl import PPOTrainer  # Old implementation, raise DeprecationWarning("PPOTrainer is deprecated and will be removed in a future release. Please use PPOv2Trainer instead.")
from trl import PPOv2Trainer # New implementation

# v0.12
from trl import PPOTrainer # New implementation
from trl import PPOv2Trainer # New implementation, raise DeprecationWarning("PPOv2Trainer is deprecated and has been renamed to PPOTrainer. Please use PPOTrainer instead.")

# v0.13
from trl import PPOTrainer # New implementation
from trl import PPOv2Trainer # ImportError("PPOv2Trainer has been renamed to PPOTrainer. Please use PPOTrainer instead.")
  • Deprecate PPOTrainer
  • Deprecate PPOConfig
  • Update the docs (replace all examples that use PPOTrainer) -> will be done in a separate PR
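
For reference, a minimal sketch of how the two transition steps above could be wired up. The name _LegacyPPOTrainer and the placement in the package __init__ are illustrative assumptions, not necessarily what the merged code does:

# v0.11 step: the old PPOTrainer warns on instantiation
import warnings

class PPOTrainer(_LegacyPPOTrainer):  # _LegacyPPOTrainer is a placeholder for the old implementation
    def __init__(self, *args, **kwargs):
        warnings.warn(
            "PPOTrainer is deprecated and will be removed in a future release. "
            "Please use PPOv2Trainer instead.",
            DeprecationWarning,
        )
        super().__init__(*args, **kwargs)

# v0.13 step: importing PPOv2Trainer fails with a pointer to the new name.
# A module-level __getattr__ (PEP 562) in trl/__init__.py can raise the error lazily:
def __getattr__(name):
    if name == "PPOv2Trainer":
        raise ImportError("PPOv2Trainer has been renamed to PPOTrainer. Please use PPOTrainer instead.")
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")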

@qgallouedec qgallouedec marked this pull request as draft September 4, 2024 19:40
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec qgallouedec marked this pull request as ready for review September 10, 2024 14:14
@qgallouedec qgallouedec merged commit a20e822 into main Sep 10, 2024
10 checks passed
@qgallouedec qgallouedec deleted the deprecate-ppo branch September 10, 2024 17:04
@qgallouedec qgallouedec mentioned this pull request Oct 4, 2024
@Sino-Huang

I would like to ask why the original PPO script used the AutoModelForCausalLMWithValueHead class to obtain the policy model, while the new PPOv2 script uses the AutoModelForCausalLM class. Are they interchangeable? I observed that if I keep using the AutoModelForCausalLMWithValueHead class with PPOv2, I get better training speed when I have a LoRA config.

Also, a LoRA + PPOv2 example seems to be missing.
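
(For context, a rough sketch of the two instantiation styles being contrasted, under the assumption that PPOv2 follows the example scripts and takes separate policy and value models; the "gpt2" checkpoint is a placeholder:)

from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification
from trl import AutoModelForCausalLMWithValueHead

# Old PPOTrainer: one model object carries the policy and a value head together
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")

# PPOv2: the policy is a plain causal LM; the value network is a separate
# single-label sequence-classification model, as in the example scripts
policy = AutoModelForCausalLM.from_pretrained("gpt2")
value_model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)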
