Clean configs documentation #1944

Merged

81 commits merged into main from clean-config on Sep 4, 2024
Changes from 1 commit (of 81)

Commits:
c2d9a62
Clean BCO
qgallouedec Aug 18, 2024
e3083f1
Optional[int]
qgallouedec Aug 18, 2024
c7b2fbc
fix sft config
qgallouedec Aug 19, 2024
e7a80bb
Merge branch 'main' into clean-config
qgallouedec Aug 19, 2024
50dbc86
alignprop config
qgallouedec Aug 20, 2024
b718fba
Merge branch 'main' into clean-config
qgallouedec Aug 20, 2024
4a8aba6
update tempfile to work with output_dir
qgallouedec Aug 20, 2024
6ae94e9
Merge branch 'clean-config' of https://github.com/huggingface/trl int…
qgallouedec Aug 20, 2024
3ed49fd
Merge branch 'main' into clean-config
qgallouedec Aug 21, 2024
f847f56
clean kto config
qgallouedec Aug 21, 2024
69525f9
intro docstring
qgallouedec Aug 21, 2024
c73f43a
style
qgallouedec Aug 21, 2024
11f6e7e
reward config
qgallouedec Aug 22, 2024
946e2e5
orpo config
qgallouedec Aug 22, 2024
21df122
Merge branch 'main' into clean-config
qgallouedec Aug 26, 2024
a1bff9c
warning in trainer, not in config
qgallouedec Aug 26, 2024
006a454
cpo config
qgallouedec Aug 26, 2024
c9264ee
Merge branch 'main' into clean-config
qgallouedec Aug 27, 2024
01d8814
ppo v2
qgallouedec Aug 27, 2024
5cd9eef
Merge branch 'clean-config' of https://github.com/huggingface/trl int…
qgallouedec Aug 27, 2024
9bef508
model config
qgallouedec Aug 27, 2024
0a49bca
ddpo and per_device_train_batch_size (instead of train_batch_size)
qgallouedec Aug 27, 2024
1c9bba7
Merge branch 'main' into clean-config
qgallouedec Aug 27, 2024
216856a
rloo
qgallouedec Aug 27, 2024
7270936
Online config
qgallouedec Aug 27, 2024
05bacaf
tmp_dir in test_ddpo
qgallouedec Aug 27, 2024
451b4fc
style
qgallouedec Aug 27, 2024
9e6f0a0
remove to_dict and fix post-init
qgallouedec Aug 28, 2024
2aa4544
batch size in test ddpo
qgallouedec Aug 28, 2024
97738c8
Merge branch 'main' into clean-config
qgallouedec Aug 28, 2024
098ca6a
Merge branch 'main' into clean-config
qgallouedec Aug 28, 2024
02b78ec
dpo
qgallouedec Aug 28, 2024
92ff078
style
qgallouedec Aug 28, 2024
63679fe
Merge branch 'main' into clean-config
qgallouedec Aug 29, 2024
4957a8c
`Args` -> `Parameters`
qgallouedec Aug 29, 2024
bd3693b
parameters
qgallouedec Aug 29, 2024
10468e9
ppo config
qgallouedec Aug 29, 2024
d289982
don't overwrite world size
qgallouedec Aug 29, 2024
d94985a
style
qgallouedec Aug 29, 2024
1bc063a
Merge branch 'main' into clean-config
qgallouedec Aug 29, 2024
00d2faf
outputdir in test ppo
qgallouedec Aug 29, 2024
aa98e42
output dir in ppo config
qgallouedec Aug 29, 2024
66dc235
Merge branch 'clean-config' of https://github.com/huggingface/trl int…
qgallouedec Aug 29, 2024
79234d1
revert non-core change (1/n)
qgallouedec Sep 3, 2024
9b3b3a7
revert non-core changes (2/n)
qgallouedec Sep 3, 2024
6aeba64
revert non-core change (3/n)
qgallouedec Sep 3, 2024
fc4d223
Merge branch 'main' into clean-config
qgallouedec Sep 3, 2024
23fbfc6
uniform max_length
qgallouedec Sep 3, 2024
136cfdc
fix uniform max_length
qgallouedec Sep 3, 2024
640999c
beta uniform
qgallouedec Sep 3, 2024
3d5618c
Merge branch 'clean-config' of https://github.com/huggingface/trl int…
qgallouedec Sep 3, 2024
cfe9b22
style
qgallouedec Sep 3, 2024
358b026
link to `ConstantLengthDataset`
qgallouedec Sep 3, 2024
2190bf1
uniform `dataset_num_proc`
qgallouedec Sep 3, 2024
5434969
uniform `disable_dropout`
qgallouedec Sep 3, 2024
a7e537a
`eval_packing` doc
qgallouedec Sep 3, 2024
1a86078
try latex and α in doc
qgallouedec Sep 3, 2024
7065562
try title first
qgallouedec Sep 3, 2024
2d93d3d
doesn't work
qgallouedec Sep 3, 2024
42acd10
reorganize doc
qgallouedec Sep 3, 2024
92a2206
overview
qgallouedec Sep 3, 2024
81d5147
better latex
qgallouedec Sep 3, 2024
71c110a
is_encoder_decoder uniform
qgallouedec Sep 3, 2024
e60c3b0
proper ticks
qgallouedec Sep 3, 2024
a964090
fix latex
qgallouedec Sep 3, 2024
45d4f99
uniform generate_during_eval
qgallouedec Sep 3, 2024
3bc2d30
uniform truncation_mode
qgallouedec Sep 3, 2024
66a4861
ref_model_mixup_alpha
qgallouedec Sep 3, 2024
e2d8f7f
ref_model_mixup_alpha and ref_model_sync_steps
qgallouedec Sep 3, 2024
79347d9
Uniform `model_init_kwargs` and `ref_model_init_kwargs`
qgallouedec Sep 3, 2024
9ba37a9
rpo_alpha
qgallouedec Sep 3, 2024
52f69b1
Update maximum length argument names in config files
qgallouedec Sep 3, 2024
0fabc42
Update loss_type descriptions in config files
qgallouedec Sep 3, 2024
e1abc3a
Update max_target_length to max_completion_length in CPOConfig and CP…
qgallouedec Sep 3, 2024
d618f0c
Update padding value in config files
qgallouedec Sep 3, 2024
594677c
Update precompute_ref_log_probs flag documentation
qgallouedec Sep 3, 2024
5dee9ab
Fix typos and update comments in dpo_config.py and sft_config.py
qgallouedec Sep 3, 2024
47431f8
Merge branch 'main' into clean-config
qgallouedec Sep 4, 2024
19af1fa
post init warning for `max_target_length`
qgallouedec Sep 4, 2024
34b38b0
Merge branch 'clean-config' of https://github.com/huggingface/trl int…
qgallouedec Sep 4, 2024
07c9cab
Merge branch 'main' into clean-config
qgallouedec Sep 4, 2024
revert non-core change (1/n)
qgallouedec committed Sep 3, 2024
commit 79234d1fe145124bdd1e9ff6dad9575f7cb1185d
19 changes: 9 additions & 10 deletions docs/source/customization.mdx
@@ -60,7 +60,7 @@ ref_model = AutoModelForCausalLMWithValueHead.from_pretrained('gpt2')
 tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
 
 # 2. define config
-ppo_config = {'output_dir': 'output_dir', 'batch_size': 1, 'learning_rate':1e-5}
+ppo_config = {'batch_size': 1, 'learning_rate':1e-5}
 config = PPOConfig(**ppo_config)
 
 
@@ -87,7 +87,7 @@ ref_model = AutoModelForCausalLMWithValueHead.from_pretrained('gpt2')
 tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
 
 # 2. define config
-ppo_config = {'output_dir': 'output_dir', 'batch_size': 1, 'learning_rate':1e-5}
+ppo_config = {'batch_size': 1, 'learning_rate':1e-5}
 config = PPOConfig(**ppo_config)
 
 
@@ -128,7 +128,7 @@ ref_model = AutoModelForCausalLMWithValueHead.from_pretrained('gpt2')
 tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
 
 # 2. define config
-ppo_config = {'output_dir': 'output_dir', 'batch_size': 1, 'learning_rate':1e-5}
+ppo_config = {'batch_size': 1, 'learning_rate':1e-5}
 config = PPOConfig(**ppo_config)
 
 
@@ -154,7 +154,7 @@ ref_model = create_reference_model(model, num_shared_layers=6)
 tokenizer = AutoTokenizer.from_pretrained('bigscience/bloom-560m')
 
 # 2. initialize trainer
-ppo_config = {'output_dir': 'output_dir', 'batch_size': 1}
+ppo_config = {'batch_size': 1}
 config = PPOConfig(**ppo_config)
 ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)
 ```
@@ -182,7 +182,7 @@ ref_model = AutoModelForCausalLMWithValueHead.from_pretrained('bigscience/bloom-
 tokenizer = AutoTokenizer.from_pretrained('bigscience/bloom-560m')
 
 # 2. initialize trainer
-ppo_config = {'output_dir': 'output_dir', 'batch_size': 1}
+ppo_config = {'batch_size': 1}
 config = PPOConfig(**ppo_config)
 ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)
 ```
@@ -203,15 +203,14 @@ As suggested by [Secrets of RLHF in Large Language Models Part I: PPO](https://h
 from trl import PPOConfig
 
 ppo_config = {
-    'output_dir': 'output_dir',
-    'use_score_scaling': True,
-    'use_score_norm': True,
-    'score_clip': 0.5,
+    use_score_scaling=True,
+    use_score_norm=True,
+    score_clip=0.5,
 }
 config = PPOConfig(**ppo_config)
 ```
 
 To run `ppo.py`, you can use the following command:
 ```
-python examples/scripts/ppo.py --output_dir output_dir --log_with wandb --use_score_scaling --use_score_norm --score_clip 0.5
+python examples/scripts/ppo.py --log_with wandb --use_score_scaling --use_score_norm --score_clip 0.5
 ```
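For orientation, the three flags touched in the hunk above shape the raw reward-model scores before the PPO update. The sketch below is illustrative only, not TRL's code: the function name `shape_rewards`, the running-statistics arguments, and the epsilon term are assumptions; in TRL the trainer tracks the running mean/std internally.

```python
import torch

def shape_rewards(
    scores: torch.Tensor,
    running_mean: float,
    running_std: float,
    use_score_scaling: bool = True,
    use_score_norm: bool = True,
    score_clip: float = 0.5,
) -> torch.Tensor:
    """Rough sketch of score scaling/normalization/clipping for PPO rewards."""
    if use_score_scaling:
        if use_score_norm:
            # Center and scale the scores with running statistics.
            scores = (scores - running_mean) / (running_std + 1e-8)
        else:
            # Scale only, keeping the sign and relative magnitude of scores.
            scores = scores / (running_std + 1e-8)
    if score_clip is not None and score_clip > 0:
        # Clip shaped scores into [-score_clip, score_clip].
        scores = scores.clamp(-score_clip, score_clip)
    return scores

# Example: shape a small batch of raw reward-model scores.
raw = torch.tensor([2.0, -1.0, 0.5])
print(shape_rewards(raw, running_mean=0.2, running_std=1.5))
```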
4 changes: 2 additions & 2 deletions docs/source/ddpo_trainer.mdx
@@ -41,8 +41,8 @@ To obtain the documentation of `stable_diffusion_tuning.py`, please run `python
 
 The following are things to keep in mind (The code checks this for you as well) in general while configuring the trainer (beyond the use case of using the example script)
 
-- The configurable sample batch size (`--ddpo_config.sample_batch_size=6`) should be greater than or equal to the configurable training batch size (`--ddpo_config.per_device_train_batch_size=3`)
-- The configurable sample batch size (`--ddpo_config.sample_batch_size=6`) must be divisible by the configurable train batch size (`--ddpo_config.per_device_train_batch_size=3`)
+- The configurable sample batch size (`--ddpo_config.sample_batch_size=6`) should be greater than or equal to the configurable training batch size (`--ddpo_config.train_batch_size=3`)
+- The configurable sample batch size (`--ddpo_config.sample_batch_size=6`) must be divisible by the configurable train batch size (`--ddpo_config.train_batch_size=3`)
 - The configurable sample batch size (`--ddpo_config.sample_batch_size=6`) must be divisible by both the configurable gradient accumulation steps (`--ddpo_config.train_gradient_accumulation_steps=1`) and the configurable accelerator processes count
 
 ## Setting up the image logging hook function
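The constraints in the bullets above reduce to a few arithmetic checks. Here is a minimal sketch of them; `check_ddpo_batch_config` is a hypothetical helper for illustration, not DDPOConfig's actual validation logic.

```python
def check_ddpo_batch_config(
    sample_batch_size: int,
    train_batch_size: int,
    train_gradient_accumulation_steps: int,
    num_processes: int,
) -> None:
    # Hypothetical helper mirroring the documented constraints; not TRL code.
    # The sample batch size must be at least as large as the train batch size...
    assert sample_batch_size >= train_batch_size
    # ...and evenly divisible by it.
    assert sample_batch_size % train_batch_size == 0
    # It must also divide evenly across the gradient accumulation steps...
    assert sample_batch_size % train_gradient_accumulation_steps == 0
    # ...and across the accelerator processes.
    assert sample_batch_size % num_processes == 0

# The values used in the docs above pass all four checks.
check_ddpo_batch_config(6, 3, 1, 1)
```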
1 change: 0 additions & 1 deletion docs/source/logging.mdx
@@ -6,7 +6,6 @@ By default, the TRL [`PPOTrainer`] saves a lot of relevant information to `wandb
 Upon initialization, pass one of these two options to the [`PPOConfig`]:
 ```
 config = PPOConfig(
-    output_dir="output_dir",
     model_name=args.model_name,
     log_with=`wandb`, # or `tensorboard`
 )
1 change: 0 additions & 1 deletion docs/source/ppo_trainer.mdx
@@ -61,7 +61,6 @@ The `PPOConfig` dataclass controls all the hyperparameters and settings for the
 from trl import PPOConfig
 
 config = PPOConfig(
-    output_dir="output_dir",
     model_name="gpt2",
     learning_rate=1.41e-5,
 )
2 changes: 1 addition & 1 deletion docs/source/quickstart.mdx
@@ -30,7 +30,7 @@ tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
 tokenizer.pad_token = tokenizer.eos_token
 
 # 2. initialize trainer
-ppo_config = {"output_dir": "output_dir", "mini_batch_size": 1, "batch_size": 1}
+ppo_config = {"mini_batch_size": 1, "batch_size": 1}
 config = PPOConfig(**ppo_config)
 ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)
 
2 changes: 1 addition & 1 deletion examples/notebooks/gpt2-sentiment-control.ipynb
@@ -99,7 +99,7 @@
 "sentiment_pipe_kwargs = {\"top_k\": None, \"function_to_apply\": \"none\"}\n",
 "\n",
 "config = PPOConfig(\n",
-"    output_dir =\"output_dir\", model_name=\"lvwerra/gpt2-imdb\", steps=51200, learning_rate=1.41e-5, remove_unused_columns=False, log_with=\"wandb\"\n",
+"    model_name=\"lvwerra/gpt2-imdb\", steps=51200, learning_rate=1.41e-5, remove_unused_columns=False, log_with=\"wandb\"\n",
 ")\n",
 "\n",
 "txt_in_len = 5\n",
1 change: 0 additions & 1 deletion examples/notebooks/gpt2-sentiment.ipynb
@@ -87,7 +87,6 @@
 "outputs": [],
 "source": [
 "config = PPOConfig(\n",
-"    output_dir=\"output_dir\", \n",
 "    model_name=\"lvwerra/gpt2-imdb\",\n",
 "    learning_rate=1.41e-5,\n",
 "    log_with=\"wandb\",\n",
2 changes: 1 addition & 1 deletion examples/scripts/alignprop.py
@@ -19,7 +19,7 @@
     --num_epochs=20 \
     --train_gradient_accumulation_steps=4 \
     --sample_num_steps=50 \
-    --per_device_train_batch_size=8 \
+    --train_batch_size=8 \
     --tracker_project_name="stable_diffusion_training" \
     --log_with="wandb"
 
2 changes: 1 addition & 1 deletion examples/scripts/ddpo.py
@@ -17,7 +17,7 @@
     --train_gradient_accumulation_steps=1 \
     --sample_num_steps=50 \
     --sample_batch_size=6 \
-    --per_device_train_batch_size=3 \
+    --train_batch_size=3 \
     --sample_num_batches_per_epoch=4 \
     --per_prompt_stat_tracking=True \
     --per_prompt_stat_tracking_buffer_size=32 \
84 changes: 41 additions & 43 deletions tests/test_alignprop_trainer.py
@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import gc
-import tempfile
 import unittest
 
 import torch
@@ -41,53 +40,52 @@ class AlignPropTrainerTester(unittest.TestCase):
     Test the AlignPropTrainer class.
     """
 
+    def setUp(self):
+        alignprop_config = AlignPropConfig(
+            num_epochs=2,
+            train_gradient_accumulation_steps=1,
+            train_batch_size=2,
+            truncated_backprop_rand=False,
+            mixed_precision=None,
+            save_freq=1000000,
+        )
+        pretrained_model = "hf-internal-testing/tiny-stable-diffusion-torch"
+        pretrained_revision = "main"
+        pipeline_with_lora = DefaultDDPOStableDiffusionPipeline(
+            pretrained_model, pretrained_model_revision=pretrained_revision, use_lora=True
+        )
+        pipeline_without_lora = DefaultDDPOStableDiffusionPipeline(
+            pretrained_model, pretrained_model_revision=pretrained_revision, use_lora=False
+        )
+        self.trainer_with_lora = AlignPropTrainer(
+            alignprop_config, scorer_function, prompt_function, pipeline_with_lora
+        )
+        self.trainer_without_lora = AlignPropTrainer(
+            alignprop_config, scorer_function, prompt_function, pipeline_without_lora
+        )
+
+    def tearDown(self) -> None:
+        gc.collect()
+
     @parameterized.expand([True, False])
     def test_generate_samples(self, use_lora):
-        with tempfile.TemporaryDirectory() as tmp_dir:
-            alignprop_config = AlignPropConfig(
-                output_dir=tmp_dir,
-                num_epochs=2,
-                train_gradient_accumulation_steps=1,
-                per_device_train_batch_size=2,
-                truncated_backprop_rand=False,
-                mixed_precision=None,
-                save_freq=1000000,
-            )
-            pretrained_model = "hf-internal-testing/tiny-stable-diffusion-torch"
-            pipeline = DefaultDDPOStableDiffusionPipeline(pretrained_model, use_lora=use_lora)
-            trainer = AlignPropTrainer(alignprop_config, scorer_function, prompt_function, pipeline)
-            output_pairs = trainer._generate_samples(2, with_grad=True)
-            assert len(output_pairs.keys()) == 3
-            assert len(output_pairs["images"]) == 2
+        trainer = self.trainer_with_lora if use_lora else self.trainer_without_lora
+        output_pairs = trainer._generate_samples(2, with_grad=True)
+        assert len(output_pairs.keys()) == 3
+        assert len(output_pairs["images"]) == 2
 
     @parameterized.expand([True, False])
     def test_calculate_loss(self, use_lora):
-        with tempfile.TemporaryDirectory() as tmp_dir:
-            alignprop_config = AlignPropConfig(
-                output_dir=tmp_dir,
-                num_epochs=2,
-                train_gradient_accumulation_steps=1,
-                per_device_train_batch_size=2,
-                truncated_backprop_rand=False,
-                mixed_precision=None,
-                save_freq=1000000,
-            )
-            pretrained_model = "hf-internal-testing/tiny-stable-diffusion-torch"
-            pipeline = DefaultDDPOStableDiffusionPipeline(pretrained_model, use_lora=use_lora)
-            trainer = AlignPropTrainer(alignprop_config, scorer_function, prompt_function, pipeline)
-
-            sample = trainer._generate_samples(2)
-
-            images = sample["images"]
-            prompts = sample["prompts"]
-
-            assert images.shape == (2, 3, 128, 128)
-            assert len(prompts) == 2
-
-            rewards = trainer.compute_rewards(sample)
-            loss = trainer.calculate_loss(rewards)
-
-            assert torch.isfinite(loss.cpu())
+        trainer = self.trainer_with_lora if use_lora else self.trainer_without_lora
+        sample = trainer._generate_samples(2)
+
+        images = sample["images"]
+        prompts = sample["prompts"]
+
+        assert images.shape == (2, 3, 128, 128)
+        assert len(prompts) == 2
+
+        rewards = trainer.compute_rewards(sample)
+        loss = trainer.calculate_loss(rewards)
+
+        assert torch.isfinite(loss.cpu())