🐾 Process-supervised RM Trainer #2127

Merged: 140 commits, Dec 13, 2024
357a8c6
initial skeleton
gaetanlop Sep 26, 2024
841f7a1
tokenize fn
gaetanlop Sep 26, 2024
641e899
adding bos and eos to tokenization fn
gaetanlop Sep 26, 2024
106bc0e
prmtrainer
gaetanlop Sep 27, 2024
0163dcc
fixing small typo in tokenize
gaetanlop Sep 27, 2024
c2720d7
typo in input_ids and labels construction
gaetanlop Sep 27, 2024
5034083
numpy dimension
gaetanlop Sep 27, 2024
8818b6a
introduce the stepwise reward trainer
gaetanlop Sep 28, 2024
b777d1c
update markdown files
gaetanlop Sep 28, 2024
afa9e0a
let user decide post step separator in config
gaetanlop Sep 28, 2024
2dd752d
doc post_step_separator
gaetanlop Sep 28, 2024
613d838
do not add post step_tokens to last step of the reasoning process
gaetanlop Sep 28, 2024
b96ef4d
renaming prm to stepwisereward
gaetanlop Sep 28, 2024
161f5de
formatting
gaetanlop Sep 28, 2024
93e6652
fix tokenize kwargs
gaetanlop Sep 28, 2024
3ec4ebe
adapt test to the new post_token args
gaetanlop Sep 28, 2024
1461a61
adding example script
gaetanlop Sep 28, 2024
8c4ac31
fix small typo
gaetanlop Sep 28, 2024
8b3fa52
add create_model_card and renaming
gaetanlop Oct 1, 2024
8e4e159
fixing booleans
gaetanlop Oct 1, 2024
c60bc40
Adding the new stepwise_preference instead of placeholders for datasets
gaetanlop Oct 1, 2024
614fb4e
formatting
gaetanlop Oct 1, 2024
c582464
Merge branch 'main' into prmtrainer
qgallouedec Oct 1, 2024
424af34
Merge branch 'main' into prmtrainer
kashif Oct 8, 2024
b00e32b
Update docs/source/_toctree.yml
gaetanlop Oct 12, 2024
d5f780a
Update examples/scripts/stepwise_reward_modeling.py
gaetanlop Oct 12, 2024
f02056a
Update trl/trainer/stepwise_reward_trainer.py
gaetanlop Oct 12, 2024
3ac323f
Update trl/trainer/stepwise_reward_trainer.py
gaetanlop Oct 12, 2024
436dfd7
update push to hub
gaetanlop Oct 12, 2024
f4e6d4e
step_separator can't be None
gaetanlop Oct 12, 2024
6947aef
Merge branch 'main' into prmtrainer
gaetanlop Oct 12, 2024
e0c0648
fix suggested typos
gaetanlop Oct 12, 2024
35de0ee
add citation
gaetanlop Oct 12, 2024
c3eb08e
reformat doc
gaetanlop Oct 12, 2024
898f621
reordering init
gaetanlop Oct 13, 2024
3a488e0
push to hub prm800k
gaetanlop Oct 13, 2024
a03aed8
changing dataset in example
gaetanlop Oct 13, 2024
e77eee2
change dataset format to align with the sky is blue example
gaetanlop Oct 13, 2024
6c62c69
Merge branch 'main' into prmtrainer
gaetanlop Oct 13, 2024
e8e93f1
fix tokenization column names
gaetanlop Oct 13, 2024
2059c51
fix num labels in openai example
gaetanlop Oct 13, 2024
701241b
add support for conversational dataset
gaetanlop Oct 13, 2024
6bb467b
remove training whitespace
gaetanlop Oct 13, 2024
6b2bd97
Merge branch 'main' into prmtrainer
gaetanlop Oct 14, 2024
2030a83
replace tokenizer with processing class
gaetanlop Oct 14, 2024
66baada
Merge branch 'prmtrainer' of https://github.com/gaetanlop/trl into pr…
gaetanlop Oct 14, 2024
b47eea5
Merge branch 'main' into prmtrainer
qgallouedec Nov 18, 2024
9b1693d
Merge branch 'main' into prmtrainer
gaetanlop Nov 24, 2024
086ea8f
Update docs/source/dataset_formats.mdx
gaetanlop Nov 24, 2024
fe440de
remove openai_prm800k
gaetanlop Nov 24, 2024
468502b
Update trl/trainer/stepwise_reward_trainer.py
gaetanlop Nov 24, 2024
d205064
Update trl/trainer/stepwise_reward_trainer.py
gaetanlop Nov 24, 2024
6128a7f
Merge branch 'prmtrainer' of https://github.com/gaetanlop/trl into pr…
gaetanlop Nov 24, 2024
faf1051
Update docs/source/stepwise_reward_trainer.mdx
gaetanlop Nov 24, 2024
dfe7e04
Update docs/source/stepwise_reward_trainer.mdx
gaetanlop Nov 24, 2024
fc702be
renaming
gaetanlop Nov 24, 2024
a65e30c
renaming
gaetanlop Nov 24, 2024
d53ad35
minor renamings in docs
gaetanlop Nov 24, 2024
24d2f1a
using prm800k instead of openai_prm800k
gaetanlop Nov 24, 2024
4fd282e
update num labels to 2 following the new format
gaetanlop Nov 24, 2024
2c9d2f3
changing doc examples to math examples
gaetanlop Nov 24, 2024
91a3de8
change reference to dataset_formats.mdx
gaetanlop Nov 24, 2024
97ef925
changing dataset config in test
gaetanlop Nov 24, 2024
754ba44
remove conversational dataset support
gaetanlop Nov 25, 2024
a7bac4e
remove conv dataset support
gaetanlop Nov 25, 2024
916f87e
fix bos token
gaetanlop Nov 25, 2024
364d7d8
fix scriptarguments in example
gaetanlop Nov 25, 2024
5a6970d
completion to completions
gaetanlop Nov 25, 2024
e445bad
remove valuerror for step_separator inside steps
gaetanlop Nov 25, 2024
fb15691
run precommit
gaetanlop Nov 25, 2024
1c76266
Merge branch 'main' into prmtrainer
gaetanlop Nov 25, 2024
9ae131a
Merge branch 'main' into prmtrainer
gaetanlop Nov 26, 2024
84c28fe
remove conv dataset support
gaetanlop Nov 26, 2024
16e4ef8
renaming zen dataset
gaetanlop Nov 26, 2024
147c375
remove unused printing
gaetanlop Nov 26, 2024
e310b0e
unknown label column
gaetanlop Nov 26, 2024
59f1e9f
introduce the train on last step arg
gaetanlop Nov 26, 2024
b057cf7
_tokenize support train_on_last_step
gaetanlop Nov 26, 2024
3a034d0
incorporate train_on_last_step to tests
gaetanlop Nov 26, 2024
8dce558
formatting
gaetanlop Nov 26, 2024
69adb5c
remove comments in trainer
gaetanlop Nov 26, 2024
be6e843
Refactor `tokenize_row`
qgallouedec Nov 26, 2024
e8c782d
Update max_completion_length parameter in StepwiseRewardConfig
qgallouedec Nov 26, 2024
4c83f41
Collator
qgallouedec Nov 26, 2024
a93138f
Update comment
qgallouedec Nov 26, 2024
072794a
Update type hint
qgallouedec Nov 26, 2024
5b10e38
fix table
qgallouedec Nov 26, 2024
5a8d0a2
Remove collator
qgallouedec Nov 26, 2024
f4ba54f
don't need pad token id
qgallouedec Nov 26, 2024
fd204d7
add error back
qgallouedec Nov 26, 2024
ebc8fb1
max length args
qgallouedec Nov 26, 2024
95a4a46
use tokenizer arg
qgallouedec Nov 26, 2024
46b6bd6
Update doc
qgallouedec Nov 26, 2024
201bdf2
label -> labels
qgallouedec Nov 26, 2024
4f28ed7
Merge pull request #1 from huggingface/prm-trainer-qgallouedec
gaetanlop Nov 27, 2024
0527531
Merge branch 'main' into prmtrainer
gaetanlop Nov 27, 2024
228aa31
fixing tokenization issues in tokenize row
gaetanlop Nov 27, 2024
aa33e62
correct labels for token classification
gaetanlop Nov 27, 2024
4cd0b79
adding max_length to tokenize_row
gaetanlop Nov 27, 2024
c58db4b
reformat tests
gaetanlop Nov 27, 2024
1385f46
adding tests for tokenize row
gaetanlop Nov 27, 2024
b2d45a8
fixing typos in comments
gaetanlop Nov 27, 2024
3d7d37d
update doc
gaetanlop Nov 28, 2024
ad3bd25
Add math_shepherd.py script for dataset processing
qgallouedec Nov 28, 2024
1cc6c8a
split the dataset
qgallouedec Nov 28, 2024
7273a3b
Merge pull request #2 from huggingface/prm-trainer-qgallouedec-2
gaetanlop Nov 29, 2024
b4e676b
Merge branch 'main' into prmtrainer
gaetanlop Nov 29, 2024
150500f
Merge branch 'main' into prmtrainer
qgallouedec Nov 29, 2024
30bb2c3
Merge branch 'main' into prmtrainer
gaetanlop Dec 1, 2024
32bb0b1
formatting
gaetanlop Dec 1, 2024
dec7bad
same evaluation method for the two training methods
gaetanlop Dec 2, 2024
e4fc400
adding filtering to example script
gaetanlop Dec 2, 2024
4ff8674
formatting
gaetanlop Dec 2, 2024
7787b98
Merge branch 'main' into prmtrainer
gaetanlop Dec 3, 2024
0d81c04
Merge branch 'main' into prmtrainer
qgallouedec Dec 9, 2024
049fdf9
Add features to avoid casting labels to bool in dataset tokenization
qgallouedec Dec 9, 2024
62b7465
Update docs/source/stepwise_reward_trainer.mdx [ci skip]
qgallouedec Dec 9, 2024
b62d74b
Add learning_rate parameter to StepwiseRewardConfig class
qgallouedec Dec 9, 2024
8d6a879
update doc
qgallouedec Dec 9, 2024
7da024c
Remove unused setup_chat_format function
qgallouedec Dec 9, 2024
c1f83ea
Fix warning message in stepwise_reward_modeling.py
qgallouedec Dec 9, 2024
a2d5837
Update logging steps in stepwise_reward_trainer.mdx
qgallouedec Dec 9, 2024
7146aff
little doc change [ci skip]
qgallouedec Dec 9, 2024
92be608
Merge branch 'main' into prmtrainer
qgallouedec Dec 10, 2024
ae677b1
Fix copyrights
qgallouedec Dec 10, 2024
7b88981
fix space after copyrights
qgallouedec Dec 10, 2024
c4faf19
Merge branch 'main' into prmtrainer
qgallouedec Dec 10, 2024
f164711
Update dataset loading in stepwise_reward_modeling.py
qgallouedec Dec 10, 2024
4572a21
refine compute_accuracy and proper test
qgallouedec Dec 10, 2024
75b50af
fix tests
qgallouedec Dec 10, 2024
2ebf9da
style
qgallouedec Dec 10, 2024
83e174e
Merge branch 'main' into prmtrainer
qgallouedec Dec 10, 2024
0d48cfa
Merge branch 'main' into prmtrainer
gaetanlop Dec 13, 2024
c4f6a62
renamings
gaetanlop Dec 13, 2024
81574f5
renaming in init
gaetanlop Dec 13, 2024
823825d
doc renaming
gaetanlop Dec 13, 2024
68e16f5
fix sorting and tag
qgallouedec Dec 13, 2024
9609ac8
experiemental [ci skip]
qgallouedec Dec 13, 2024
54011c9
trigger CI
qgallouedec Dec 13, 2024
686edfb
other doc fix
qgallouedec Dec 13, 2024
trigger CI
qgallouedec committed Dec 13, 2024
commit 54011c95d364ffc53529e86fdd6c810f5e65bebc

No changes to show.

This commit has no content.