[bug in the document] dataset format for RewardTrainer #2164
Comments
Here I test (I do not change anything in reward_modeling.py, which is directly cloned from the trl repo).
trl-lib/ultrafeedback_binarized is in the implicit prompt format, since you don't have a prompt column. You can see that there is a common start (the prompt) shared by chosen and rejected:

>>> from datasets import load_dataset
>>> dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
>>> dataset.column_names
['chosen', 'rejected', 'score_chosen', 'score_rejected']
>>> dataset[0]
{'chosen': [{'content': 'Use the pygame library to write a version of the classic game Snake, with a unique twist', 'role': 'user'}, {'content': "Sure, I'd be happy to help you write a version of the classic game Snake using the pygame library! ...", 'role': 'assistant'}],
'rejected': [{'content': 'Use the pygame library to write a version of the classic game Snake, with a unique twist', 'role': 'user'}, {'content': 'Sure, here\'s an example of how to write a version of Snake game with a unique twist using the Pygame library:...', 'role': 'assistant'}], 'score_chosen': 6.0, 'score_rejected': 4.0}
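Not part of the original reply, but to make the "implicit prompt" point concrete: a minimal sketch of how one could split the shared leading messages out of chosen/rejected to obtain an explicit prompt column. split_implicit_prompt is only an illustrative helper, not a TRL API, and it assumes the dataset loaded above.

# Hypothetical helper: recover the explicit prompt by taking the messages
# that chosen and rejected have in common at the start of the conversation.
def split_implicit_prompt(example):
    chosen, rejected = example["chosen"], example["rejected"]
    n = 0
    for c, r in zip(chosen, rejected):
        if c != r:
            break
        n += 1
    return {
        "prompt": chosen[:n],      # the shared user turn(s)
        "chosen": chosen[n:],      # only the preferred completion
        "rejected": rejected[n:],  # only the rejected completion
    }

explicit_dataset = dataset.map(split_implicit_prompt)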
The provided code works fine on my side.
If the error persists, please provide your full system info (see the bug issue template).
The reward trainer data support has been recently updated (#2102). See the latest version of the doc for more info: https://huggingface.co/docs/trl/main/en/reward_trainer
I see, got it.
error:
By the way, Python version: 3.11.10.
I've downgraded to v0.11.1 and I still can't reproduce the error.
Can you also confirm that you have not modified the codebase?
I git pull in the path, therefore everything is updated, including …
I install trl via …
I still can't reproduce; I tried to reinstall everything, but it still works. Can you try the same? Also, try clearing your cache.

python3.11 -m venv env
source env/bin/activate
pip install trl[peft]==0.11.1
curl -O https://raw.githubusercontent.com/huggingface/trl/86ad7a7e85dc65c79bd9759097709a27ad1a58dd/examples/scripts/reward_modeling.py
python reward_modeling.py \
--model_name_or_path Qwen/Qwen2-0.5B-Instruct \
--dataset_name trl-lib/ultrafeedback_binarized \
--output_dir Qwen2-0.5B-Reward-LoRA \
--per_device_train_batch_size 8 \
--num_train_epochs 1 \
--gradient_checkpointing True \
--learning_rate 1.0e-4 \
--logging_steps 25 \
--eval_strategy steps \
--eval_steps 50 \
--max_length 2048 \
--use_peft \
--lora_r 32 \
--lora_alpha 16
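Not part of the original comment, but a quick way to confirm which trl version ends up installed in that fresh environment (it should match the 0.11.1 pinned above):

import trl
print(trl.__version__)  # expected: 0.11.1 with the pip install above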
reward_modeling.py from your https://raw.githubusercontent.com/huggingface/trl/86ad7a7e85dc65c79bd9759097709a27ad1a58dd/examples/scripts/reward_modeling.py does work fine, but the script from https://github.com/huggingface/trl/blob/main/examples/scripts/reward_modeling.py does not work. I do see there is a lot of difference between them.
The latter is the script for the dev version. You can't use trl 0.11 with it.
OK, then I will wait for the dev version to be released. Thanks. @qgallouedec
Hi, just reopening this ticket. Although this dataset is used as an example in https://huggingface.co/docs/trl/v0.11.2/reward_trainer, I get this error message:
So only a chat-format preference dataset like … is supported?
No, the following works fine:
@qgallouedec did you checkout the branch of …? You checkout the branch and …
Indeed, in v0.11.2 the example assumes that the dataset is in conversational format.
OK, so plain-text format such as … is not supported?
False. Previously it was not supported, now it is. dev is ahead of v0.11.2.
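To illustrate what "now supported" refers to (this sketch is not from the thread, and the strings are made up): a standard-format, implicit-prompt preference dataset is just plain-text chosen/rejected pairs instead of lists of chat messages.

from datasets import Dataset

# Made-up example of a standard (non-conversational) preference dataset.
standard_dataset = Dataset.from_dict({
    "chosen": ["The sky appears blue because of Rayleigh scattering."],
    "rejected": ["The sky appears blue because it mirrors the ocean."],
})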
OK, I will wait for the new release and test it in the near future.
@qgallouedec Can I use a data format like OpenBookQA, where one prompt has 4-9 responses, as in the InstructGPT paper? Thanks.
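The thread does not answer this, but one common approach with K ranked responses per prompt (as in the InstructGPT paper) is to expand every prompt into all chosen/rejected pairs implied by the ranking before training a pairwise reward model. A minimal sketch, with made-up field names and data:

from itertools import combinations

# Each made-up record has its responses ordered from best to worst.
ranked_examples = [
    {
        "prompt": "Explain photosynthesis.",
        "responses": ["detailed correct answer", "terse answer", "wrong answer"],
    },
]

pairwise = []
for ex in ranked_examples:
    # combinations preserves order, so `better` always ranks above `worse`.
    for better, worse in combinations(ex["responses"], 2):
        pairwise.append({
            "chosen": ex["prompt"] + "\n" + better,
            "rejected": ex["prompt"] + "\n" + worse,
        })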
System Info
trl version > v0.11
Information
Tasks
examples folder
Reproduction
In the official document https://huggingface.co/docs/trl/main/en/reward_trainer, it says the [RewardTrainer] requires an [implicit prompt preference dataset]. However, in the example script https://github.com/huggingface/trl/blob/main/examples/scripts/reward_modeling.py#L18, I see the example is using trl-lib/ultrafeedback_binarized, which is not a so-called "implicit prompt preference dataset", as the prompt is explicitly provided in the dataset. Could you look into this conflict?
Thanks.
Expected behavior
Code and document alignment.