[examples] Big refactor of examples and documentation #509

Merged: 17 commits, Jul 14, 2023
more refactor
younesbelkada committed Jul 11, 2023
commit a443aa7fa1a068a3f7447ae3d1695e3c4cc5c26f
2 changes: 1 addition & 1 deletion README.md
@@ -163,7 +163,7 @@ train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```

### Advanced example: IMDB sentiment
For a detailed example check out the example python script `examples/sentiment/scripts/gpt2-sentiment.py`, where GPT2 is fine-tuned to generate positive movie reviews. An few examples from the language models before and after optimisation are given below:
For a detailed example check out the example Python script `examples/ppo_trainer/sentiment_tuning.py`, where GPT2 is fine-tuned to generate positive movie reviews. A few examples from the language models before and after optimisation are given below:

<div style="text-align: center">
<img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/table_imdb_preview.png" width="800">
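The hunk above closes a quickstart snippet whose last line feeds a batch of queries, responses, and rewards to `ppo_trainer.step`. As a rough, library-free sketch of the loop that line belongs to (the class and statistics key below are illustrative stand-ins, not TRL's actual API):

```python
# Illustrative stand-in for the query -> response -> reward -> step loop
# that the quickstart builds around PPOTrainer.step. Nothing here is TRL's
# real API; it only mirrors the shape of the calls.

class ToyPPOTrainer:
    """Accepts (queries, responses, rewards) batches like a PPO step."""

    def __init__(self):
        self.batches = []

    def step(self, queries, responses, rewards):
        # A real PPO step would run a clipped policy-gradient update on the
        # policy model given these rollouts; here we only record the batch
        # and return mock training statistics.
        self.batches.append((queries, responses, rewards))
        return {"ppo/mean_scores": sum(rewards) / len(rewards)}

trainer = ToyPPOTrainer()
train_stats = trainer.step(["This movie was"], [" really great!"], [1.0])
```

The real trainer additionally needs a tokenizer, a value-head model, and a reward source (e.g. a sentiment classifier, as in the script above), but the batch shape passed to `step` is the part the quickstart line illustrates.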
1 change: 1 addition & 0 deletions examples/README.md
@@ -28,3 +28,4 @@ The examples are currently split over the following categories:
**2: [sft_trainer](https://github.com/lvwerra/trl/tree/main/examples/sft_trainer)**: Learn how to leverage `SFTTrainer` for easy supervised fine-tuning of your pretrained language models.
**3: [reward_modeling](https://github.com/lvwerra/trl/tree/main/examples/reward_modeling)**: Learn how to use `RewardTrainer` to easily train your own reward model for your RLHF pipeline.
**4: [research_projects](https://github.com/lvwerra/trl/tree/main/examples/research_projects)**: Check out this folder for the scripts used in research projects built with TRL (LM de-toxification, Stack-Llama, etc.)
**5: [notebooks](https://github.com/lvwerra/trl/tree/main/examples/notebooks)**: Check out this folder for applications of TRL features run directly in Jupyter notebooks, including sentiment tuning, sentiment control, and the "Best of N sampling" strategy.
12 changes: 0 additions & 12 deletions examples/best_of_n_sampling/README.md

This file was deleted.

Empty file.
7 changes: 7 additions & 0 deletions examples/notebooks/README.md
@@ -0,0 +1,7 @@
# Notebooks

This directory contains a collection of Jupyter notebooks that demonstrate how to use the TRL library in different applications.

- [`best_of_n.ipynb`](https://github.com/lvwerra/trl/tree/main/examples/notebooks/best_of_n.ipynb): This notebook demonstrates how to use the "Best of N" sampling strategy using TRL when fine-tuning your model with PPO.
- [`gpt2-sentiment.ipynb`](https://github.com/lvwerra/trl/tree/main/examples/notebooks/gpt2-sentiment.ipynb): This notebook demonstrates how to reproduce the GPT2 IMDB sentiment tuning example in a Jupyter notebook.
- [`gpt2-control.ipynb`](https://github.com/lvwerra/trl/tree/main/examples/notebooks/gpt2-control.ipynb): This notebook demonstrates how to reproduce the GPT2 sentiment control example in a Jupyter notebook.
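Since `best_of_n.ipynb` is listed above, the idea behind "Best of N" sampling can be sketched in a few self-contained lines: draw N candidate completions per prompt, score each with a reward function, and keep only the highest-scoring one. The sampler and reward function here are toy stand-ins for a language model and a reward model.

```python
# Toy sketch of "Best of N" sampling: generate N candidates, score each,
# keep the argmax. sample_candidates and reward are hypothetical stand-ins
# for model.generate(...) and a reward model, not TRL code.
import random

def sample_candidates(prompt, n, rng):
    # Stand-in for calling model.generate(prompt) n times with sampling.
    return [f"{prompt} candidate-{rng.randint(0, 999)}" for _ in range(n)]

def reward(text):
    # Stand-in for a reward model; here, longer completions score higher.
    return len(text)

def best_of_n(prompt, n=4, seed=0):
    rng = random.Random(seed)
    candidates = sample_candidates(prompt, n, rng)
    return max(candidates, key=reward)

best = best_of_n("A movie review:", n=4)
```

The notebook applies the same select-the-best-rollout idea with a real generation loop and reward model, typically as a baseline or complement to PPO fine-tuning.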
2 changes: 1 addition & 1 deletion examples/research_projects/README.md
@@ -1,6 +1,6 @@
# Research projects that use TRL

Welcome to the research projects folder! Here you can find the scripts used for some research projects that used TRL (LM de-toxification, Stack-Llama, etc.). Check out the READMEs in the subfolders for more information!
Welcome to the research projects folder! Here you can find the scripts used for research projects built with TRL and maintained by the developers and the community (LM de-toxification, Stack-Llama, etc.). Check out the READMEs in the subfolders for more information!

- [De-detoxifying language models](https://github.com/lvwerra/trl/tree/main/examples/research_projects/toxicity)
- [Stack-Llama](https://github.com/lvwerra/trl/tree/main/examples/research_projects/stack_llama)