[examples] Big refactor of examples and documentation #509

Merged: 17 commits, Jul 14, 2023
more refactor
younesbelkada committed Jul 11, 2023
commit a443aa7fa1a068a3f7447ae3d1695e3c4cc5c26f
2 changes: 1 addition & 1 deletion README.md
@@ -163,7 +163,7 @@ train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```

### Advanced example: IMDB sentiment
For a detailed example check out the example python script `examples/sentiment/scripts/gpt2-sentiment.py`, where GPT2 is fine-tuned to generate positive movie reviews. An few examples from the language models before and after optimisation are given below:
For a detailed example check out the example Python script `examples/ppo_trainer/sentiment_tuning.py`, where GPT2 is fine-tuned to generate positive movie reviews. A few examples from the language models before and after optimisation are given below:

<div style="text-align: center">
<img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/table_imdb_preview.png" width="800">
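The hunk above closes a quickstart snippet whose last line feeds a batch of queries, responses, and rewards to `ppo_trainer.step`. As a rough, library-free sketch of the loop that line belongs to (the class and statistics key below are illustrative stand-ins, not TRL's actual API):

```python
# Illustrative stand-in for the query -> response -> reward -> step loop
# that the quickstart builds around PPOTrainer.step. Nothing here is TRL's
# real API; it only mirrors the shape of the calls.

class ToyPPOTrainer:
    """Accepts (queries, responses, rewards) batches like a PPO step."""

    def __init__(self):
        self.batches = []

    def step(self, queries, responses, rewards):
        # A real PPO step would run a clipped policy-gradient update on the
        # policy model given these rollouts; here we only record the batch
        # and return mock training statistics.
        self.batches.append((queries, responses, rewards))
        return {"ppo/mean_scores": sum(rewards) / len(rewards)}

trainer = ToyPPOTrainer()
train_stats = trainer.step(["This movie was"], [" really great!"], [1.0])
```

The real trainer additionally needs a tokenizer, a value-head model, and a reward source (e.g. a sentiment classifier, as in the script above), but the batch shape passed to `step` is the part the quickstart line illustrates.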
1 change: 1 addition & 0 deletions examples/README.md
@@ -28,3 +28,4 @@ The examples are currently split over the following categories:
**2: [sft_trainer](https://github.com/lvwerra/trl/tree/main/examples/sft_trainer)**: Learn how to leverage `SFTTrainer` for easy supervised fine-tuning of your pretrained language models.
**3: [reward_modeling](https://github.com/lvwerra/trl/tree/main/examples/reward_modeling)**: Learn how to use `RewardTrainer` to easily train your own reward model for your RLHF pipeline.
**4: [research_projects](https://github.com/lvwerra/trl/tree/main/examples/research_projects)**: Check out this folder for the scripts used in research projects built with TRL (LM de-toxification, Stack-Llama, etc.)
**5: [notebooks](https://github.com/lvwerra/trl/tree/main/examples/notebooks)**: Check out this folder for applications of TRL features run directly in Jupyter notebooks, including sentiment tuning, sentiment control, and the "Best of N sampling" strategy.
12 changes: 0 additions & 12 deletions examples/best_of_n_sampling/README.md

This file was deleted.

Empty file.
7 changes: 7 additions & 0 deletions examples/notebooks/README.md
@@ -0,0 +1,7 @@
# Notebooks

This directory contains a collection of Jupyter notebooks that demonstrate how to use the TRL library in different applications.

- [`best_of_n.ipynb`](https://github.com/lvwerra/trl/tree/main/examples/notebooks/best_of_n.ipynb): This notebook demonstrates how to use the "Best of N" sampling strategy using TRL when fine-tuning your model with PPO.
- [`gpt2-sentiment.ipynb`](https://github.com/lvwerra/trl/tree/main/examples/notebooks/gpt2-sentiment.ipynb): This notebook demonstrates how to reproduce the GPT2 IMDB sentiment tuning example in a Jupyter notebook.
- [`gpt2-control.ipynb`](https://github.com/lvwerra/trl/tree/main/examples/notebooks/gpt2-control.ipynb): This notebook demonstrates how to reproduce the GPT2 sentiment control example in a Jupyter notebook.
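Since `best_of_n.ipynb` is listed above, the idea behind "Best of N" sampling can be sketched in a few self-contained lines: draw N candidate completions per prompt, score each with a reward function, and keep only the highest-scoring one. The sampler and reward function here are toy stand-ins for a language model and a reward model.

```python
# Toy sketch of "Best of N" sampling: generate N candidates, score each,
# keep the argmax. sample_candidates and reward are hypothetical stand-ins
# for model.generate(...) and a reward model, not TRL code.
import random

def sample_candidates(prompt, n, rng):
    # Stand-in for calling model.generate(prompt) n times with sampling.
    return [f"{prompt} candidate-{rng.randint(0, 999)}" for _ in range(n)]

def reward(text):
    # Stand-in for a reward model; here, longer completions score higher.
    return len(text)

def best_of_n(prompt, n=4, seed=0):
    rng = random.Random(seed)
    candidates = sample_candidates(prompt, n, rng)
    return max(candidates, key=reward)

best = best_of_n("A movie review:", n=4)
```

The notebook applies the same select-the-best-rollout idea with a real generation loop and reward model, typically as a baseline or complement to PPO fine-tuning.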
2 changes: 1 addition & 1 deletion examples/research_projects/README.md
@@ -1,6 +1,6 @@
# Research projects that use TRL

Welcome to the research projects folder! Here you can find the scripts used for some research projects that used TRL (LM de-toxification, Stack-Llama, etc.). Check out the READMEs in the subfolders for more information!
Welcome to the research projects folder! Here you can find the scripts used for research projects built with TRL and maintained by the developers and the community (LM de-toxification, Stack-Llama, etc.). Check out the READMEs in the subfolders for more information!

- [De-detoxifying language models](https://github.com/lvwerra/trl/tree/main/examples/research_projects/toxicity)
- [Stack-Llama](https://github.com/lvwerra/trl/tree/main/examples/research_projects/stack_llama)