Update README.md

raunak-agarwal · web-flow · commit 476e61bb6198 · 2023-04-21T12:20:10.000+02:00
diff --git a/README.md b/README.md
@@ -45,10 +45,9 @@ All available datasets for Instruction Tuning of Large Language Models
   - LLM instruction generation for a diverse set of corpus samples (27,739 instructions and long text pairs)
 - LLaVA Visual Instruct 150K: https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K
   - GPT-generated multimodal instruction-following data
+- 
 
-### Misc
-- OIG: https://huggingface.co/datasets/laion/OIG
-  - Superset of some of the datasets here
+### Preference Datasets (can be used to train the reward model)
 - HH-RLHF: https://huggingface.co/datasets/Anthropic/hh-rlhf
   - Contains human ratings of harmfulness and helpfulness of model outputs. The dataset contains ~160K human-rated examples, where each example in this dataset consists of a pair of responses from a chatbot, one of which is preferred by humans.
 - OpenAI WebGPT: https://huggingface.co/datasets/openai/webgpt_comparisons
@@ -57,7 +56,13 @@ All available datasets for Instruction Tuning of Large Language Models
   - Contains ~93K examples, each example consists of feedback from humans regarding the summarizations generated by a model. Human evaluators chose the superior summary from two options.
 - Stanford Human Preferences Dataset (SHP): https://huggingface.co/datasets/stanfordnlp/SHP
   - 385K collective human preferences over responses to questions/instructions in 18 different subject areas
+- Stack Exchange Preferences: https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences
+
+### Misc
+- OIG: https://huggingface.co/datasets/laion/OIG
+  - Superset of some of the datasets here
 - oa_leet10k: https://huggingface.co/datasets/ehartford/oa_leet10k
   - LeetCode problems solved in multiple programming languages
+