Commit d27140c

Update README.md
1 parent e01182d commit d27140c

1 file changed, +23 -14 lines changed

README.md

Lines changed: 23 additions & 14 deletions
@@ -9,17 +9,8 @@
 <a href="https://huggingface.co/openchat">🤗Huggingface</a> |
 <a href="https://arxiv.org/pdf/2309.11235.pdf">📃Paper</a> |
 <a href="https://discord.gg/pQjnXvNKHY">💭Discord</a>
-<br><br>
-<strong>🏆 The Overall Best Performing Open Source 7B Model 🏆</strong>
-<br>
-<strong>🤖 Outperforms ChatGPT (March) and Grok-1 🤖</strong>
-<br>
 </p>
 
-<div align="center">
-<img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/openchat-bench-0106.png" style="width: 95%;">
-</div>
-
 - OpenChat is an innovative library of **open-source language models**, fine-tuned with [**C-RLFT**](https://arxiv.org/pdf/2309.11235.pdf) - a strategy inspired by offline reinforcement learning.
 - Our models learn from mixed-quality data without preference labels, delivering exceptional performance on par with `ChatGPT`, even with a `7B` model which can be run on a **consumer GPU (e.g. RTX 3090)**.
 - Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision.
@@ -28,6 +19,8 @@
 
 # ✨ News
 
+- [2024/05/22] We released the Llama-3 based version [OpenChat 3.6 20240522](https://huggingface.co/openchat/openchat-3.6-8b-20240522), outperforming official Llama 3 8B Instruct and open-source finetunes/merges.
+
 - [2024/01/06] We released the second update, [OpenChat 3.5 0106](openchat/openchat-3.5-0106), further improved coding and overall performance 🏆.
 
 - [2023/12/10] We released the first update, [OpenChat 3.5 1210](openchat/openchat-3.5-1210), improved coding by 15 points 🚀.
@@ -50,13 +43,31 @@
 - [2023/07/01] We released the [OpenChat V1 model series](#legacy-models).
 </details>
 
-# 🏷️ Benchmarks
+# 🏷️ Benchmarks - OpenChat 3.6
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/benchmarks-openchat-3.6-20240522.svg" style="width: 95%;">
+</div>
+
+
+<details>
+<summary>Reproducing benchmarks</summary>
+
+Note: Please run the following commands at the base directory of this repository.
+
+```bash
+python -m ochat.evaluation.run_eval --condition "GPT4 Correct" --model openchat/openchat-3.6-8b-20240522 --eval_sets fs_cothub/mmlu fs_cothub/gsm8k fs_cothub/math
+python -m ochat.evaluation.run_eval --condition "GPT4" --model openchat/openchat-3.6-8b-20240522 --eval_sets zs/gpqa
+```
+
+HumanEval is run using the official [EvalPlus repository](https://github.com/evalplus/evalplus).
+</details>
+
+# 🏷️ Benchmarks - OpenChat 3.5
 
 | Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
 |-----------------------|----------|----------|--------------|-----------------|----------|----------|---------------|--------------|--------------|-------------|
 | **OpenChat-3.5-0106** | **7B** | **64.5** | 7.8 | **71.3** | **51.5** | **49.1** | 61.0 | 65.8 | **77.4** | 62.2 |
-| OpenChat-3.5-1210 | **7B** | 63.8 | 7.76 | 68.9 | 49.5 | 48.0 | **61.8** | 65.3 | 77.3 | 61.8 |
-| OpenChat-3.5 | **7B** | 61.6 | 7.81 | 55.5 | 47.6 | 47.4 | 59.1 | 64.3 | 77.3 | 63.5 |
 | ChatGPT (March)* | ???B | 61.5 | **7.94** | 48.1 | 47.6 | 47.1 | 57.7 | **67.3** | 74.9 | **70.1** |
 | | | | | | | | | | | |
 | OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |
@@ -126,8 +137,6 @@ python gen_judgment.py --model-list openchat-3.5-0106 --parallel 8 --mode single
 | | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k |
 |-----------------------|-------------|---------|----------|--------|-----------|----------|----------|
 | **OpenChat-3.5-0106** | Apache-2.0 | **7B** | **61.0** | 65.8 | **71.3** | **29.3** | **77.4** |
-| OpenChat-3.5-1210 | Apache-2.0 | **7B** | 60.1 | 65.3 | 68.9 | 28.9 | 77.3 |
-| OpenChat-3.5 | Apache-2.0 | **7B** | 56.4 | 64.3 | 55.5 | 28.6 | 77.3 |
 | Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
 | Grok-1 | Proprietary | ???B | 55.8 | **73** | 63.2 | 23.9 | 62.9 |
 
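
Note on the last table in this diff: the `Average` column appears to be the unweighted mean of the four benchmark columns (MMLU, HumanEval, MATH, GSM8k). This is an inference from the numbers, not something the README states. The sketch below checks that reading against the three rows the commit keeps. `ROWS`, `mean_score`, and `check_rows` are illustrative names, not part of the `ochat` package.

```python
# Scores copied from the rows that remain after this commit.
# Column order: MMLU, HumanEval, MATH, GSM8k.
ROWS = {
    "OpenChat-3.5-0106": {"reported": 61.0, "scores": [65.8, 71.3, 29.3, 77.4]},
    "Grok-0": {"reported": 44.5, "scores": [65.7, 39.7, 15.7, 56.8]},
    "Grok-1": {"reported": 55.8, "scores": [73.0, 63.2, 23.9, 62.9]},
}


def mean_score(scores):
    """Unweighted arithmetic mean of a list of benchmark scores."""
    return sum(scores) / len(scores)


def check_rows(rows, tolerance=0.06):
    """Map each model name to True if its reported average is within
    `tolerance` of the mean of its scores (assumed averaging rule)."""
    return {
        name: abs(mean_score(row["scores"]) - row["reported"]) <= tolerance
        for name, row in rows.items()
    }


if __name__ == "__main__":
    for name, ok in check_rows(ROWS).items():
        status = "consistent" if ok else "MISMATCH"
        print(f"{name}: {status}")
```

The tolerance allows for the table rounding averages to one decimal place. A mismatch would suggest either a transcription error or a different averaging rule.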