<a href="https://huggingface.co/openchat">🤗Huggingface</a> |
<a href="https://arxiv.org/pdf/2309.11235.pdf">📃Paper</a> |
<a href="https://discord.gg/pQjnXvNKHY">💭Discord</a>
- <br><br>
- <strong>🏆 The Overall Best Performing Open Source 7B Model 🏆</strong>
- <br>
- <strong>🤖 Outperforms ChatGPT (March) and Grok-1 🤖</strong>
- <br>
</p>

-<div align="center">
-  <img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/openchat-bench-0106.png" style="width: 95%;">
-</div>
-
- OpenChat is an innovative library of **open-source language models**, fine-tuned with [**C-RLFT**](https://arxiv.org/pdf/2309.11235.pdf) - a strategy inspired by offline reinforcement learning.
- Our models learn from mixed-quality data without preference labels, delivering performance on par with `ChatGPT` even from a `7B` model that can run on a **consumer GPU (e.g. RTX 3090)**; a short serving sketch follows below.
- Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision.
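
To make the consumer-GPU claim above concrete, here is a minimal serving sketch. It assumes the `ochat` package's OpenAI-compatible API server documented elsewhere in this repository; the `ochat.serving.openai_api_server` entry point, the default port `18888`, and the `openchat_3.5` request alias are taken from those instructions and may differ between releases.

```bash
# Install the library and launch the OpenAI-compatible API server;
# the 7B model fits on a single consumer GPU such as an RTX 3090.
pip3 install ochat
python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106

# From another shell, send a request in the OpenAI chat-completions format.
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openchat_3.5",
        "messages": [{"role": "user", "content": "Explain C-RLFT in one paragraph."}]
      }'
```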

# ✨ News

+- [2024/05/22] We released the Llama-3 based version [OpenChat 3.6 20240522](https://huggingface.co/openchat/openchat-3.6-8b-20240522), which outperforms the official Llama 3 8B Instruct as well as open-source finetunes/merges.
+
- [2024/01/06] We released the second update, [OpenChat 3.5 0106](https://huggingface.co/openchat/openchat-3.5-0106), which further improved coding and overall performance 🏆.

- [2023/12/10] We released the first update, [OpenChat 3.5 1210](https://huggingface.co/openchat/openchat-3.5-1210), which improved coding by 15 points 🚀.

- [2023/07/01] We released the [OpenChat V1 model series](#legacy-models).
</details>

-# 🏷️ Benchmarks
+# 🏷️ Benchmarks - OpenChat 3.6
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/benchmarks-openchat-3.6-20240522.svg" style="width: 95%;">
+</div>
+
+
+<details>
+  <summary>Reproducing benchmarks</summary>
+
+Note: Please run the following commands from the base directory of this repository.
+
+```bash
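+# Few-shot MMLU, GSM8K, and MATH using the CoT-Hub prompt sets ("fs_cothub") with the "GPT4 Correct" chat condition.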
+python -m ochat.evaluation.run_eval --condition "GPT4 Correct" --model openchat/openchat-3.6-8b-20240522 --eval_sets fs_cothub/mmlu fs_cothub/gsm8k fs_cothub/math
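+# Zero-shot GPQA ("zs/gpqa").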
+python -m ochat.evaluation.run_eval --condition "GPT4" --model openchat/openchat-3.6-8b-20240522 --eval_sets zs/gpqa
+```
+
+HumanEval is run using the official [EvalPlus repository](https://github.com/evalplus/evalplus).
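
As a rough sketch of that step, the commands below follow the EvalPlus README: the `evalplus.evaluate` entry point and its flags are assumptions about that external tool and may vary across EvalPlus versions, so check its documentation before running.

```bash
# Score pre-generated HumanEval completions (a JSONL file of samples)
# against both the base HumanEval tests and the extended HumanEval+ tests.
pip install evalplus
evalplus.evaluate --dataset humaneval --samples samples.jsonl
```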
+</details>
+
+# 🏷️ Benchmarks - OpenChat 3.5

| Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
|-----------------------|----------|----------|--------------|-----------------|----------|----------|---------------|--------------|--------------|-------------|
| **OpenChat-3.5-0106** | **7B** | **64.5** | 7.8 | **71.3** | **51.5** | **49.1** | 61.0 | 65.8 | **77.4** | 62.2 |
-| OpenChat-3.5-1210 | **7B** | 63.8 | 7.76 | 68.9 | 49.5 | 48.0 | **61.8** | 65.3 | 77.3 | 61.8 |
-| OpenChat-3.5 | **7B** | 61.6 | 7.81 | 55.5 | 47.6 | 47.4 | 59.1 | 64.3 | 77.3 | 63.5 |
| ChatGPT (March)* | ???B | 61.5 | **7.94** | 48.1 | 47.6 | 47.1 | 57.7 | **67.3** | 74.9 | **70.1** |
| | | | | | | | | | | |
| OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |

@@ -126,8 +137,6 @@ python gen_judgment.py --model-list openchat-3.5-0106 --parallel 8 --mode single

| Model | License | # Params | Average | MMLU | HumanEval | MATH | GSM8K |
|-----------------------|-------------|---------|----------|--------|-----------|----------|----------|
| **OpenChat-3.5-0106** | Apache-2.0 | **7B** | **61.0** | 65.8 | **71.3** | **29.3** | **77.4** |
-| OpenChat-3.5-1210 | Apache-2.0 | **7B** | 60.1 | 65.3 | 68.9 | 28.9 | 77.3 |
-| OpenChat-3.5 | Apache-2.0 | **7B** | 56.4 | 64.3 | 55.5 | 28.6 | 77.3 |
| Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
| Grok-1 | Proprietary | ???B | 55.8 | **73** | 63.2 | 23.9 | 62.9 |