Commit 04b5d2d

Update README.md

Author: jimmy.xj
Parent: 12a3c0d

2 files changed (+6 -4 lines)


README.md

Lines changed: 3 additions & 2 deletions
@@ -19,6 +19,7 @@ DevOps-Eval is a comprehensive evaluation suite specifically designed for founda
 
 
 ## 🔔 News
+* **[2023.10.30]** Add the AIOps Leaderboard.
 * **[2023.10.25]** Add the AIOps samples, including log parsing, time series anomaly detection, time series classification and root cause analysis.
 * **[2023.10.18]** Update the initial Leaderboard...
 <br>
@@ -38,7 +39,7 @@ DevOps-Eval is a comprehensive evaluation suite specifically designed for founda
 
 ## 🏆 Leaderboard
 Below are zero-shot and five-shot accuracies from the models that we evaluate in the initial release. We note that five-shot performance is better than zero-shot for many instruction-tuned models.
-### DevOps
+### 👀 DevOps
 #### Zero Shot
 
 | **ModelName** | plan | code | build | test | release | deploy | operate | monitor | **AVG** |
@@ -78,7 +79,7 @@ Below are zero-shot and five-shot accuracies from the models that we evaluate in
 | Baichuan2-7B-Chat | 60.61 | 64.95 | 81.19 | 75.88 | 71.23 | 75.69 | 78.36 | 79.17 | 70.49 |
 | Internlm-7B-Base | 62.12 | 65.25 | 77.52 | 80.7 | 74.06 | 78.82 | 79.85 | 75.46 | 69.17 |
 
-### AIOps
+### 🔥 AIOps
 #### Zero Shot
 | **ModelName** | LogParsing | RootCauseAnalysis | TimeSeriesAnomalyDetection | TimeSeriesClassification | **AVG** |
 |:-------------------:|:------------:|:------------------:|:---------------------------:|:-------------------------:|:-------:|

README_zh.md

Lines changed: 3 additions & 2 deletions
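@@ -19,6 +19,7 @@ DevOps-Eval is a comprehensive evaluation dataset designed specifically for large models in the DevOps domain
 
 
 ## 🔔 Updates
+* **[2023.10.30]** Add an evaluation leaderboard for AIOps scenarios
 * **[2023.10.25]** Add AIOps samples, covering log parsing, time series anomaly detection, time series classification and root cause analysis
 * **[2023.10.18]** DevOps-Eval releases its LLM evaluation leaderboard
 <br>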
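@@ -39,7 +40,7 @@ DevOps-Eval is a comprehensive evaluation dataset designed specifically for large models in the DevOps domain
 ## 🏆 Leaderboard
 Below are our initial evaluation results, including the zero-shot and five-shot accuracies of several open-source models. We note that for most instruction-tuned models, five-shot accuracy is higher than zero-shot accuracy.
 
-### DevOps
+### 👀 DevOps
 #### Zero Shot
 
 | **Model** | plan | code | build | test | release | deploy | operate | monitor | **Average** |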
@@ -80,7 +81,7 @@ DevOps-Eval is a comprehensive evaluation dataset designed specifically for large models in the DevOps domain
 | Internlm-7B-Base | 62.12 | 65.25 | 77.52 | 80.7 | 74.06 | 78.82 | 79.85 | 75.46 | 69.17 |
 
 
-### AIOps
+### 🔥 AIOps
 #### Zero Shot
 | **Model** | Log Parsing | Root Cause Analysis | Time Series Anomaly Detection | Time Series Classification | **Average** |
 |:-------------------:|:-----:|:----:|:------:|:----:|:-------:|
