SNS-Bench: Defining, Building, and Assessing Capabilities of Large Language Models in Social Networking Services
🤗 Hugging Face | 🤖 ModelScope | 📑 Paper | 🛠️ Code
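For the dependencies required by the evaluation code, see the OpenCompass README.
The SNS-Bench dataset loading and evaluation code is located in the opencompass/opencompass/datasets/sns_bench directory, and the configuration files are located in opencompass/opencompass/configs/datasets/sns_bench.
To run the evaluation, execute the following commands: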
cd SNS-Bench/opencompass
opencompass examples/eval_sns_bench.py
Detailed calculation methods for the metrics are provided in Section 4.2 of the paper.
The relevant code is located in the code folder.
The reported_score values in the paper's main table are computed as follows (only tasks that need special explanation are listed):
# Note-CHLW
# code/chlw.py (line 101)
# reported_score = success_f1
# Note-QueryCorr [Topic]
# code/query_corr_topic.py (line 84)
# reported_score = success-macro-f1
# Note-MRC [Simple]
# code/mrc_simple.py (lines 174-178)
# reported_score = AVG(success-f1, success-bleu, success-rouge-1, success-rouge-2, success-rouge-L)
# Note-MRC [Complex]
# code/mrc_complex.py (lines 155-157)
# reported_score = AVG(success-total-f1, success-option-f1, success-option-em)
Thanks to OpenCompass for its excellent evaluation framework.
The dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). This means that the data and models trained using the dataset can be used for non-commercial purposes as long as proper attribution is provided. Commercial use is strictly prohibited without explicit permission from the authors. If the dataset is remixed, adapted, or built upon, the modified dataset must be licensed under identical terms.
