How do I generate multiple results per question when evaluating math ability with pass@1? #2250

DELEnomore · 2025-08-26T08:30:06Z


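I want to reproduce the DeepSeek-R1 evaluation results. After I added `num_return_sequences=2` to `generation_kwargs`, the scores on both AIME2025 and MATH500 became 0 (before adding the parameter, results were produced normally). Why does this happen?

I noticed that the approach in the "Docs > Reasoning Model Evaluation Tutorial" chapter is to average the final results. The computation path is different, but mathematically the result seems to be the same. Is this the only approach? Is there a way to first average the multiple answers to the same question, and then average over all questions?

I also need to use MathEvaluator, that is, rule-based evaluation rather than LLM-based evaluation.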
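To make the two averaging orders concrete, here is a minimal sketch in plain Python. It does not use any OpenCompass API: the `results` dictionary and both function names are hypothetical, and the correctness flags are assumed to come from some rule-based checker. It only illustrates that, when every question has the same number of samples, averaging per question first gives the same overall score as averaging per run first.

```python
from statistics import mean


def pass_at_1_question_first(flags_by_question):
    """Average the k samples of each question, then average over questions.

    flags_by_question: dict mapping a question id to a list of booleans,
    one per sampled answer (True means the answer was judged correct).
    """
    per_question = {
        qid: mean(1.0 if ok else 0.0 for ok in flags)
        for qid, flags in flags_by_question.items()
    }
    return mean(list(per_question.values())), per_question


def pass_at_1_run_first(flags_by_question):
    """Score each of the k runs over all questions, then average the runs.

    Assumes every question has the same number of samples.
    """
    k = len(next(iter(flags_by_question.values())))
    run_scores = [
        mean(1.0 if flags[i] else 0.0 for flags in flags_by_question.values())
        for i in range(k)
    ]
    return mean(run_scores)


# Hypothetical correctness flags: 3 questions, 2 sampled answers each.
results = {
    "q1": [True, False],
    "q2": [True, True],
    "q3": [False, False],
}

overall, per_question = pass_at_1_question_first(results)
run_first = pass_at_1_run_first(results)
print(per_question)
print(overall, run_first)
```

If some questions ended up with fewer samples than others, the two orders would no longer agree in general, and the question-first version is the one that weights every question equally.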

Replies: 0 comments
