
[Bug] GSM8K evaluation: answer post-processing problem #2343

@dorisdengpp-crypto


Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

opencompass 0.5.1

Reproduces the problem - code/configuration sample

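In prediction.json and result.json I found many cases where the reasoning and the final answer are both correct, but the sample is judged as incorrect.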

Reproduces the problem - command or script

Example
Inference result:
"origin_prompt": "Question: Josh decides to try flipping a house. He buys a house for $80,000 and then puts in $50,000 in repairs. This increased the value of the house by 150%. How much profit did he make?\nLet's think step by step\nAnswer:",
"prediction": "Josh buys the house for $80,000 and spends $50,000 on repairs, so his total investment is:\n\n$80,000 (purchase) + $50,000 (repairs) = $130,000 total invested.\n\nThe repairs increased the value of the house by 150%, meaning the house’s value increased by:\n\n150% of $80,000 = 1.5 × $80,000 = $120,000 increase in value.\n\nSo the new value of the house is:\n\n$80,000 (original value) + $120,000 (increase) = $200,000.\n\nJosh’s profit is the final value minus his total investment:\n\n$200,000 (final value) - $130,000 (total investment) = $70,000 profit.\n\nThe answer is $70,000.",
"gold": "The cost of the house and repairs came out to 80,000+50,000=$<<80000+50000=130000>>130,000\nHe increased the value of the house by 80,000*1.5=<<80000*1.5=120000>>120,000\nSo the new value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000\nSo he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\n#### 70000"
},

Reproduces the problem - error message

Post-processing result:
"pred": ["000"],
"answer": ["70000"],
"correct": [false]
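
A possible explanation, sketched below: if the post-processor picks the last run of digits in the prediction and does not account for thousands separators, then "$70,000" is split at the comma and the last match is "000". This is only a guess at the cause; I have not confirmed that it matches the post-processor shipped in opencompass 0.5.1. The function names `last_number_naive` and `last_number_comma_aware` and the regular expressions are hypothetical, written only to reproduce the symptom on the tail of the prediction above.

```python
import re

# Tail of the "prediction" field from the example above.
PREDICTION_TAIL = (
    "$200,000 (final value) - $130,000 (total investment) = $70,000 profit.\n\n"
    "The answer is $70,000."
)

NUMBER_PATTERN = r"-?\d+\.\d+|-?\d+"


def last_number_naive(text: str) -> str:
    """Hypothetical extractor: return the last integer or decimal in the text.

    Commas are not handled, so "70,000" is seen as "70" followed by "000".
    """
    numbers = re.findall(NUMBER_PATTERN, text)
    return numbers[-1] if numbers else "NULL"


def last_number_comma_aware(text: str) -> str:
    """Hypothetical variant: remove thousands separators before matching.

    A comma is dropped only when it sits between a digit and a group of
    exactly three digits, so ordinary punctuation commas are left alone.
    """
    cleaned = re.sub(r"(?<=\d),(?=\d{3}(?!\d))", "", text)
    numbers = re.findall(NUMBER_PATTERN, cleaned)
    return numbers[-1] if numbers else "NULL"


if __name__ == "__main__":
    print(last_number_naive(PREDICTION_TAIL))        # 000
    print(last_number_comma_aware(PREDICTION_TAIL))  # 70000
```

The naive version reproduces the reported "pred" value of "000", while the comma-aware version yields "70000", which matches the "answer" field.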

Other information

No response
