[BUG]: https://mistral.ai/news/mistral-large-2407/: Are there relevant papers, and which metrics are used to score each dataset? For example, is the evaluation metric for MultiPL-E pass@1? #235
Open
Description
Are there any papers associated with the results reported at https://mistral.ai/news/mistral-large-2407/, and which metrics are used to score each benchmark dataset? For example, is the evaluation metric for MultiPL-E pass@1?
Additional Context
No response
Suggested Solutions
No response