[BUG]: https://mistral.ai/news/mistral-large-2407/ Are there relevant papers, and what metrics are used to evaluate each dataset? For example, is the evaluation metric for MultiPL-E pass@1? #235

### Python -VV

```shell
Are there relevant papers, and what metrics are used to evaluate each dataset? For example, is the evaluation metric for MultiPL-E pass@1?
```


### Pip Freeze

```shell
Are there relevant papers, and what metrics are used to evaluate each dataset? For example, is the evaluation metric for MultiPL-E pass@1?
```


### Reproduction Steps

Are there relevant papers, and what metrics are used to evaluate each dataset? For example, is the evaluation metric for MultiPL-E pass@1?

### Expected Behavior

Are there relevant papers, and what metrics are used to evaluate each dataset? For example, is the evaluation metric for MultiPL-E pass@1?
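To make the question concrete, here is a minimal sketch of what I understand by pass@k, assuming the unbiased estimator commonly used for HumanEval-style code benchmarks. Whether the MultiPL-E numbers in the announcement were computed this way (and with which k and how many samples) is exactly what I am asking; the function names below are hypothetical and not taken from any Mistral code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: number of samples the metric is allowed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up counts for illustration only: two problems, 10 samples each.
    print(mean_pass_at_k([(10, 3), (10, 10)], k=1))
```

With k=1 this reduces to the fraction of passing samples per problem (c / n), averaged over problems, so a reported "pass@1" can still depend on the number of samples and the sampling temperature.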

### Additional Context

_No response_

### Suggested Solutions

_No response_
