
Adds GSM-PLUS #780


Open: wants to merge 2 commits into base: main


@NathanHB (Member) commented May 23, 2025

Results for HuggingFaceTB/SmolLM2-1.7B-Instruct

uv run lighteval vllm "model_name=HuggingFaceTB/SmolLM2-1.7B-Instruct"  "lighteval|gsm_plus|0|0"   --use-chat-template
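| Task                 | Version | Metric           | Value |   | Stderr |
|----------------------|---------|------------------|-------|---|--------|
| all                  |         | extractive_match | 0.213 | ± | 0.0043 |
| lighteval:gsm_plus:0 | 0       | extractive_match | 0.213 | ± | 0.0043 |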
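As a sanity check on the Stderr column, the sketch below assumes it is the standard error of the mean over per-sample 0/1 extractive_match scores, i.e. sqrt(p(1 - p) / n). The PR does not say how lighteval computes it, and the helper names are hypothetical. Inverting that formula gives a rough count of how many samples were scored.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of a mean over n binary (0/1) scores (assumed formula)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


def implied_sample_count(accuracy: float, stderr: float) -> float:
    """Invert binomial_stderr: the n that would produce the given stderr."""
    return accuracy * (1.0 - accuracy) / stderr**2


if __name__ == "__main__":
    acc, se = 0.213, 0.0043  # values from the results table above
    print(f"implied n ~ {implied_sample_count(acc, se):.0f}")
    # The stderr is rounded to 4 decimals, so bracket the true value.
    low = implied_sample_count(acc, 0.00435)
    high = implied_sample_count(acc, 0.00425)
    print(f"plausible range: {low:.0f} to {high:.0f}")
```

With 0.213 ± 0.0043 this works out to roughly 9,000 scored samples (about 8,860 to 9,280 once the rounding of the stderr is taken into account). If the stderr uses the n - 1 sample variance instead, the estimate shifts by only about one sample.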

@NathanHB linked an issue May 23, 2025 that may be closed by this pull request
@NathanHB self-assigned this May 23, 2025
@NathanHB requested review from lewtun and Copilot May 23, 2025 10:10
@Copilot (Contributor, Copilot AI) left a comment

Pull Request Overview

Adds support for a new GSM-Plus task by registering its configuration and prompt handler.

  • Introduce a gsm_plus task in default_tasks.py
  • Implement gsm_plus prompt logic in default_prompts.py

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| src/lighteval/tasks/default_tasks.py | Register new gsm_plus task configuration |
| src/lighteval/tasks/default_prompts.py | Add prompt function for filtering and formatting |
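For readers wondering what "filtering and formatting" could look like, here is a minimal, self-contained sketch of a GSM-Plus prompt function. It is not the code from this PR: `PromptDoc` is a hypothetical stand-in for lighteval's document type, and both the row fields (`question`, `answer`, `perturbation_type`) and the decision to skip the "critical thinking" perturbation are assumptions about the dataset and the filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptDoc:
    """Hypothetical stand-in for the document object a prompt function returns."""
    task_name: str
    query: str
    choices: list[str]
    gold_index: int


def gsm_plus_prompt(line: dict, task_name: str = "gsm_plus") -> Optional[PromptDoc]:
    """Filter one GSM-Plus row and format it as a question/answer prompt.

    Assumed row fields: "question", "answer", "perturbation_type".
    """
    # Assumption: "critical thinking" rows have no numeric answer to extract,
    # so this sketch signals "skip this row" by returning None.
    if line["perturbation_type"] == "critical thinking":
        return None
    return PromptDoc(
        task_name=task_name,
        query=f"Question: {line['question']}\n\nAnswer:",
        choices=[str(line["answer"])],  # single gold answer for extractive match
        gold_index=0,
    )
```

Keeping the filter inside the prompt function means the task config needs no separate dataset-filtering step; whether lighteval's loader drops `None` results in exactly this way should be checked against the version in use.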
Comments suppressed due to low confidence (1)

src/lighteval/tasks/default_tasks.py:7963

  • Add tests to cover the new gsm_plus task configuration (e.g., prompt generation and evaluation flow) to ensure it behaves as expected.
gsm_plus = LightevalTaskConfig(

@HuggingFaceDocBuilderDev (Collaborator)

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@lewtun (Member) left a comment

Really nice eval! Before merging, could you run 1-2 models from their table to see if we get similar results?
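One way to make "similar results" concrete is to check whether a paper's score falls within a couple of standard errors of the lighteval run. The sketch below assumes the paper reports accuracy in percent and uses only our own stderr; the function name and the z = 2 threshold are illustrative choices. Differences in prompt format or answer extraction can move scores by more than sampling noise, so a miss is a prompt to investigate rather than proof of a bug.

```python
def compare_to_reference(ours: float, stderr: float, reference_pct: float,
                         z: float = 2.0) -> tuple[float, bool]:
    """Compare a lighteval score (0-1 scale) with a reference score in percent.

    Returns the gap in percentage points and whether the gap is within
    z standard errors of our own estimate.
    """
    reference = reference_pct / 100.0  # convert percent to the 0-1 scale
    gap = ours - reference
    return gap * 100.0, abs(gap) <= z * stderr
```

For the SmolLM2 run above (0.213 ± 0.0043), this accepts any reference score between roughly 20.4% and 22.2%.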

[Screenshot 2025-05-23 at 12.12.43]


Successfully merging this pull request may close these issues.

[EVAL] GSM Plus
3 participants