Adds RULER benchmark #722
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
src/lighteval/tasks/default_tasks.py
@@ -24,6 +24,1238 @@
from lighteval.tasks.lighteval_task import LightevalTaskConfig


ruler_niah_single_1_131072 = LightevalTaskConfig(
IMO we should keep the task names in alphabetical order so the file is easier to browse.
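One way to enforce the ordering suggested above is a small check over the module's top-level assignments. The following is only a sketch under stated assumptions: the helper name `unsorted_assignments` is hypothetical, it is not part of lighteval, and it compares names with plain string ordering, so numeric suffixes sort lexicographically rather than by value.

```python
import ast


def unsorted_assignments(source: str) -> list[tuple[str, str]]:
    """Return adjacent (previous, current) name pairs that break alphabetical order.

    Hypothetical helper: only simple top-level `name = ...` assignments are considered.
    """
    tree = ast.parse(source)
    names = [
        node.targets[0].id
        for node in tree.body
        if isinstance(node, ast.Assign)
        and len(node.targets) == 1
        and isinstance(node.targets[0], ast.Name)
    ]
    # Compare each name with its successor using plain string ordering.
    return [(prev, cur) for prev, cur in zip(names, names[1:]) if prev > cur]


# Illustrative input, not taken from default_tasks.py.
example = "b_task = 1\na_task = 2\nc_task = 3\n"
violations = unsorted_assignments(example)
```

Run against the real file contents, an empty result would mean the task definitions are already in order.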
Pull Request Overview
This PR integrates the RULER benchmark into the lighteval framework, expanding its evaluation capabilities.
- Implements a new prompt function for RULER in default_prompts.py
- Wraps task iteration with tqdm progress bars in lighteval_task.py
- Adds new RULER metrics in metrics.py and introduces a debugging breakpoint in vllm_model.py
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/lighteval/tasks/lighteval_task.py | Adds tqdm import and progress bars to task iteration loops |
src/lighteval/tasks/default_prompts.py | Adds a new RULER prompt function; contains a typo in the arc_with_options function |
src/lighteval/models/vllm/vllm_model.py | Modifies logging format and introduces a breakpoint for debugging |
src/lighteval/metrics/metrics.py | Adds new RULER metrics for evaluation |
@@ -254,7 +265,7 @@ def arc_with_options(line, task_name: str = None):
    query += "".join([f"\n{key}. {choice}" for key, choice in zip(LETTER_INDICES, line["choices"]["text"])])
    query += "\nAnswer:"
    return Doc(
-        task_name=task_name,
+        mm task_name=task_name,
It appears that an extraneous 'mm' has been introduced before the task_name parameter. Please remove it to restore valid syntax.
Suggested change:
-        mm task_name=task_name,
+        task_name=task_name,
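To illustrate why the stray `mm` matters, the sketch below parses a reduced version of the call with and without it, using only the standard library. The snippets are simplified stand-ins for the real `Doc(...)` call in default_prompts.py, and `is_valid_python` is a hypothetical helper name.

```python
import ast


def is_valid_python(snippet: str) -> bool:
    """Return True if the snippet parses as Python source, False on a SyntaxError."""
    try:
        ast.parse(snippet)
    except SyntaxError:
        return False
    return True


# Reduced stand-ins for the call in arc_with_options (other arguments omitted).
broken = "Doc(\n    mm task_name=task_name,\n)"
fixed = "Doc(\n    task_name=task_name,\n)"

print(is_valid_python(broken), is_valid_python(fixed))
```

Because the failure is a syntax error, the whole module would fail to import, not just the `arc_with_options` prompt.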
https://github.com/NVIDIA/RULER
Available context size: 4096, 8192, 16384, 32768, 65536, 131072
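The listed context sizes are consecutive powers of two, and the task name in the diff (`ruler_niah_single_1_131072`) appears to end with the context size. A minimal sketch of enumerating one task name per size follows, assuming that `ruler_<subtask>_<size>` pattern holds for every size; the pattern is inferred from a single name, and `ruler_task_names` is a hypothetical helper, not a lighteval function.

```python
# Context sizes as listed above: 2**12 through 2**17.
CONTEXT_SIZES = [4096, 8192, 16384, 32768, 65536, 131072]


def ruler_task_names(subtask: str, sizes: list[int] = CONTEXT_SIZES) -> list[str]:
    """Build one task name per context size (assumed naming pattern)."""
    return [f"ruler_{subtask}_{size}" for size in sizes]


names = ruler_task_names("niah_single_1")
for name in names:
    print(name)
```

Generating the names in a loop like this, rather than spelling out each definition by hand, would also keep the ordering consistent across sizes.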