Adds RULER benchmark #722

NathanHB · 2025-05-15T11:45:30Z

Available context sizes: 4096, 8192, 16384, 32768, 65536, 131072

uv run lighteval vllm "model_name=meta-llama/Llama-3.1-8B,dtype=bfloat16,max_model_length=131072" "lighteval|ruler_{context size}|0|0"
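The `{context size}` placeholder in the command above stands for one of the listed sizes. A minimal sketch of expanding it into one task string per size is shown here. The helper name `ruler_task_specs` is hypothetical and not part of lighteval; it only mirrors the string shape used in the command.

```python
# Context sizes listed in the PR description.
CONTEXT_SIZES = [4096, 8192, 16384, 32768, 65536, 131072]


def ruler_task_specs(sizes=CONTEXT_SIZES):
    """Return one task string per context size.

    Hypothetical helper: it only formats strings in the same shape as the
    task argument of the command in the PR description.
    """
    return [f"lighteval|ruler_{size}|0|0" for size in sizes]


if __name__ == "__main__":
    # Print the task argument to pass for each context size.
    for spec in ruler_task_specs():
        print(spec)
```

Each printed string can be substituted for the last argument of the command, one run per context size.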

HuggingFaceDocBuilderDev · 2025-05-15T11:47:32Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

clefourrier · 2025-05-19T12:58:31Z

src/lighteval/tasks/default_tasks.py

@@ -24,6 +24,1238 @@
 from lighteval.tasks.lighteval_task import LightevalTaskConfig


+ruler_niah_single_1_131072 = LightevalTaskConfig(


IMO we should keep the task names in alphabetical order to make the file easier to browse.

Copilot

Pull Request Overview

This PR integrates the RULER benchmark into the lighteval framework, expanding its evaluation capabilities.

- Implements a new prompt function for RULER in default_prompts.py
- Wraps task iteration with tqdm progress bars in lighteval_task.py
- Adds new RULER metrics in metrics.py and introduces a debugging breakpoint in vllm_model.py

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File	Description
src/lighteval/tasks/lighteval_task.py	Adds tqdm import and progress bars to task iteration loops
src/lighteval/tasks/default_prompts.py	Adds a new RULER prompt function; contains a typo in the arc_with_options function
src/lighteval/models/vllm/vllm_model.py	Modifies logging format and introduces a breakpoint for debugging
src/lighteval/metrics/metrics.py	Adds new RULER metrics for evaluation
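The overview flags a debugging breakpoint left in vllm_model.py. A leftover like that can be caught mechanically before merging; the following is a minimal sketch using only the standard library. The function name `find_breakpoints` is hypothetical, and the check is deliberately coarse: it flags any call to `breakpoint()` and any attribute call named `set_trace`.

```python
import ast


def find_breakpoints(source: str) -> list[int]:
    """Return the sorted line numbers of likely debugger calls in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Built-in breakpoint() call.
        if isinstance(func, ast.Name) and func.id == "breakpoint":
            hits.append(node.lineno)
        # Calls such as pdb.set_trace(); matched by attribute name only.
        elif isinstance(func, ast.Attribute) and func.attr == "set_trace":
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    sample = "x = 1\nbreakpoint()\nimport pdb\npdb.set_trace()\n"
    print(find_breakpoints(sample))
```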

Copilot · 2025-05-21T14:53:01Z

src/lighteval/tasks/default_prompts.py

@@ -254,7 +265,7 @@ def arc_with_options(line, task_name: str = None):
    query += "".join([f"\n{key}. {choice}" for key, choice in zip(LETTER_INDICES, line["choices"]["text"])])
    query += "\nAnswer:"
    return Doc(
-        task_name=task_name,
+mm        task_name=task_name,


It appears that an extraneous 'mm' has been introduced before the task_name parameter. Please remove it to restore valid syntax.

Suggested change

-mm        task_name=task_name,
+        task_name=task_name,
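To illustrate why the stray token matters, the sketch below parses a reduced version of the call with and without the `mm` prefix. The two snippets are simplified stand-ins for the real `Doc(...)` call, not the actual file contents, and `parses` is a hypothetical helper.

```python
import ast

# Reduced stand-ins for the call in arc_with_options (not the full source).
BROKEN = "Doc(\nmm        task_name=task_name,\n)\n"
FIXED = "Doc(\n        task_name=task_name,\n)\n"


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("broken parses:", parses(BROKEN))
    print("fixed parses:", parses(FIXED))
```

Because `ast.parse` only checks syntax, the undefined names `Doc` and `task_name` do not affect the result.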

NathanHB · 2025-06-18T10:54:23Z

src/lighteval/tasks/extended/ruler/main.py

+                name=f"ruler_{length}:{subset}",
+                suite=["lighteval"],
+                prompt_function=prompt.ruler,
+                hf_repo=f"SaylorTwift/RULER-{length}-llama-3.2-tokenizer",


Change the dataset here for other tokenizers.
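One way to make that swap explicit is to build the repo id from a tokenizer tag. The sketch below uses plain dictionaries so it runs without lighteval installed. The helper names and the `tokenizer_tag` parameter are hypothetical, and it is an assumption that datasets for other tokenizers would follow the same naming pattern; only the `llama-3.2-tokenizer` dataset is named in this PR.

```python
def ruler_repo(length: int, tokenizer_tag: str = "llama-3.2-tokenizer") -> str:
    """Build the dataset repo id for a context length (pattern from the diff)."""
    return f"SaylorTwift/RULER-{length}-{tokenizer_tag}"


def ruler_task_fields(lengths, subsets, tokenizer_tag="llama-3.2-tokenizer"):
    """Return the name, suite, and hf_repo fields for each length and subset.

    Only the three fields visible in the diff are produced; a real task
    config needs more (prompt function, metrics, and so on).
    """
    return [
        {
            "name": f"ruler_{length}:{subset}",
            "suite": ["lighteval"],
            "hf_repo": ruler_repo(length, tokenizer_tag),
        }
        for length in lengths
        for subset in subsets
    ]


if __name__ == "__main__":
    # "niah_single_1" is used as an illustrative subset name.
    for fields in ruler_task_fields([4096, 8192], ["niah_single_1"]):
        print(fields["name"], "->", fields["hf_repo"])
```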

NathanHB · 2025-06-26T13:44:27Z

src/lighteval/models/vllm/vllm_model.py

@@ -105,6 +105,8 @@ class VLLMModelConfig(ModelConfig):
    max_num_batched_tokens: PositiveInt = 2048  # maximum number of tokens per batch
    subfolder: str | None = None
    is_async: bool = False  # Whether to use the async version or sync version of the model
+    use_dual_chunk_attention: bool = False


What version of vllm are you using for this? I get TypeError: EngineArgs.__init__() got an unexpected keyword argument 'use_dual_chunk_attention' with vllm == 0.8.5.post1.

I was on 0.9.1, I think.

(I've changed my env to match yours now.)
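The TypeError above comes from passing a keyword that the installed constructor does not accept. A version-tolerant alternative, sketched here under assumptions, is to drop unsupported keywords before constructing the object. `supported_kwargs` is a hypothetical helper and `OldEngineArgs` is a stand-in class for illustration, not vllm's actual `EngineArgs`.

```python
import inspect


def supported_kwargs(cls, kwargs: dict) -> dict:
    """Keep only the keyword arguments that cls.__init__ accepts."""
    params = inspect.signature(cls.__init__).parameters
    # If the constructor takes **kwargs, everything is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {key: value for key, value in kwargs.items() if key in params}


class OldEngineArgs:
    """Stand-in for a constructor that predates use_dual_chunk_attention."""

    def __init__(self, model: str, dtype: str = "auto"):
        self.model = model
        self.dtype = dtype


if __name__ == "__main__":
    requested = {
        "model": "meta-llama/Llama-3.1-8B",
        "dtype": "bfloat16",
        "use_dual_chunk_attention": True,
    }
    args = OldEngineArgs(**supported_kwargs(OldEngineArgs, requested))
    print(args.model, args.dtype)
```

Silently dropping a flag can hide a misconfiguration, so logging the discarded keys, or simply pinning the vllm version, may be the safer choice.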

NathanHB added 3 commits May 15, 2025 11:37

adds RULE

65275d5

adds RULE

0ef15e3

adds RULE

9cafd75

NathanHB linked an issue May 15, 2025 that may be closed by this pull request

[EVAL] Add RULER for evaluating long context #726

Open

use llama 3.2 no chat template

ed3d907

NathanHB added the new-task label May 19, 2025

clefourrier approved these changes May 19, 2025

Merge branch 'main' into nathan-adds-helet

a4394ad

NathanHB requested a review from Copilot May 21, 2025 14:52

NathanHB commented May 21, 2025

src/lighteval/tasks/default_prompts.py (outdated, resolved)

Copilot AI reviewed May 21, 2025

Update src/lighteval/tasks/default_prompts.py

248bb67

NathanHB commented May 21, 2025

src/lighteval/models/vllm/vllm_model.py (outdated, resolved)

NathanHB and others added 4 commits May 21, 2025 16:53

Update src/lighteval/models/vllm/vllm_model.py

a1aee68

Merge branch 'main' into nathan-adds-helet

775705c

fix typo

461b8cb

put tuler in extedned tasks

57f2921

NathanHB commented Jun 18, 2025

added params for Nouamane

79e6a6e

NathanHB commented Jun 26, 2025
