Repurpose Scheduler Spec Dec metric for testing correctness #1
base: main
Conversation
num_draft_tokens = scheduler_stats.spec_decoding_stats.num_draft_tokens
num_accepted_tokens = scheduler_stats.spec_decoding_stats.num_accepted_tokens
num_spec_proposal = num_draft_tokens / args.num_spec_tokens
mean_accepted_tokens = 1 + num_accepted_tokens / num_spec_proposal
num_spec_proposal is the number of times the SD call was made.

mean_accepted_tokens = (sum of generated tokens over num_spec_proposal calls) / num_spec_proposal
                     = (num_spec_proposal + sum of accepted tokens over num_spec_proposal calls) / num_spec_proposal
                     = 1 + num_accepted_tokens / num_spec_proposal
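The derivation above can be sketched as a small helper. This is a hypothetical illustration, not code from the PR; the function name is invented, and `num_spec_tokens` plays the role of `args.num_spec_tokens` (the draft length per SD call):

```python
def mean_accepted_length(num_draft_tokens: int,
                         num_accepted_tokens: int,
                         num_spec_tokens: int) -> float:
    """Average tokens generated per speculative-decoding call (AL).

    Each SD call proposes num_spec_tokens draft tokens, so the number of
    proposals is num_draft_tokens / num_spec_tokens. Every call yields one
    target-model token plus however many draft tokens were accepted.
    """
    num_spec_proposal = num_draft_tokens / num_spec_tokens
    return 1 + num_accepted_tokens / num_spec_proposal


# Example: 100 proposals of 4 draft tokens each (400 drafts total),
# with 250 drafts accepted -> AL = 1 + 250 / 100 = 3.5
print(mean_accepted_length(400, 250, 4))
```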
# spec_decoding_stats: Optional[SpecDecodingStats] = None
spec_decoding_stats = self.spec_decoding_stats
Cache the spec_decoding_stats so that it keeps a running metric instead of being reinitialized every engine step.
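The caching pattern this comment describes can be sketched as follows. These are simplified stand-in classes, not vLLM's actual `SpecDecodingStats` or scheduler implementation; only the field and attribute names follow the snippet:

```python
from dataclasses import dataclass


@dataclass
class SpecDecodingStats:
    num_draft_tokens: int = 0
    num_accepted_tokens: int = 0

    def observe(self, num_draft: int, num_accepted: int) -> None:
        # Accumulate rather than overwrite, so repeated observe() calls
        # across engine steps build a running total.
        self.num_draft_tokens += num_draft
        self.num_accepted_tokens += num_accepted


class Scheduler:
    def __init__(self) -> None:
        # Created once and reused across steps, instead of being
        # reinitialized at the start of every engine step.
        self.spec_decoding_stats = SpecDecodingStats()

    def step(self, num_draft: int, num_accepted: int) -> SpecDecodingStats:
        spec_decoding_stats = self.spec_decoding_stats  # reuse cached object
        spec_decoding_stats.observe(num_draft, num_accepted)
        return spec_decoding_stats
```

With the cached object, two steps accumulate into one running metric; with per-step reinitialization, each step would see only its own single observe() call.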
@@ -48,7 +48,8 @@ def main():
    args = parser.parse_args()

    model_dir = "meta-llama/Meta-Llama-3-8B-Instruct"
    eagle_dir = "abhigoyal/EAGLE-LLaMA3-Instruct-8B-vllm"
    # eagle_dir = "yuhuili/EAGLE-LLaMA3-Instruct-8B"
    eagle_dir = "lmsys/sglang-EAGLE-LLaMA3-Instruct-8B"
Using the sglang model so that the results are comparable with the previous SGL benchmark: https://docs.google.com/document/d/18ETJLsnxR88Qq3VDk5Mq-Hb7vuE9o3VNZ-hhz-OqAXk/edit?usp=sharing
Hmm, we discussed this on Slack shortly after you submitted this PR
a new SpecDecodingStats should only be created once per
Your response, for reference:
Hi @markmc - yup, we are good. I am still using this hacky PR whenever I want to quickly find the AL for my evals, since vllm-project#16367 is still not merged.
I was looking into SD metrics in V1 and found that spec_decoding_stats is reinitialized every time we do an engine step, yet we use an observe function that, from the name, is supposed to aggregate over multiple observe calls. Since it is reinitialized every step, there is always exactly one observe call and no aggregation. To enable AL computation for checking correctness, this PR aggregates the metrics across steps in EngineCoreOutputs.scheduler_stats.