Repurpose Scheduler Spec Dec metric for testing correctness #1


Status: Open · wants to merge 2 commits into main

Conversation

@ekagra-ranjan (Owner) commented Apr 9, 2025

I was looking into SD metrics in V1 and found that spec_decoding_stats is reinitialized every time we do an engine step, yet we use an observe function which, judging by the name, is supposed to aggregate over multiple observe calls. However, since it is reinitialized every time, there will only ever be one observe call and no aggregation.

To enable AL (acceptance length) computation for checking correctness, this PR aggregates the metrics across steps in EngineCoreOutputs.scheduler_stats.

Comment on lines +109 to +112
num_draft_tokens = scheduler_stats.spec_decoding_stats.num_draft_tokens
num_accepted_tokens = scheduler_stats.spec_decoding_stats.num_accepted_tokens
num_spec_proposal = num_draft_tokens / args.num_spec_tokens
mean_accepted_tokens = 1 + num_accepted_tokens / num_spec_proposal
@ekagra-ranjan (Owner, Author) Apr 9, 2025

num_spec_proposal is the number of times the SD call was made

mean_accepted_tokens = (sum of generated tokens over num_spec_proposal) / num_spec_proposal
                     = (num_spec_proposal + sum of accepted tokens over num_spec_proposal) / num_spec_proposal
                     = 1 + num_accepted_tokens / num_spec_proposal
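The arithmetic above can be sketched as a small standalone helper. The parameter names mirror the diff; the sample numbers below are hypothetical, not taken from any real run:

```python
def mean_accepted_tokens(num_draft_tokens: int,
                         num_accepted_tokens: int,
                         num_spec_tokens: int) -> float:
    """Acceptance length (AL): average tokens generated per SD proposal.

    Each proposal drafts num_spec_tokens tokens, so the number of
    proposals is num_draft_tokens / num_spec_tokens. Every proposal
    yields at least one token (the target model's own sample), hence
    the leading 1 in the formula.
    """
    num_spec_proposal = num_draft_tokens / num_spec_tokens
    return 1 + num_accepted_tokens / num_spec_proposal

# Hypothetical numbers: 4 drafted tokens per proposal, 400 drafted
# tokens total (100 proposals), 250 of them accepted.
# AL = 1 + 250/100 = 3.5
print(mean_accepted_tokens(400, 250, 4))
```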

Comment on lines +573 to +574
# spec_decoding_stats: Optional[SpecDecodingStats] = None
spec_decoding_stats = self.spec_decoding_stats
@ekagra-ranjan (Owner, Author)

Cache the spec_decoding_stats so that it keeps a running metric instead of being reinitialized every engine step.

@@ -48,7 +48,8 @@ def main():
args = parser.parse_args()

model_dir = "meta-llama/Meta-Llama-3-8B-Instruct"
eagle_dir = "abhigoyal/EAGLE-LLaMA3-Instruct-8B-vllm"
# eagle_dir = "yuhuili/EAGLE-LLaMA3-Instruct-8B"
eagle_dir = "lmsys/sglang-EAGLE-LLaMA3-Instruct-8B"
@markmc commented May 9, 2025

hi @ekagra-ranjan

> I was looking into SD metrics in V1 and found that spec_decoding_stats is reinitialized every time we do an engine step, yet we use an observe function which, judging by the name, is supposed to aggregate over multiple observe calls. However, since it is reinitialized every time, there will only ever be one observe call and no aggregation.

Hmm, we discussed this on Slack shortly after you submitted this PR

spec_decoding_stats = self.make_spec_decoding_stats(
    spec_decoding_stats,

A new SpecDecodingStats should only be created once per update_from_output() call; we should aggregate across all requests in a single step.

def make_spec_decoding_stats(
    self,
    spec_decoding_stats: Optional[SpecDecodingStats],
    ...
) -> Optional[SpecDecodingStats]:
    ...
    if spec_decoding_stats is None:
        spec_decoding_stats = SpecDecodingStats()
    ...
    return spec_decoding_stats
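The two aggregation scopes under discussion can be sketched side by side. This is a minimal stand-in, assuming a simplified SpecDecodingStats and Scheduler (not the actual vLLM classes): upstream creates a fresh stats object each step and only aggregates across requests within that step, while the PR's hack reuses the cached object so totals keep running across steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecDecodingStats:
    # Stand-in for vLLM's stats object: running totals only.
    num_draft_tokens: int = 0
    num_accepted_tokens: int = 0

    def observe(self, drafted: int, accepted: int) -> None:
        self.num_draft_tokens += drafted
        self.num_accepted_tokens += accepted


class Scheduler:
    def __init__(self, aggregate_across_steps: bool) -> None:
        self.aggregate_across_steps = aggregate_across_steps
        self.spec_decoding_stats: Optional[SpecDecodingStats] = None

    def step(self, per_request: list[tuple[int, int]]) -> SpecDecodingStats:
        # Upstream behavior (aggregate_across_steps=False): start from a
        # fresh object, so observe() aggregates only across the requests
        # of this one step.
        # This PR's hack (True): reuse the cached object, so the totals
        # keep running across steps, enabling AL computation at the end.
        stats = self.spec_decoding_stats if self.aggregate_across_steps else None
        if stats is None:
            stats = SpecDecodingStats()
        for drafted, accepted in per_request:
            stats.observe(drafted, accepted)
        self.spec_decoding_stats = stats
        return stats
```

With per-step stats, three steps of one request each leave only the last step's totals; with cross-step caching, the totals sum over all three steps.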

Your response, for reference:

> Oh, the purpose of the SpecDecodingStats is just to aggregate across the requests per step and is reinitialized per step. Then it is working fine.

@ekagra-ranjan (Owner, Author)
Hi @markmc - yup, we are good. I am still using this hacky PR whenever I want to quickly find the AL for my evals, since vllm-project#16367 is still not merged.

This was referenced May 17, 2025