
[Feature] Use max_num_seqs tokens with profile_run for decode #1110


Draft: wants to merge 1 commit into base: main


jianzs (Collaborator) commented Jun 7, 2025

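For decoding, profile_run does not need max_num_tokens tokens; max_num_seqs tokens are enough, because each sequence emits only one token per decode step. However, this may need adjustment for MTP (multi-token prediction), where a single step can carry more than one token per sequence.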
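A minimal Python sketch of the sizing rule described above, for illustration only. The function `profile_run_num_tokens` and its parameters are hypothetical and not taken from the project's code. It treats `max_num_tokens` as the per-step token budget, and the MTP branch (scaling by a hypothetical `num_speculative_tokens`) is an assumption about what "adjustment" could mean; it is not part of this PR.

```python
def profile_run_num_tokens(
    max_num_tokens: int,
    max_num_seqs: int,
    decode_only: bool,
    num_speculative_tokens: int = 0,
) -> int:
    """Hypothetical helper: number of dummy tokens a profiling run would use."""
    if not decode_only:
        # Prefill (or mixed) steps may process a full budget of prompt tokens.
        return max_num_tokens
    # A plain decode step emits one token per running sequence, so
    # max_num_seqs tokens cover the largest possible decode batch.
    # ASSUMPTION: with MTP, each sequence also carries num_speculative_tokens
    # extra tokens per step, so the count is scaled accordingly.
    tokens = max_num_seqs * (1 + num_speculative_tokens)
    # ASSUMPTION: never profile with more tokens than the per-step budget.
    return min(tokens, max_num_tokens)
```

For example, with `max_num_tokens=8192` and `max_num_seqs=256`, a decode-only profile run would use 256 dummy tokens instead of 8192, which is the reduction the PR description proposes.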

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
jianzs marked this pull request as draft June 7, 2025 03:04

This pull request has conflicts; please resolve them before we can evaluate the pull request.

Projects: None yet

1 participant