Skip to content

[Feature] Support sequence parallelism for static fp8 quantization #19181

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 23, 2025

Conversation

cascade812
Copy link
Contributor

@cascade812 cascade812 commented Jun 5, 2025

Add support sequence parallelism for static fp8 quantization in this PR.
It requires below config to enable it

config = CompilationConfig(level=3,
                           splitting_ops=[],
                           compile_sizes=[4],
                           custom_ops=["+rms_norm"])

# enable_noop is required to be True for correct sp pattern match 
config.pass_config.enable_noop = True
config.pass_config.enable_sequence_parallelism = True

llm = LLM(
    model="RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8",
    enforce_eager=False,
    tensor_parallel_size=2,
    compilation_config=config)

Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: cascade812 <cascade812@outlook.com>
Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copy link

github-actions bot commented Jun 5, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot mentioned this pull request Jun 5, 2025
@yaochengji yaochengji requested a review from tlrmchlsmth June 5, 2025 05:49
Copy link
Collaborator

@tlrmchlsmth tlrmchlsmth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good to me.

@ProExpertProg @bnellnm Could you take a more detailed look at this?

Copy link
Collaborator

@ProExpertProg ProExpertProg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few asks:

  • I agree with @tlrmchlsmth that including fused ops might be unnecessary - could we just make this pass run before fusion, and then make sure fusion still works?
  • Is there any way we could make this pass more general and not reliant on the exact ops? That way it could also work if custom ops are disabled.
    • Perhaps here we could enable the custom ops and then lower them after the passes run, like you described in an offline conversation.
  • Could you post performance numbers? And should we do this for any other ops as well?

Signed-off-by: cascade812 <cascade812@outlook.com>
Copy link

mergify bot commented Jun 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @cascade812.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jun 17, 2025
Signed-off-by: cascade812 <cascade812@outlook.com>
@mergify mergify bot removed the needs-rebase label Jun 17, 2025
@cascade812
Copy link
Contributor Author

cascade812 commented Jun 17, 2025

A few asks:

  • I agree with @tlrmchlsmth that including fused ops might be unnecessary - could we just make this pass run before fusion, and then make sure fusion still works?

  • Is there any way we could make this pass more general and not reliant on the exact ops? That way it could also work if custom ops are disabled.

    • Perhaps here we could enable the custom ops and then lower them after the passes run, like you described in an offline conversation.
  • Could you post performance numbers? And should we do this for any other ops as well?

  • Right, it works after I move the sequence parallel pass to run before the fusion pass.
  • We can define a custom op that serves as a placeholder, then perform pattern matching on the custom op and lower it after the pass runs.
  • SP pass doesn't directly provide perf gain, it lays the groundwork for fusing matmul and collective ops like asynctp which can provide good perf gain. I can provide the perf numbers after I add asynctp for scaled mm + collective op fusion, will do it in a separate PR.
  • We also need similar work for dynamic fp8 ops which require different pattern match.

Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

2 similar comments
Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Signed-off-by: cascade812 <cascade812@outlook.com>
Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Signed-off-by: cascade812 <cascade812@outlook.com>
Copy link
Collaborator

@ProExpertProg ProExpertProg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments!

Signed-off-by: cascade812 <cascade812@outlook.com>
Copy link
Collaborator

@ProExpertProg ProExpertProg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more nits but looks great and thanks for all the code quality improvements!

@ProExpertProg ProExpertProg added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 20, 2025
Signed-off-by: cascade812 <cascade812@outlook.com>
@tlrmchlsmth tlrmchlsmth merged commit e6327c9 into vllm-project:main Jun 23, 2025
72 checks passed
gmarinho2 pushed a commit to gmarinho2/vllm that referenced this pull request Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants