[RFC]: Blackwell Enablement for vLLM (SM100) #18153

Open
@pavanimajety

Motivation.

We are making incremental changes to add Blackwell support to vLLM. This issue tracks all of the planned items.

Planned or In Progress Features

The following items are either planned or currently in progress to enable vLLM support on Blackwell.

  • Enable NVFP4 Support

    • (NVIDIA) Add functional support for NVFP4 Kernels for linear layers
    • (NVIDIA) Add functional support for NVFP4 MoE Kernels
    • (NVIDIA) Add Model Integration for nvidia/*-FP4 models
    • Fine-tune GEMM configurations for Blackwell
    • (NVIDIA) Optimize MoE for Latency
    • (NVIDIA) Optimize MoE for Throughput (FI: PR !1113)
    • (NVIDIA) MoE All Reduce Fusion (FI: PR !1108)
  • Optimize communication overlap ops

    • (NVIDIA) Enable NCCL’s symmetric memory
    • (NVIDIA) Add support for GEMM + comm overlap
  • Blackwell Attention Kernels

  • FP8 Blockscale GEMM and MoE

    • (NVIDIA) FP8 Blockscale GEMM
    • (NVIDIA) FP8 Blockscale GEMM optimizations: Sm100 blockwise fp8 swap ab #18564
    • (NVIDIA) FP8 Blockscale MoE
    • (NVIDIA) Latency and throughput optimizations
  • MTP support
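
Most of the items above are specific to SM100, so the corresponding code paths usually need to be gated on the device's compute capability. The following is a minimal sketch of such a check, not vLLM's actual dispatch logic. The helper names `sm_version`, `is_sm100`, and `detect_capability` are hypothetical, and the sketch assumes that SM100 corresponds to compute capability (10, 0) as reported by `torch.cuda.get_device_capability()`.

```python
from typing import Optional, Tuple


def sm_version(capability: Tuple[int, int]) -> int:
    """Fold a (major, minor) compute capability into an SM number.

    Hypothetical helper for illustration: (10, 0) -> 100, (9, 0) -> 90.
    """
    major, minor = capability
    return major * 10 + minor


def is_sm100(capability: Tuple[int, int]) -> bool:
    """Return True only for compute capability 10.0 (assumed to be SM100)."""
    return sm_version(capability) == 100


def detect_capability() -> Optional[Tuple[int, int]]:
    """Query the current CUDA device through PyTorch, if it is usable.

    Returns None when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device detected; skipping Blackwell-specific paths.")
    elif is_sm100(cap):
        print(f"SM{sm_version(cap)} detected; Blackwell paths could be enabled.")
    else:
        print(f"SM{sm_version(cap)} detected; falling back to generic paths.")
```

Keeping the check in a pure function of the capability tuple makes it testable on machines without a GPU; only `detect_capability` touches the hardware.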

Feedback Period.

No response

CC List.

@kushanam @kaixih

Any Other Things.

No response
