deep-gemm: add #382

Open
drbh wants to merge 17 commits into main from add-deep-gemm

@drbh drbh commented Feb 20, 2026

This PR adds the deep-gemm kernels. It relies on an experimental feature added in huggingface/kernels#298.

The deep-gemm kernels rely heavily on JIT compilation and need access to nvcc, the cutlass headers, and the internal deep-gemm headers at runtime. This PR includes the internal headers and minor changes to load nvrtc lazily at runtime. The related PR in the kernels builder updates the build process to inject the cutlass headers into the build artifacts, so the kernel has all of its required dependencies at runtime.
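
To illustrate what loading nvrtc lazily means, here is a minimal Python sketch of the pattern. It is not the code in this PR. The helper names `load_nvrtc` and `nvrtc_version` and the fallback library name `libnvrtc.so` are assumptions made for the example. The library is opened on first use rather than at import time, so a machine without nvrtc only fails when JIT compilation is actually requested.

```python
import ctypes
import ctypes.util
import functools
from typing import Optional, Tuple


@functools.lru_cache(maxsize=1)
def load_nvrtc() -> Optional[ctypes.CDLL]:
    """Open libnvrtc on first call and cache the handle; None if not found."""
    candidates = []
    found = ctypes.util.find_library("nvrtc")
    if found:
        candidates.append(found)
    # Unversioned fallback name (an assumption for this sketch).
    candidates.append("libnvrtc.so")
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def nvrtc_version() -> Optional[Tuple[int, int]]:
    """Return (major, minor) reported by nvrtcVersion, or None if unavailable."""
    lib = load_nvrtc()
    if lib is None:
        return None
    major, minor = ctypes.c_int(), ctypes.c_int()
    # nvrtcVersion(int *major, int *minor) returns 0 (NVRTC_SUCCESS) on success.
    status = lib.nvrtcVersion(ctypes.byref(major), ctypes.byref(minor))
    if status != 0:
        return None
    return (major.value, minor.value)
```

Importing a module that contains this code never touches the CUDA libraries. Only the first call to `load_nvrtc()` does, and later calls reuse the cached handle.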

example usage

nvidia-smi -L
# GPU 0: NVIDIA H100 80GB HBM3 

# navigate to example and run
cd kernels-community/deep-gemm
uv run scripts/readme_example.py
[cuBLASLt BF16] shape: 256x1024x512, cosine_sim: 1.000000, max_diff: 0.0000
[FP8 1D2D] shape: 256x1024x512, cosine_sim: 0.999325, max_diff: 3.9062

note

  • If your GPU has a CUDA compute capability of 9 or higher, you need CUDA 12.9 or later for the JIT build to succeed, because the kernels use inline asm that is not available in earlier versions.
  • If your machine has more than one CUDA installation, you may have to specify the CUDA home explicitly, e.g. CUDA_HOME=/usr/local/cuda-12.9 uv run scripts/readme_example.py
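
The two notes above can be checked before running the example. The following is a minimal sketch, assuming that nvcc lives under `$CUDA_HOME/bin` when `CUDA_HOME` is set and is otherwise found on `PATH`. The helper names and the `MIN_CUDA` constant are hypothetical and not part of deep-gemm, whose own toolkit lookup may differ.

```python
import os
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Minimum toolkit version stated in the note above (hypothetical constant name).
MIN_CUDA = (12, 9)


def find_nvcc() -> Optional[str]:
    """Prefer $CUDA_HOME/bin/nvcc, then fall back to whatever is on PATH."""
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("nvcc")


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' text of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def nvcc_release() -> Optional[Tuple[int, int]]:
    """Run `nvcc --version` and return its release, or None if unavailable."""
    nvcc = find_nvcc()
    if nvcc is None:
        return None
    try:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    return parse_nvcc_release(result.stdout)


def cuda_is_new_enough(release: Optional[Tuple[int, int]]) -> bool:
    """True when a release was detected and it is at least MIN_CUDA."""
    return release is not None and release >= MIN_CUDA
```

Calling `cuda_is_new_enough(nvcc_release())` returns False both when no nvcc is found and when the detected toolkit is older than 12.9. In either case, pointing `CUDA_HOME` at a newer installation is the first thing to try.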

@MekkCyber changed the title from "Add deep gemm" to "deep-gemm: add" on Feb 24, 2026