deep-gemm: add by drbh · Pull Request #382 · huggingface/kernels-community

drbh · 2026-02-20T21:47:12Z

This PR adds the deep-gemm kernels. It relies on an experimental feature added in huggingface/kernels#298.

The deep-gemm kernels rely heavily on JIT compilation and need access to nvcc, the cutlass headers, and internal deep-gemm headers at runtime. This PR includes the internal headers and minor changes to lazily load nvrtc at runtime. The related PR in the kernels builder updates the build process to inject the cutlass headers into the build artifacts, so the kernel has all of its required dependencies at runtime.
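
The lazy-loading idea can be illustrated with a minimal Python sketch. This is not the code from this PR (the actual change is in the kernel sources and is not reproduced here); `LazyLibrary` and its `loader` parameter are hypothetical names used only to show the pattern of deferring a shared-library load until a symbol is first needed.

```python
import ctypes
import ctypes.util


class LazyLibrary:
    """Defer loading a shared library until one of its symbols is first used.

    Hypothetical helper for illustration; not part of deep-gemm or kernels.
    """

    def __init__(self, name, loader=None):
        self._name = name
        # The loader is injectable so the behaviour can be exercised without CUDA.
        self._loader = loader or self._default_loader
        self._handle = None

    @staticmethod
    def _default_loader(name):
        path = ctypes.util.find_library(name)
        if path is None:
            raise OSError(f"could not locate shared library {name!r}")
        return ctypes.CDLL(path)

    def __getattr__(self, symbol):
        # Only reached for attributes not found normally, i.e. library symbols.
        if self._handle is None:
            self._handle = self._loader(self._name)
        return getattr(self._handle, symbol)
```

With this pattern, constructing `LazyLibrary("nvrtc")` at import time touches nothing on disk; the library is located and opened only when a symbol such as `nvrtcVersion` is first accessed, so importing the package does not fail on machines where nvrtc is absent.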

example usage

nvidia-smi -L
# GPU 0: NVIDIA H100 80GB HBM3 

# navigate to example and run
cd kernels-community/deep-gemm
uv run scripts/readme_example.py

[cuBLASLt BF16] shape: 256x1024x512, cosine_sim: 1.000000, max_diff: 0.0000
[FP8 1D2D] shape: 256x1024x512, cosine_sim: 0.999325, max_diff: 3.9062
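
For readers unfamiliar with the two metrics in the output above, here is a minimal sketch of how cosine similarity and maximum absolute difference are commonly computed. It uses plain Python lists rather than GPU tensors, and the function names and sample values are illustrative assumptions; the implementation in scripts/readme_example.py is not shown in this PR and may differ.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_diff(a, b):
    """Largest absolute element-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up values: a reference result and a copy with one perturbed element.
reference = [1.0, 2.0, 3.0, 4.0]
perturbed = [1.0, 2.0, 3.0, 4.5]
print(
    f"cosine_sim: {cosine_sim(reference, perturbed):.6f}, "
    f"max_diff: {max_diff(reference, perturbed):.4f}"
)
```

A cosine similarity close to 1 together with a nonzero max_diff means the two outputs point in nearly the same direction overall while individual elements still differ, which is the pattern the FP8 line reports.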

note

If you are on a machine with a CUDA compute capability of 9.0 or higher, you'll need CUDA 12.9 or newer for the JIT to build successfully, due to inlined asm that is not available in earlier versions.
If you are on a machine with more than one CUDA toolkit installed, you may have to specify the CUDA home, e.g. CUDA_HOME=/usr/local/cuda-12.9 uv run scripts/readme_example.py
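
A quick way to check which toolkit would be picked up is sketched below. It assumes that `nvcc --version` prints a line containing `release <major>.<minor>`, and that looking in `$CUDA_HOME/bin` before `PATH` is a reasonable lookup order; how deep-gemm's JIT actually resolves nvcc is not described in this PR, so treat `find_nvcc` and `parse_nvcc_release` as hypothetical helpers.

```python
import os
import re
import shutil
import subprocess

MIN_CUDA = (12, 9)  # minimum toolkit version stated in the note above


def parse_nvcc_release(version_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no CUDA release found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def find_nvcc():
    """Prefer $CUDA_HOME/bin/nvcc, then fall back to whatever is on PATH."""
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("nvcc")


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found; set CUDA_HOME or add nvcc to PATH")
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        release = parse_nvcc_release(result.stdout)
        status = "ok" if release >= MIN_CUDA else "older than 12.9"
        print(f"{nvcc}: CUDA {release[0]}.{release[1]} ({status})")
```

Running it with and without `CUDA_HOME` set shows whether the override changes which nvcc is found.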

drbh added 7 commits February 19, 2026 10:06

feat: vendor deep-gemm

f8f4adf

feat: add vendored note to readme

dc403c7

feat: update to builder format

2ec25c0

fix: remove vendored workflows

5bf53fa

feat: add includes and bundle deps

7f1079e

fix: prefer remote builder

46a6040

fix: remove debug print

21a2954

drbh requested review from MekkCyber and danieldk as code owners February 20, 2026 21:47

fix: update readme example repo id

062551b

MekkCyber changed the title ~~Add deep gemm~~ deep-gemm: add Feb 24, 2026

MekkCyber and others added 7 commits February 24, 2026 11:32

empty commit for CI

a4d8454

min_ver 12.8

d5ff437

Revert "min_ver 12.8"

f43124c

This reverts commit d5ff437.

add deep-gemm source github link

236c424

Merge branch 'main' into add-deep-gemm

17d8191

Merge branch 'main' into add-deep-gemm

a247ce8

fix: simplify and prefer submodule

34b6aac

drbh mentioned this pull request Feb 25, 2026

feat: enable bundling cutlass in build huggingface/kernels#298

Closed

drbh added 2 commits February 25, 2026 15:21

fix: include the testing utils in build

a85d6c6

fix: include layout py file in build

030dd3b

drbh wants to merge 17 commits into main from add-deep-gemm
