cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.
Updated May 28, 2026 - Python
LLM fine-tuning with LoRA + NVFP4/MXFP8 on NVIDIA DGX Spark (Blackwell GB10)
🔧 Fine-tune large language models efficiently on NVIDIA DGX Spark with LoRA adapters and optimized quantization for high performance.
Patches + recipe to deploy festr2/MiMo-V2.5-Pro-NVFP4-MXFP8-attn-TP8 on 8-node DGX Spark sm_121 (Ray + vLLM, TP=8). Fixes the fused-qkv loader bug that mis-slotted Q values as K/V on 7 of 8 ranks.
Block-scaled FP8 / FP4 / INT4 tensor primitive with Triton scaled-matmul at FP32 parity on H100. NumPy / PyTorch / MLX / JAX backends.
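For readers new to the idea behind these projects, block scaling can be sketched in a few lines of NumPy. This is only an illustration, not code from any repository listed here: the helper names `quantize_blocks` and `dequantize_blocks` are hypothetical, int8 codes stand in for FP8 elements because NumPy has no native FP8 dtype, and the block size of 32 with a power-of-two shared scale follows the MX convention.

```python
import numpy as np

BLOCK = 32  # assumed block size: MX-style formats share one scale per 32 elements


def quantize_blocks(x, block=BLOCK, qmax=127):
    """Quantize a 1-D array to int8 codes with one power-of-two scale per block.

    Hypothetical helper for illustration; int8 stands in for an FP8 element type.
    """
    x = np.asarray(x, dtype=np.float32)
    if x.size % block != 0:
        raise ValueError("length must be a multiple of the block size")
    blocks = x.reshape(-1, block)
    # Largest magnitude in each block decides that block's scale.
    amax = np.abs(blocks).max(axis=1, keepdims=True).astype(np.float64)
    safe = np.where(amax > 0, amax, 1.0)  # avoid log2(0) for all-zero blocks
    # Smallest power of two such that amax / scale fits within qmax.
    exponent = np.ceil(np.log2(safe / qmax))
    scale = np.exp2(exponent).astype(np.float32)
    codes = np.clip(np.rint(blocks / scale), -qmax, qmax).astype(np.int8)
    return codes, scale


def dequantize_blocks(codes, scale):
    """Reconstruct float32 values from per-block codes and scales."""
    return (codes.astype(np.float32) * scale).reshape(-1)


if __name__ == "__main__":
    x = np.linspace(-3.0, 3.0, 64, dtype=np.float32)
    codes, scale = quantize_blocks(x)
    x_hat = dequantize_blocks(codes, scale)
    print("max abs error:", float(np.abs(x - x_hat).max()))
```

Because each block carries its own scale, an outlier in one block does not reduce the precision available to the others, which is the main motivation for block-scaled formats over a single per-tensor scale.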