📚 200+ Tensor/CUDA Cores kernels: ⚡️flash-attn-mma and ⚡️hgemm with WMMA, MMA, and CuTe, reaching 98%–100% of cuBLAS/FA2 TFLOPS 🎉 (a minimal WMMA sketch appears after this list).
FlashInfer: Kernel Library for LLM Serving
A throughput-oriented, high-performance serving framework for LLMs
Flash Attention in ~100 lines of CUDA (forward pass only; a streaming-softmax sketch appears after this list)
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.
An implementation of Flash Attention using CuTe.
TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles.
An FP8 Flash Attention implementation for the Ada architecture, built with the cutlass repository.
A simple convolution implementation with both a CPU-only and a GPU-only (CUDA) version (a minimal sketch of the pairing follows below).
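
For the WMMA-based hgemm entry above, a minimal sketch of the Tensor Core WMMA API such kernels build on: one warp multiplying a single 16x16x16 half-precision tile. The kernel name, layouts, and launch shape are illustrative assumptions, not code from the listed repo.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes one 16x16 tile of C = A * B on Tensor Cores.
// A is row-major, B is col-major, both 16x16 in __half; C accumulates in float.
__global__ void wmma_tile_hgemm(const __half* A, const __half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);           // C tile starts at zero
    wmma::load_matrix_sync(a_frag, A, 16);       // leading dimension = 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // c += a * b
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}

// Launch with exactly one warp: wmma_tile_hgemm<<<1, 32>>>(dA, dB, dC);
```

A full hgemm tiles this primitive over the whole matrix and pipelines global-to-shared-memory loads; the near-cuBLAS numbers claimed above come from that tiling and pipelining, not from the single-tile core shown here.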
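
The "forward pass only" entry refers to a minimal Flash Attention; the sketch below is not that repo's code, but shows the core idea such implementations share: a streaming (online-softmax) pass over the keys so the full N×N score matrix is never materialized. The row-major single-head layout and the d <= 128 bound are assumptions.

```cuda
#include <math.h>

// One thread per query row: forward-only attention with an online softmax.
// Q, K, V, O are [N, d] row-major; scale is typically 1/sqrt(d).
__global__ void attention_forward(const float* Q, const float* K,
                                  const float* V, float* O,
                                  int N, int d, float scale) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // query row index
    if (i >= N) return;

    float m = -INFINITY;    // running max of scores seen so far
    float l = 0.0f;         // running softmax denominator
    float acc[128];         // running output row (assumes d <= 128)
    for (int k = 0; k < d; ++k) acc[k] = 0.0f;

    for (int j = 0; j < N; ++j) {
        // score s = scale * <Q[i], K[j]>
        float s = 0.0f;
        for (int k = 0; k < d; ++k) s += Q[i * d + k] * K[j * d + k];
        s *= scale;

        // online softmax: rescale prior state to the new running max
        float m_new = fmaxf(m, s);
        float corr  = expf(m - m_new);   // 0 on the first iteration
        float p     = expf(s - m_new);
        l = l * corr + p;
        for (int k = 0; k < d; ++k)
            acc[k] = acc[k] * corr + p * V[j * d + k];
        m = m_new;
    }
    for (int k = 0; k < d; ++k) O[i * d + k] = acc[k] / l;
}
```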
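
Finally, for the CPU-only/GPU-only convolution entry, a minimal sketch of the usual pairing: the same valid-region 3x3 convolution written once as a plain loop nest and once as a CUDA kernel with one thread per output pixel. The kernel size and the boundary policy are assumptions.

```cuda
#define KSIZE 3            // convolution kernel width/height (assumed)
#define RAD (KSIZE / 2)    // kernel radius

// CPU-only version: plain loop nest over the valid output region.
void conv2d_cpu(const float* in, const float* w, float* out, int H, int W) {
    for (int y = RAD; y < H - RAD; ++y)
        for (int x = RAD; x < W - RAD; ++x) {
            float acc = 0.0f;
            for (int ky = -RAD; ky <= RAD; ++ky)
                for (int kx = -RAD; kx <= RAD; ++kx)
                    acc += in[(y + ky) * W + (x + kx)]
                         * w[(ky + RAD) * KSIZE + (kx + RAD)];
            out[y * W + x] = acc;
        }
}

// GPU-only version: one thread per output pixel, same arithmetic.
__global__ void conv2d_gpu(const float* in, const float* w, float* out,
                           int H, int W) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < RAD || x >= W - RAD || y < RAD || y >= H - RAD) return;
    float acc = 0.0f;
    for (int ky = -RAD; ky <= RAD; ++ky)
        for (int kx = -RAD; kx <= RAD; ++kx)
            acc += in[(y + ky) * W + (x + kx)]
                 * w[(ky + RAD) * KSIZE + (kx + RAD)];
    out[y * W + x] = acc;
}
```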