FlashInfer: Kernel Library for LLM Serving
Training and serving large-scale neural networks with auto parallelization.
A high-throughput and memory-efficient inference and serving engine for LLMs
MSCCL++: A GPU-driven communication stack for scalable AI applications
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
Updated C version of the Test Suite for Vectorising Compilers
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search
Development repository for the Triton language and compiler
Synthesizer for optimal collective communication algorithms