
Starred repositories
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Analyze the inference of large language models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A lightweight data processing framework built on DuckDB and 3FS.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
SpargeAttention: a training-free sparse attention method that can accelerate inference for any model.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
KernelBench: Can LLMs Write GPU Kernels? A benchmark of Torch -> CUDA problems.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Optimized primitives for collective multi-GPU communication
PyTorch library for cost-effective, fast and easy serving of MoE models.
Janus-Series: Unified Multimodal Understanding and Generation Models
Fully open reproduction of DeepSeek-R1
Automatically split your PyTorch models across multiple GPUs for training & inference
🌵 A responsive, clean and simple theme for Hexo.
[CVPR 2025] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Sky-T1: Train your own O1-preview model for under $450
[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)