- Redmond, WA
Stars
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Distributed Compiler based on Triton for Parallel Systems
An implementation of a deep learning recommendation model (DLRM)
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
verl: Volcano Engine Reinforcement Learning for LLMs
SGLang is a fast serving framework for large language models and vision language models.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
A generative world for general-purpose robotics & embodied AI learning.
FlashInfer: Kernel Library for LLM Serving
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Fast, Flexible and Portable Structured Generation
Experimental projects related to TensorRT
Development repository for the Triton language and compiler
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Empowering everyone to build reliable and efficient software.
The official Python library for the OpenAI API
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
A high-throughput and memory-efficient inference and serving engine for LLMs
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…
Generative Models by Stability AI
High-Resolution Image Synthesis with Latent Diffusion Models
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators