yongwww
🐢
working
  • Redmond, WA

Highlights

  • Pro

Organizations

@apache @octoml


TVM FFI

C++ 38 12 Updated Sep 15, 2025

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 298 22 Updated Sep 11, 2025

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 347 9 Updated Sep 14, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,108 94 Updated Sep 12, 2025

An implementation of a deep learning recommendation model (DLRM)

Python 3,961 864 Updated Sep 2, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,791 123 Updated Sep 12, 2025

Ultra and Unified CCL

C++ 539 47 Updated Sep 15, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 13,334 2,353 Updated Sep 15, 2025

Python 165 86 Updated Sep 14, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 17,907 2,918 Updated Sep 15, 2025

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

C++ 27,357 8,799 Updated Sep 13, 2025

FlashMLA: Efficient MLA kernels

C++ 11,722 898 Updated Aug 27, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 27,235 2,495 Updated Sep 13, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 8,428 1,435 Updated Sep 9, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,734 492 Updated Sep 14, 2025

C++ 37 6 Updated Jul 19, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 1,621 153 Updated Sep 14, 2025

Fast, Flexible and Portable Structured Generation

C++ 1,233 86 Updated Sep 13, 2025

Experimental projects related to TensorRT

MLIR 111 17 Updated Sep 12, 2025

Development repository for the Triton language and compiler

MLIR 16,857 2,241 Updated Sep 15, 2025

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 11,920 1,177 Updated Sep 7, 2025

Empowering everyone to build reliable and efficient software.

Rust 106,451 13,735 Updated Sep 15, 2025

The official Python library for the OpenAI API

Python 28,670 4,265 Updated Sep 14, 2025

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,618 180 Updated Jun 25, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 58,011 10,112 Updated Sep 14, 2025

A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…

Python 1,351 152 Updated Sep 13, 2025

Generative Models by Stability AI

Python 26,380 2,948 Updated May 20, 2025

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 13,306 1,667 Updated Feb 29, 2024

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Python 12,616 3,658 Updated Sep 14, 2025