Yong Wu yongwww

🐢

working

MLSys Engineer @ Nvidia | Machine Learning compiler and LLM engine co-design

66 followers · 80 following

@NVIDIA
Redmond, WA
19:15 (UTC -07:00)

Achievements

x3 x3

Stars

apache / tvm-ffi

TVM FFI

C++ 38 12 Updated Sep 15, 2025

NVIDIA / nvshmem

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 298 22 Updated Sep 11, 2025

NVIDIA / tilus

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 347 9 Updated Sep 14, 2025

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python 1,108 94 Updated Sep 12, 2025

facebookresearch / dlrm

An implementation of a deep learning recommendation model (DLRM)

Python 3,961 864 Updated Sep 2, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,791 123 Updated Sep 12, 2025

uccl-project / uccl

Ultra and Unified CCL

C++ 539 47 Updated Sep 15, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 13,334 2,353 Updated Sep 15, 2025

mlc-ai / relax

Python 165 86 Updated Sep 14, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 17,907 2,918 Updated Sep 15, 2025

dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

C++ 27,357 8,799 Updated Sep 13, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA kernels

C++ 11,722 898 Updated Aug 27, 2025

Genesis-Embodied-AI / Genesis

A generative world for general-purpose robotics & embodied AI learning.

Python 27,235 2,495 Updated Sep 13, 2025

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 8,428 1,435 Updated Sep 9, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 3,734 492 Updated Sep 14, 2025

deepseek-ai / DeepSeek-R1

91,075 11,738 Updated Jun 27, 2025

mlc-ai / mlc-python

C++ 37 6 Updated Jul 19, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 1,621 153 Updated Sep 14, 2025

mlc-ai / xgrammar

Fast, Flexible and Portable Structured Generation

C++ 1,233 86 Updated Sep 13, 2025

NVIDIA / TensorRT-Incubator

Experimental projects related to TensorRT

MLIR 111 17 Updated Sep 12, 2025

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 16,857 2,241 Updated Sep 15, 2025

zai-org / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 11,920 1,177 Updated Sep 7, 2025

rust-lang / rust

Empowering everyone to build reliable and efficient software.

Rust 106,451 13,735 Updated Sep 15, 2025

openai / openai-python

The official Python library for the OpenAI API

Python 28,670 4,265 Updated Sep 14, 2025

FasterDecoding / Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,618 180 Updated Jun 25, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 58,011 10,112 Updated Sep 14, 2025

NVIDIA / TensorRT-Model-Optimizer

A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…

Python 1,351 152 Updated Sep 13, 2025

Stability-AI / generative-models

Generative Models by Stability AI

Python 26,380 2,948 Updated May 20, 2025

CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 13,306 1,667 Updated Feb 29, 2024

apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Python 12,616 3,658 Updated Sep 14, 2025
