Kuo Zhang kuozhang

12 followers · 16 following

Stars

260 results for source starred repositories

fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Python 3,454 553 Updated May 16, 2024

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,051 518 Updated Mar 16, 2025

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 663 119 Updated Feb 21, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,631 276 Updated Mar 10, 2025

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda 1,838 369 Updated Mar 21, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 5,417 520 Updated Mar 21, 2025

muriloboratto / NCCL

Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, allGather, reduceScatter and sendRecv operations.

32 7 Updated Aug 28, 2023

microsoft / mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 312 45 Updated Mar 21, 2025

Azure / msccl

Microsoft Collective Communication Library

60 6 Updated Nov 23, 2024

leimao / CUTLASS-Examples

CUTLASS and CuTe Examples

Cuda 42 4 Updated Jan 4, 2025

alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Python 948 139 Updated Mar 21, 2025

aliyun / SimAI

C++ 450 62 Updated Mar 20, 2025

zugexiaodui / torch_flops

A library for calculating the FLOPs in the forward() process based on torch.fx

Python 99 4 Updated Sep 5, 2024

xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 1,622 161 Updated Mar 20, 2025

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 40,229 6,616 Updated Dec 9, 2024

NVIDIA / NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

Jupyter Notebook 844 115 Updated Mar 21, 2025

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 13,419 2,740 Updated Mar 21, 2025

hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible

Python 40,642 4,487 Updated Mar 21, 2025

lucidrains / speculative-decoding

Explorations into some recent techniques surrounding speculative decoding

Python 249 20 Updated Dec 22, 2024

ssbuild / qwen_finetuning

qwen models finetuning

Python 93 9 Updated Mar 9, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 12,252 1,318 Updated Mar 21, 2025

Cambricon / triton-linalg

Development repository for the Triton-Linalg conversion

C++ 180 18 Updated Feb 7, 2025

microsoft / triton-shared

Shared Middle-Layer for Triton Compilation

MLIR 232 56 Updated Mar 11, 2025

mlfoundations / MINT-1T

MINT-1T: A one trillion token multimodal interleaved dataset.

804 20 Updated Jul 31, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 2,442 256 Updated Mar 19, 2025

DerryHub / BEVFormer_tensorrt

BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).

Python 466 76 Updated Nov 20, 2023

benfred / py-spy

Sampling profiler for Python programs

Rust 13,408 450 Updated Feb 6, 2025

dabochen / spreadsheet-is-all-you-need

A nanoGPT pipeline packed in a spreadsheet

2,108 127 Updated Jun 17, 2024

huggingface / nanotron

Minimalistic large language model 3D-parallelism training

Python 1,704 165 Updated Mar 21, 2025

NVlabs / tiny-cuda-nn

Lightning fast C++/CUDA neural network framework

C++ 3,932 483 Updated Jan 27, 2025