Kuo Zhang (kuozhang)

12 followers · 16 following

Stars

11 stars written in Cuda

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,054 stars · 518 forks · Updated Mar 16, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 2,445 stars · 256 forks · Updated Mar 22, 2025

Tony-Tan / CUDA_Freshman

Cuda · 2,368 stars · 459 forks · Updated Jan 16, 2024

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda · 1,838 stars · 369 forks · Updated Mar 21, 2025

NVIDIA / nccl-tests

NCCL Tests

Cuda · 1,041 stars · 266 forks · Updated Mar 15, 2025

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 663 stars · 119 forks · Updated Feb 21, 2025

tensorflow / recommenders-addons

Additional utilities and helpers that extend TensorFlow for building recommendation systems, contributed and maintained by SIG Recommenders.

Cuda · 608 stars · 140 forks · Updated Jan 16, 2025

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda · 595 stars · 72 forks · Updated Mar 12, 2025

baidu-research / baidu-allreduce

Cuda · 580 stars · 115 forks · Updated Apr 6, 2018

Bruce-Lee-LY / cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) on Tensor Cores, using the WMMA API and MMA PTX instructions.

Cuda · 368 stars · 74 forks · Updated Sep 8, 2024

leimao / CUTLASS-Examples

CUTLASS and CuTe Examples

Cuda · 42 stars · 4 forks · Updated Jan 4, 2025
