kuozhang

Kuo Zhang kuozhang

11 followers · 16 following

Stars

11 stars written in Cuda

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,109 stars · 536 forks · Updated Mar 28, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 2,520 stars · 261 forks · Updated Mar 29, 2025

Tony-Tan / CUDA_Freshman

Cuda · 2,377 stars · 461 forks · Updated Jan 16, 2024

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda · 1,849 stars · 373 forks · Updated Mar 21, 2025

NVIDIA / nccl-tests

NCCL Tests

Cuda · 1,045 stars · 268 forks · Updated Mar 15, 2025

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 667 stars · 119 forks · Updated Feb 21, 2025

tensorflow / recommenders-addons

Additional utilities and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.

Cuda · 609 stars · 141 forks · Updated Mar 26, 2025

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda · 596 stars · 72 forks · Updated Mar 12, 2025

baidu-research / baidu-allreduce

Cuda · 580 stars · 115 forks · Updated Apr 6, 2018

Bruce-Lee-LY / cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Cuda · 374 stars · 76 forks · Updated Sep 8, 2024

leimao / CUTLASS-Examples

CUTLASS and CuTe Examples

Cuda · 43 stars · 4 forks · Updated Jan 4, 2025
