11 stars written in Cuda
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,054 518 Updated Mar 16, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,445 256 Updated Mar 22, 2025

CUDA Library Samples

Cuda 1,838 369 Updated Mar 21, 2025

NCCL Tests

Cuda 1,041 266 Updated Mar 15, 2025

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 663 119 Updated Feb 21, 2025

Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.

Cuda 608 140 Updated Jan 16, 2025

CUDA Kernel Benchmarking Library

Cuda 595 72 Updated Mar 12, 2025

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Cuda 368 74 Updated Sep 8, 2024

CUTLASS and CuTe Examples

Cuda 42 4 Updated Jan 4, 2025