11 stars written in Cuda

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,109 stars · 536 forks · Updated Mar 28, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda · 2,520 stars · 261 forks · Updated Mar 29, 2025

CUDA Library Samples

Cuda · 1,849 stars · 373 forks · Updated Mar 21, 2025

NCCL Tests

Cuda · 1,045 stars · 268 forks · Updated Mar 15, 2025

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 667 stars · 119 forks · Updated Feb 21, 2025

Additional utilities and helpers that extend TensorFlow for building recommendation systems, contributed and maintained by SIG Recommenders.

Cuda · 609 stars · 141 forks · Updated Mar 26, 2025

CUDA Kernel Benchmarking Library

Cuda · 596 stars · 72 forks · Updated Mar 12, 2025

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Cuda · 374 stars · 76 forks · Updated Sep 8, 2024

CUTLASS and CuTe Examples

Cuda · 43 stars · 4 forks · Updated Jan 4, 2025