shenjy0829/kernel-bench

kernel-bench

For self-learning purposes ~

  • Implementation
    • PyTorch
    • CUDA
    • CuTe DSL
    • Triton
    • TileLang
  • To-do kernels
    • Reduction
    • Prefix Sum
    • Top-K Selection
    • K-Means Clustering
    • Elementwise
      • ???
    • GEMM
      • GEMM
      • SGEMM
    • Attention
      • FlashAttention v1
      • FlashAttention v2
      • FlashAttention v3
      • FlashAttention v4
      • Multi-Head Attention
    • Multi-Agent Simulation
    • LDPC
    • FFT

  • Done kernel +

Usage

Setup Env

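mamba create --name kernel_bench python=3.11
# activate the new environment so the packages below are installed into it
mamba activate kernel_bench

## CUDA toolkit and DSLs
# CUDA compiler (nvcc)
mamba install cuda-nvcc
# PyTorch & Triton (CUDA 13.0 wheels)
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu130
# CuTe DSL
pip install nvidia-cutlass-dsl

## other dependencies
mamba install colorama
mamba install loguru plotly pandas click

## install LazyGPU (editable install from the utils directory)
cd utils
pip install -e .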
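
After running the setup commands, it can help to confirm that the packages actually landed in the active environment. The snippet below is a minimal sketch, not part of the repository: `check_env` and `PACKAGES` are hypothetical names, and the distribution names are assumed from the install commands above (`triton` is assumed to come in as a dependency of `torch`, which may not hold on every platform).

```python
from importlib import metadata

# Distribution names assumed from the install commands above (hypothetical list).
PACKAGES = [
    "torch",
    "torchvision",
    "triton",
    "nvidia-cutlass-dsl",
    "colorama",
    "loguru",
    "plotly",
    "pandas",
    "click",
]


def check_env(packages=PACKAGES):
    """Return {distribution name: installed version, or None if missing}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_env().items():
        print(f"{name:20s} {version or 'MISSING'}")
```

Any package reported as `MISSING` was most likely installed into a different environment, so check that `kernel_bench` is active before reinstalling.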

Performance

  • NVIDIA GeForce RTX 4090
  • NVIDIA A100-SXM4-40GB
  • NVIDIA GeForce RTX 5080
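
For comparing kernels across the GPUs listed above, a small timing helper is enough to get started. This is a sketch under stated assumptions, not the repository's own harness: `bench` and its parameters are hypothetical. GPU kernels launch asynchronously, so a synchronization callback (for example `torch.cuda.synchronize`) should be passed as `sync` when timing device code; without it only the launch overhead is measured.

```python
import statistics
import time


def bench(fn, *, warmup=3, iters=20, sync=None):
    """Return the median wall-clock time of fn() in seconds.

    sync is an optional zero-argument callable that blocks until pending
    device work finishes (hypothetical parameter for GPU timing).
    """
    # Warm-up runs absorb one-time costs such as JIT compilation and caching.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # CPU-only usage example; no GPU required.
    seconds = bench(lambda: sum(range(100_000)))
    print(f"median: {seconds * 1e6:.1f} us")
```

The median is used instead of the mean so that a few slow outlier iterations do not skew the result.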

References

The implementation of this benchmark has benefited from the following sources:
