DD-DuDa/Cute-Learning

Cute-Learning

Welcome to the Cute-Learning repository! This project collects example CUDA implementations written with CUTLASS CuTe, the layout and tensor abstraction library that ships with NVIDIA CUTLASS.
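For readers new to CuTe, its central idea is the layout: a shape paired with strides that maps a logical coordinate to a linear offset. The sketch below mimics that idea in plain host C++ so it compiles without CUTLASS. `Layout2D`, `make_row_major`, and `make_col_major` are hypothetical names chosen for illustration; they are not CuTe's API and are not taken from this repository.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a rank-2 CuTe-style layout (not the CuTe API):
// a shape and a stride for each mode.
struct Layout2D {
    std::array<std::size_t, 2> shape;   // extent of each mode: (rows, cols)
    std::array<std::size_t, 2> stride;  // elements skipped per unit step in each mode

    // Map a logical (row, col) coordinate to a linear offset:
    // offset = row * stride[0] + col * stride[1].
    std::size_t operator()(std::size_t row, std::size_t col) const {
        assert(row < shape[0] && col < shape[1]);
        return row * stride[0] + col * stride[1];
    }

    // Number of logical elements covered by the layout.
    std::size_t size() const { return shape[0] * shape[1]; }
};

// Row-major: consecutive columns are adjacent in memory, stride = (cols, 1).
inline Layout2D make_row_major(std::size_t rows, std::size_t cols) {
    return Layout2D{{rows, cols}, {cols, 1}};
}

// Column-major: consecutive rows are adjacent in memory, stride = (1, rows).
inline Layout2D make_col_major(std::size_t rows, std::size_t cols) {
    return Layout2D{{rows, cols}, {1, rows}};
}
```

With a 4 x 8 shape, coordinate (2, 3) lands at offset 2 * 8 + 3 = 19 in the row-major layout and at 2 + 3 * 4 = 14 in the column-major one. The same data can therefore be viewed through different layouts without being moved.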

Features

This repository includes implementations for:

  • GEMM (General Matrix Multiply)
  • GEMV (General Matrix-Vector Multiply)
  • Flash-Decoding
  • Data Copy
  • LDSM (ldmatrix instruction)
  • Tensor Dequant
  • TODO... (More features to come!)
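As a point of reference for the first two items, GEMM computes C = A * B and GEMV computes y = A * x. The following is a minimal CPU sketch of both, the kind of baseline a GPU kernel's output could be checked against. The row-major storage, the `float` element type, and the names `gemm_ref` and `gemv_ref` are assumptions made for this example, not conventions taken from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference GEMM: C (M x N) = A (M x K) * B (K x N).
// Assumption: all matrices are dense and stored row-major.
inline std::vector<float> gemm_ref(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t k = 0; k < K; ++k) {
            const float a = A[m * K + k];
            // Innermost loop walks a row of B and a row of C contiguously.
            for (std::size_t n = 0; n < N; ++n) {
                C[m * N + n] += a * B[k * N + n];
            }
        }
    }
    return C;
}

// Reference GEMV: y (M) = A (M x K) * x (K), with A stored row-major.
inline std::vector<float> gemv_ref(const std::vector<float>& A,
                                   const std::vector<float>& x,
                                   std::size_t M, std::size_t K) {
    assert(A.size() == M * K && x.size() == K);
    std::vector<float> y(M, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t k = 0; k < K; ++k) {
            y[m] += A[m * K + k] * x[k];
        }
    }
    return y;
}
```

When comparing a half-precision GPU result against a baseline like this, a tolerance-based comparison is usually more appropriate than exact equality, because the accumulation order and precision differ.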

GEMM

The GEMM implementation is tuned for performance. The graph below shows its measured performance:

[Figure: GEMM Performance]

Refer to the following blog:

LDSM
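For orientation, `ldmatrix` is a warp-wide PTX instruction that loads 8 x 8 tiles of 16-bit elements from shared memory into registers. In the single-tile, non-transposed form (`.x1`), lanes 0 to 7 each supply the address of one row, and every lane receives one 32-bit register holding two adjacent elements. The host-side sketch below models only that thread-to-element mapping as described in the PTX documentation. It is an illustration, not this repository's code, and `FragmentCoord`, `ldmatrix_x1_coord`, and `ldmatrix_x1_address_lane` are hypothetical helper names.

```cpp
#include <cassert>

// Position of one 16-bit element inside the 8 x 8 tile.
struct FragmentCoord {
    int row;
    int col;
};

// Hypothetical helper: for a lane (0..31) and the element slot inside its
// 32-bit result register (0 = low half, 1 = high half), return which tile
// element the lane holds after a non-transposed ldmatrix .x1.
// Each row of the tile is spread across four consecutive lanes.
inline FragmentCoord ldmatrix_x1_coord(int lane, int elem) {
    assert(lane >= 0 && lane < 32);
    assert(elem == 0 || elem == 1);
    return FragmentCoord{lane / 4, 2 * (lane % 4) + elem};
}

// Hypothetical helper: the lane whose address operand points at the start
// of a given tile row. For .x1, row r is addressed by lane r (lanes 0..7).
inline int ldmatrix_x1_address_lane(int row) {
    assert(row >= 0 && row < 8);
    return row;
}
```

For example, lane 5 holds row 1, columns 2 and 3, and lane 31 holds row 7, columns 6 and 7. This matches the operand fragment layout expected by the tensor core `mma` instructions, which is why the load is commonly used to stage their inputs.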

Refer to the following blog:


We hope you find this repository useful for your learning and development needs. Contributions and feedback are welcome!
