# cutlass

Here are 8 public repositories matching this topic.
- GEMM and Winograd-based convolutions using CUTLASS. (Cuda, updated Jul 15, 2020)
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. Topics: gpu, cuda, inference, nvidia, cutlass, mha, multi-head-attention, llm, tensor-core, large-language-model, flash-attention, flash-attention-2. (C++, updated Sep 7, 2024)
- A study of CUTLASS. (Cuda, updated Aug 31, 2023)
- Multiple GEMM operators constructed with CUTLASS to support LLM inference. (C++, updated Sep 27, 2024)
- A PyTorch implementation of block-sparse operations. (C++, updated May 13, 2023)
- (No description provided.) (Python, updated Nov 2, 2023)