# cutlass
Here are 4 public repositories matching this topic...
- Performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
  Topics: gpu, cuda, inference, nvidia, cutlass, mha, multi-head-attention, llm, tensor-core, large-language-model, flash-attention, flash-attention-2
  Updated Sep 7, 2024 - C++
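Since this entry benchmarks FlashAttention rather than explaining it, a brief refresher may help: FlashAttention never materializes the full attention score matrix; it streams over keys/values while maintaining a running softmax. The sketch below illustrates that online-softmax recurrence on the CPU for a single query. It is a generic illustration, not code from this repository or from the FlashAttention library, and the helper name `attend_one_query` is hypothetical.

```cpp
// CPU sketch of the online-softmax recurrence at the core of FlashAttention.
// Illustrative only: the real kernels process keys/values tile-by-tile in
// GPU shared memory; here we stream one key at a time for clarity.
#include <cmath>
#include <vector>

// out = softmax(q . K^T / sqrt(d)) . V for a single query vector q.
// K and V are row-major [n_keys x d].
void attend_one_query(const float *q, const float *K, const float *V,
                      int n_keys, int d, float *out) {
  float m = -INFINITY;              // running max of scores seen so far
  float l = 0.0f;                   // running softmax denominator
  std::vector<float> acc(d, 0.0f);  // running (unnormalized) output

  for (int k = 0; k < n_keys; ++k) {
    float s = 0.0f;
    for (int j = 0; j < d; ++j) s += q[j] * K[k * d + j];
    s /= std::sqrt(static_cast<float>(d));

    float m_new = std::max(m, s);
    float scale = std::exp(m - m_new);  // rescales earlier partial results
    float p = std::exp(s - m_new);

    l = l * scale + p;
    for (int j = 0; j < d; ++j)
      acc[j] = acc[j] * scale + p * V[k * d + j];
    m = m_new;
  }
  for (int j = 0; j < d; ++j) out[j] = acc[j] / l;  // final normalization
}
```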
- Multiple GEMM operators built with CUTLASS to support LLM inference.
  Updated Sep 27, 2024 - C++
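For readers new to CUTLASS, the GEMM-operator pattern this description refers to typically follows CUTLASS's documented device-level `cutlass::gemm::device::Gemm` API, as in the minimal sketch below. The function name `run_sgemm` is a placeholder, the layouts and element types are one possible configuration, and this is not code from the listed repository.

```cpp
// Minimal sketch of a CUTLASS device-level GEMM: D = alpha * A * B + beta * C.
#include <cutlass/gemm/device/gemm.h>

cutlass::Status run_sgemm(int M, int N, int K,
                          float alpha, float const *A, int lda,
                          float const *B, int ldb,
                          float beta, float *C, int ldc) {
  // Column-major single-precision GEMM; CUTLASS supplies default tile shapes.
  using Gemm = cutlass::gemm::device::Gemm<
      float, cutlass::layout::ColumnMajor,   // A
      float, cutlass::layout::ColumnMajor,   // B
      float, cutlass::layout::ColumnMajor>;  // C and D

  Gemm gemm_op;
  Gemm::Arguments args({M, N, K},       // problem size
                       {A, lda},        // tensor A
                       {B, ldb},        // tensor B
                       {C, ldc},        // source C
                       {C, ldc},        // destination D (in-place update)
                       {alpha, beta});  // epilogue scalars

  return gemm_op(args);  // launches the kernel on the default stream
}
```

Repos in this topic usually instantiate many such templates, varying element types (e.g. half precision for tensor cores), layouts, and tile shapes, and dispatch among them at runtime based on the problem size.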
- A PyTorch implementation of block-sparse operations.
  Updated May 13, 2023 - C++
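To make "block sparse" concrete, here is a generic CPU sketch of a matrix-vector product over a block-sparse row (BSR) layout, the same layout family PyTorch's block-sparse tensors use. Only the stored blocks are visited, which is where the savings over dense compute come from. The names `BsrMatrix` and `bsr_matvec` are hypothetical, not the repo's API.

```cpp
// Minimal CPU sketch of a block-sparse row (BSR) matrix-vector multiply.
#include <cstdio>
#include <vector>

struct BsrMatrix {
  int block_size;            // square blocks of block_size x block_size
  int block_rows;            // number of block rows
  std::vector<int> row_ptr;  // size block_rows + 1
  std::vector<int> col_idx;  // block-column index of each stored block
  std::vector<float> values; // dense block data, row-major within a block
};

// y += A * x, touching only the blocks that are actually stored.
void bsr_matvec(const BsrMatrix &A, const std::vector<float> &x,
                std::vector<float> &y) {
  int bs = A.block_size;
  for (int br = 0; br < A.block_rows; ++br) {
    for (int b = A.row_ptr[br]; b < A.row_ptr[br + 1]; ++b) {
      int bc = A.col_idx[b];
      const float *blk = &A.values[b * bs * bs];
      for (int i = 0; i < bs; ++i)
        for (int j = 0; j < bs; ++j)
          y[br * bs + i] += blk[i * bs + j] * x[bc * bs + j];
    }
  }
}

int main() {
  // 4x4 matrix with 2x2 blocks; only blocks (0,0) and (1,1) are stored.
  BsrMatrix A{2, 2, {0, 1, 2}, {0, 1},
              {1, 2, 3, 4,  5, 6, 7, 8}};
  std::vector<float> x{1, 1, 1, 1}, y(4, 0.0f);
  bsr_matvec(A, x, y);
  for (float v : y) std::printf("%g ", v);  // expected: 3 7 11 15
  std::printf("\n");
}
```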