🎯 Focusing
Started a new account only for open-source exploration. This account doesn't represent LF's affiliations with any company or school.
- Menlo Park
Pinned
SageAttention (Public, forked from thu-ml/SageAttention)
Quantized attention achieves speedups of 2-3x over FlashAttention and 3-5x over xformers, without losing end-to-end metrics across language, image, and video models.
CUDA