This repository was archived by the owner on Aug 7, 2024. It is now read-only.
Support for Fused Attention + FP8 #111
Closed
Description
Summary
cuDNN supports fused attention with FP8 inputs.
See: https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#flash-fused-multi-head-att-fp8
There is already an open PR on PyTorch to add support for the cuDNN backend to SDPA: pytorch/pytorch#101916
This seems like a potential path toward attention support for FP8.
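
A minimal sketch of what this could look like from the user side, assuming the cuDNN SDPA backend from pytorch/pytorch#101916 lands and accepts FP8 dtypes (that acceptance is an assumption, not something the PR guarantees yet); the shapes and scaling here are illustrative only:

```python
# Hypothetical usage sketch: FP8 fused attention through SDPA.
# Assumes a future cuDNN SDPA backend that accepts float8 inputs.
import torch
import torch.nn.functional as F

device = "cuda"
batch, heads, seq, head_dim = 2, 8, 128, 64

# Start from FP16 activations, then quantize to FP8 (e4m3) for the fused kernel.
q = torch.randn(batch, heads, seq, head_dim, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

q_fp8 = q.to(torch.float8_e4m3fn)
k_fp8 = k.to(torch.float8_e4m3fn)
v_fp8 = v.to(torch.float8_e4m3fn)

# The call itself would presumably be the same SDPA entry point used today;
# whether FP8 tensors are accepted here depends on the cuDNN backend landing.
out = F.scaled_dot_product_attention(q_fp8, k_fp8, v_fp8)
```

In practice this would also need per-tensor scales for the FP8 quantization (as the cuDNN FP8 attention API expects), so the eventual PyTorch-level interface may differ from the plain SDPA call above.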