Description
from @drisspg
Summary
cuDNN supports fused_attention with FP8 inputs.
See: https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#flash-fused-multi-head-att-fp8
There is already a PR on PyTorch to add support for the cuDNN backend to SDPA: pytorch/pytorch#101916
This seems like a potential path to FP8 attention support, as sketched below.
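
A minimal sketch of what this could look like on the user side, assuming the cuDNN backend from that PR is selectable via `torch.nn.attention.sdpa_kernel` and that it accepts FP8 (e4m3) query/key/value tensors; neither is guaranteed here, and the shapes are illustrative:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

batch, heads, seq_len, head_dim = 2, 8, 1024, 64

# FP8 tensors would typically be produced by scaling and casting from a
# higher-precision source; a plain cast is used here just for illustration.
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda").to(torch.float8_e4m3fn)
k = torch.randn(batch, heads, seq_len, head_dim, device="cuda").to(torch.float8_e4m3fn)
v = torch.randn(batch, heads, seq_len, head_dim, device="cuda").to(torch.float8_e4m3fn)

# Restrict SDPA to the cuDNN fused-attention backend (assumes the backend
# exists and supports FP8 inputs on this hardware).
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```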
copied from pytorch-labs/float8_experimental#111