
Support for Fused Attention + FP8 #560

Open
@vkuzo

Description


from @drisspg

Summary

cuDNN supports fused attention with FP8 inputs.
See: https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#flash-fused-multi-head-att-fp8

There is already a PR on PyTorch to add support for the cuDNN backend to SDPA: pytorch/pytorch#101916

This seems like a potential path to FP8 attention support.

copied from pytorch-labs/float8_experimental#111
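A minimal sketch of what this could look like from the user side, assuming the cuDNN SDPA backend eventually accepts float8 inputs (it does not today; the backend selector and `torch.float8_e4m3fn` dtype exist in recent PyTorch, the FP8 acceptance is the hypothetical part this issue asks for):

```python
# Illustrative sketch only: routes SDPA to the cuDNN fused-attention backend
# and feeds it FP8 (e4m3) tensors. Accepting float8 inputs here is the
# feature being requested, not current behavior.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

batch, heads, seq, head_dim = 2, 8, 1024, 64

def to_fp8(x: torch.Tensor) -> torch.Tensor:
    # Cast bf16 activations down to FP8 e4m3 for the attention inputs.
    return x.to(torch.float8_e4m3fn)

q = to_fp8(torch.randn(batch, heads, seq, head_dim, device="cuda", dtype=torch.bfloat16))
k = to_fp8(torch.randn(batch, heads, seq, head_dim, device="cuda", dtype=torch.bfloat16))
v = to_fp8(torch.randn(batch, heads, seq, head_dim, device="cuda", dtype=torch.bfloat16))

# Restrict SDPA to the cuDNN fused-attention backend; with FP8 support wired
# up, this would dispatch to the cuDNN FP8 flash-attention kernel.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```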
