This repository was archived by the owner on Aug 7, 2024. It is now read-only.
Support for Fused Attention + FP8 #111
Closed
Description
Summary
cuDNN supports fused attention with FP8 inputs.
See: https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#flash-fused-multi-head-att-fp8
There is already an open PR on PyTorch to add support for the cuDNN backend to SDPA: pytorch/pytorch#101916
This seems like a potential path toward attention support for FP8.
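
A minimal sketch of what this could look like from the user side, assuming the cuDNN SDPA backend from pytorch/pytorch#101916 lands and accepts FP8 dtypes (that acceptance is an assumption, not something the PR guarantees yet); the shapes and scaling here are illustrative only:

```python
# Hypothetical usage sketch: FP8 fused attention through SDPA.
# Assumes a future cuDNN SDPA backend that accepts float8 inputs.
import torch
import torch.nn.functional as F

device = "cuda"
batch, heads, seq, head_dim = 2, 8, 128, 64

# Start from FP16 activations, then quantize to FP8 (e4m3) for the fused kernel.
q = torch.randn(batch, heads, seq, head_dim, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

q_fp8 = q.to(torch.float8_e4m3fn)
k_fp8 = k.to(torch.float8_e4m3fn)
v_fp8 = v.to(torch.float8_e4m3fn)

# The call itself would presumably be the same SDPA entry point used today;
# whether FP8 tensors are accepted here depends on the cuDNN backend landing.
out = F.scaled_dot_product_attention(q_fp8, k_fp8, v_fp8)
```

In practice this would also need per-tensor scales for the FP8 quantization (as the cuDNN FP8 attention API expects), so the eventual PyTorch-level interface may differ from the plain SDPA call above.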