Hi,
First of all, great work. I am a big proponent of Flan-T5 and use it in my projects. For multilingual use cases, the mT5 and bigscience/mt0 models provide a good baseline and are truly multilingual. Does Flash Attention work with the mT5 architecture? It seems only T5 is supported at the moment.
https://huggingface.co/bigscience/mt0-large is the model I am looking at; it is based on mT5.
Thanks for the great work!