We need xformers or FlashAttention support for `mps` devices; it could speed up attention-layer inference by 3-5x!
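As a stopgap while waiting for dedicated xformers/FlashAttention kernels, one possible sketch (my assumption, not a confirmed workaround) is to rely on PyTorch's built-in fused `scaled_dot_product_attention`, which dispatches to an efficient kernel where one exists for the current device and falls back gracefully otherwise:

```python
import torch
import torch.nn.functional as F

# Use the Apple-GPU backend when available; fall back to CPU otherwise.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Toy attention inputs: (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 128, 64, device=device)
k = torch.randn(1, 8, 128, 64, device=device)
v = torch.randn(1, 8, 128, 64, device=device)

# PyTorch's fused SDPA picks the best available backend for this device;
# it is not the same as xformers/FlashAttention, but avoids materializing
# the full attention-score matrix on backends that support fused kernels.
out = F.scaled_dot_product_attention(q, k, v)
print(tuple(out.shape))
```

This does not replace true FlashAttention kernels for `mps`, but it can serve as a drop-in attention call until such support lands.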