xformers or flash-attention Support ? #225

Open
Johnson-yue opened this issue Aug 1, 2023 · 2 comments

@Johnson-yue

We need xformers or flash-attention support for the 'mps' device; it can speed up attention-layer inference by 3-5x!
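
For context, here is a rough illustration (my own sketch, not code from this repo) of the kind of gap a fused attention kernel closes: PyTorch's scaled_dot_product_attention versus a naive implementation that materializes the full score matrix, run on the 'mps' device. The shapes are arbitrary examples, and on MPS the fused flash/memory-efficient backends may not be available, so PyTorch can fall back to its math path and the measured gap will vary by version.

```python
import time
import torch
import torch.nn.functional as F

device = "mps" if torch.backends.mps.is_available() else "cpu"
dtype = torch.float16 if device == "mps" else torch.float32
# Arbitrary example shapes: batch 1, 8 heads, 4096 tokens, head dim 64.
q, k, v = (torch.randn(1, 8, 4096, 64, device=device, dtype=dtype) for _ in range(3))

def naive_attention(q, k, v):
    # Materializes the full 4096x4096 attention matrix per head.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def bench(fn, iters=20):
    fn(q, k, v)  # warm-up
    if device == "mps":
        torch.mps.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(q, k, v)
    if device == "mps":
        torch.mps.synchronize()
    return (time.perf_counter() - t0) / iters

print("naive attention:", bench(naive_attention))
print("fused SDPA     :", bench(F.scaled_dot_product_attention))
```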

@philipturner

He's talking about metal-flash-attention, which surpassed Apple ML Stable Diffusion in terms of performance.

@BuildBackBuehler commented Dec 8, 2023

> He's talking about metal-flash-attention, which surpassed Apple ML Stable Diffusion in terms of performance.

Thank you for your work on that =) It looks like something I'd like to incorporate wherever I can. Is there any way I can patch in flash-attention? I was figuring at least with attention.metal(?)

I'm also curious how well, if at all, it could be integrated into MLX, in case you haven't seen/heard about it yet:

https://github.com/ml-explore/mlx/issues

I planned on integrating it into my ComfyUI Stable Diffusion instance in lieu of PyTorch. I don't know if I'm just too much of a novice to know any shortcuts, but it seems like there's no way around manually parsing/editing the relevant files one definition at a time.

But I'm figuring that, unlike MLX vs. PyTorch, MFA probably uses the exact same vars/defs as MPS, and thus it could be implemented either atop MPS or in lieu of it without the headache.
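
On the "patching without editing every file" point, one route (just a sketch of mine, not MFA's or ComfyUI's actual integration path) is to monkey-patch the attention entry point that codebases calling torch.nn.functional.scaled_dot_product_attention funnel through, before the rest of the app runs:

```python
# Sketch only: a runtime monkey-patch of PyTorch's fused attention entry point.
# In a real MFA integration, patched_sdpa would dispatch MPS tensors to a Metal
# flash-attention kernel (e.g. via a compiled extension); here it just delegates
# to the original op so the example stays runnable.
import torch
import torch.nn.functional as F

_original_sdpa = F.scaled_dot_product_attention

def patched_sdpa(query, key, value, *args, **kwargs):
    # Placeholder for an MFA-backed kernel call when query.device.type == "mps".
    return _original_sdpa(query, key, value, *args, **kwargs)

F.scaled_dot_product_attention = patched_sdpa

# Any later call that looks the function up through the module picks up the patch:
q = k = v = torch.randn(1, 4, 16, 8)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 4, 16, 8])
```

The caveat is that code which bound the function directly (`from torch.nn.functional import scaled_dot_product_attention`) before the patch won't see it, so the patch has to run early; that aside, it avoids editing callers one definition at a time.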

Edit: Also, FYI, per the MLX devs (link below), they're very encouraging of any contributions that speed things up, so I figure MFA has a much better chance of being adopted there.

ml-explore/mlx#40
