WIP: Precompile metal kernels into .metallib files #2335

Open
wants to merge 2 commits into
base: main

zackangelo
Contributor

I'm running into an issue where the first call to apply_repeat_penalty takes a very long time (in excess of 6 seconds). The time seems to be spent in the Tensor::to_vec1 call that moves the logits into a Vec<f32>. I would expect a simple copy like this to be very fast.

It was suggested on Discord that this slowness might be caused by other asynchronous work happening in Metal, possibly the compilation of the kernels on first load.
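One way to check whether the cost is a one-time startup effect is to time the same call several times and compare the first measurement with the later ones. The following is a minimal, std-only sketch; `time_calls` is a hypothetical helper name, not part of candle.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not a candle API): runs `f` `n` times and
/// records how long each individual call takes.
fn time_calls<T>(n: usize, mut f: impl FnMut() -> T) -> Vec<Duration> {
    (0..n)
        .map(|_| {
            let start = Instant::now();
            // The result is discarded; only the elapsed time matters here.
            let _ = f();
            start.elapsed()
        })
        .collect()
}
```

In this setting the closure would wrap the slow call, for example the copy of the logits into a Vec<f32>. If only the first duration is large, the delay is more likely lazy initialization or queued work than the copy itself.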

This PR precompiles the kernels at build time instead of compiling them on every run. Unfortunately, it doesn't seem to solve my problem, but it might be useful for other reasons.
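For readers unfamiliar with build-time Metal compilation, here is a rough sketch of what a build script might do. It is not the code in this PR. It assumes Apple's command-line tools are available through xcrun, and the helper names (`metallib_commands`, `run_xcrun`) and the SDK parameter are hypothetical.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

fn path_arg(p: &Path) -> String {
    p.to_string_lossy().into_owned()
}

/// Hypothetical helper: builds the two `xcrun` argument lists that turn one
/// `.metal` source into a `.metallib`, and returns the output path.
/// `sdk` is an assumption, e.g. "macosx" or "iphoneos".
fn metallib_commands(sdk: &str, src: &Path, out_dir: &Path) -> (Vec<String>, Vec<String>, PathBuf) {
    let stem = src.file_stem().and_then(|s| s.to_str()).unwrap_or("kernel");
    let air = out_dir.join(format!("{}.air", stem));
    let lib = out_dir.join(format!("{}.metallib", stem));
    // Step 1: compile the source to an intermediate .air file.
    let compile: Vec<String> = vec![
        "-sdk".into(), sdk.into(), "metal".into(), "-c".into(),
        path_arg(src), "-o".into(), path_arg(&air),
    ];
    // Step 2: package the .air file into a .metallib.
    let link: Vec<String> = vec![
        "-sdk".into(), sdk.into(), "metallib".into(),
        path_arg(&air), "-o".into(), path_arg(&lib),
    ];
    (compile, link, lib)
}

/// Runs `xcrun` with the given arguments and fails on a non-zero exit status.
fn run_xcrun(args: &[String]) -> io::Result<()> {
    let status = Command::new("xcrun").args(args).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, format!("xcrun failed: {}", status)))
    }
}
```

A build.rs main function would call `metallib_commands` for each kernel source, pass both argument lists to `run_xcrun` in order, and then load the resulting file at runtime instead of compiling the source text. Keeping the SDK as a parameter leaves room for targets other than macOS.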

@zackangelo
Contributor Author

zackangelo commented Jul 15, 2024

Looking at #2322, I will likely need to reconfigure the build script to optionally produce iOS .metallib files.

@zackangelo
Contributor Author

@LaurentMazare is this something you think you would want to merge? If so, I can clean it up. Otherwise, we can close it.
