WIP: Precompile metal kernels into `.metallib` files #2335

zackangelo · 2024-07-15T18:38:35Z

I'm running into an issue where the first call to `apply_repeat_penalty` takes a very long time (in excess of 6 seconds). The time seems to be spent in the `Tensor::to_vec1` call that moves the logits into a `Vec<f32>`. A simple copy like this should be very fast.

It was suggested on Discord that this slowness might be due to some other async stuff happening in Metal, maybe the compilation of the kernels on first load.

This PR precompiles the kernels at build time instead of compiling them on every run. Unfortunately, it doesn't seem to solve my problem, but it might be useful for other reasons.
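As a rough illustration of what build-time compilation can look like, the sketch below builds the argument lists for the two `xcrun` steps (`metal -c` to produce an `.air` file, then `metallib` to link it) and the `cargo:rerun-if-changed` line for each kernel source. It is not the code in this PR: the helper names `air_args`, `metallib_args`, and `rerun_line` and the file paths are hypothetical, and the SDK name is passed in as an assumption.

```rust
use std::path::{Path, PathBuf};

/// Arguments for `xcrun` that compile one `.metal` source into an `.air` file.
/// (Hypothetical helper, not taken from the PR.)
fn air_args(sdk: &str, src: &Path, air: &Path) -> Vec<String> {
    vec![
        "-sdk".into(),
        sdk.into(),
        "metal".into(),
        "-c".into(),
        src.display().to_string(),
        "-o".into(),
        air.display().to_string(),
    ]
}

/// Arguments for `xcrun` that link one or more `.air` files into a `.metallib`.
/// (Hypothetical helper, not taken from the PR.)
fn metallib_args(sdk: &str, airs: &[PathBuf], out: &Path) -> Vec<String> {
    let mut args: Vec<String> = vec!["-sdk".into(), sdk.into(), "metallib".into()];
    args.extend(airs.iter().map(|p| p.display().to_string()));
    args.push("-o".into());
    args.push(out.display().to_string());
    args
}

/// Line a build script prints so Cargo re-runs it when a kernel source changes.
fn rerun_line(src: &Path) -> String {
    format!("cargo:rerun-if-changed={}", src.display())
}
```

In a `build.rs`, each argument list would be passed to `std::process::Command::new("xcrun")`, with the output paths placed under the directory Cargo provides in `OUT_DIR`, and each `rerun_line` result printed with `println!`.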

zackangelo · 2024-07-15T19:08:42Z

Looking at #2322, I will likely need to reconfigure the build script to optionally produce iOS `.metallib` files.
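One way a build script could choose between macOS and iOS output is to map the target OS to an Xcode SDK name and pass it to `xcrun -sdk`. The sketch below assumes the build script reads the target OS from the `CARGO_CFG_TARGET_OS` environment variable that Cargo sets for build scripts. The function name `sdk_for_target_os` is hypothetical, and the iOS simulator SDK (`iphonesimulator`) is deliberately left out because it cannot be distinguished by target OS alone.

```rust
/// Map a Cargo target OS string to the Xcode SDK name used with `xcrun -sdk`.
/// Returns `None` for targets where no Metal library should be built.
/// (Hypothetical helper, not taken from the PR.)
fn sdk_for_target_os(target_os: &str) -> Option<&'static str> {
    match target_os {
        "macos" => Some("macosx"),
        "ios" => Some("iphoneos"),
        _ => None,
    }
}
```

A build script would call this with the value of `std::env::var("CARGO_CFG_TARGET_OS")` and skip Metal compilation when it returns `None`.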

zackangelo · 2024-07-20T15:34:48Z

@LaurentMazare is this something you would want to merge? If so, I can clean it up. Otherwise, we can close it.

zackangelo added 2 commits July 15, 2024 13:35

metal: precompile kernels

4878b86

re-run build script if metal kernel changes

ac1cb58
