GPU inference on Apple Silicon via a Metal backend was recently added to llama.cpp: ggml-org/llama.cpp#1642

We should port these changes to whisper.cpp and allow the Decoder to run on the GPU in a similar way.
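For reference, the llama.cpp change linked above is enabled at build time via a Makefile flag and at run time via a command-line option. A similar flow for whisper.cpp might look like the following sketch (the `WHISPER_METAL` flag and any whisper.cpp-side options are hypothetical here; only the llama.cpp `LLAMA_METAL=1` build flag comes from the referenced PR):

```shell
# llama.cpp (from ggml-org/llama.cpp#1642): build with Metal support enabled
LLAMA_METAL=1 make

# Hypothetical whisper.cpp equivalent after porting the Metal backend:
# build with a corresponding flag so the Decoder graph can be offloaded to the GPU
WHISPER_METAL=1 make
```

The porting work would largely consist of bringing over the `ggml-metal` sources and wiring the Decoder's compute graph through the Metal backend, mirroring how llama.cpp dispatches its graph when Metal is enabled.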