With the recent support for running convolutions on the GPU (#4060), we should be able to offload CLIP to run fully on the GPU.
- Implement `ggml_acc` CUDA / Metal kernels
- Avoid `ggml_repeat` where possible using broadcast
- Should use the new `ggml-backend` API (see https://github.com/ggerganov/ggml/blob/master/examples/gpt-2/main-backend.cpp)
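
To illustrate the `ggml_repeat` item, here is a minimal sketch in plain C of why broadcasting is preferable to repeating. It does not use the ggml API: `add_with_repeat` and `add_with_broadcast` are hypothetical helpers written only for this comparison, and the row-major layout is an assumption. Both add a per-column bias to an `n_rows x n_cols` matrix, but only the first allocates a full-size temporary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of ggml): copy `bias` into a full
 * n_rows x n_cols temporary, then add element-wise. This mirrors the
 * repeat-then-add pattern and costs an extra buffer the size of `x`. */
static void add_with_repeat(float *dst, const float *x, const float *bias,
                            int n_rows, int n_cols) {
    float *rep = malloc(sizeof(float) * (size_t)n_rows * (size_t)n_cols);
    assert(rep != NULL);
    for (int r = 0; r < n_rows; r++) {
        memcpy(rep + (size_t)r * n_cols, bias, sizeof(float) * (size_t)n_cols);
    }
    for (int i = 0; i < n_rows * n_cols; i++) {
        dst[i] = x[i] + rep[i];
    }
    free(rep);
}

/* Hypothetical helper (not part of ggml): index `bias` by column only,
 * so the same n_cols values are reused for every row and no temporary
 * buffer is needed. */
static void add_with_broadcast(float *dst, const float *x, const float *bias,
                               int n_rows, int n_cols) {
    for (int r = 0; r < n_rows; r++) {
        for (int c = 0; c < n_cols; c++) {
            dst[r * n_cols + c] = x[r * n_cols + c] + bias[c];
        }
    }
}
```

Both helpers produce the same result. On a GPU backend, the broadcast form would also avoid launching a separate kernel and writing the repeated tensor to device memory, which is presumably the motivation for this item.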