Add OneDNN or DirectML support #2303

Currently, the best performance we can get with whisper.cpp is with CUDA (NVIDIA) or Core ML (macOS).

On Windows, the only option is OpenBLAS, and it is slow: transcription takes roughly twice the duration of the audio (AMD Ryzen 5 4500U, medium model).
On the same machine, ctranslate2 running on the CPU alone is 2-3 times faster than real time!
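To make that gap concrete, both measurements can be expressed as a real-time factor (RTF = processing time / audio duration; below 1.0 means faster than real time). The sketch below assumes a hypothetical 60-second clip and takes the midpoint (2.5x) of the reported 2-3x range for ctranslate2; `real_time_factor` is an illustrative helper, not part of either project.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; < 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Hypothetical 60 s clip, using the ratios reported in this issue.
audio = 60.0
openblas = real_time_factor(2.0 * audio, audio)  # takes ~2x the audio duration
ct2 = real_time_factor(audio / 2.5, audio)       # 2-3x faster than real time (midpoint 2.5x assumed)

print(f"OpenBLAS RTF: {openblas:.2f}")
print(f"ctranslate2 RTF: {ct2:.2f}")
print(f"speedup: {openblas / ct2:.1f}x")
```

Using the endpoints of the 2-3x range instead of the midpoint, the difference works out to roughly 4x to 6x on the same hardware.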

Since whisper.cpp recently removed OpenCL support, I think it's important to offer a good alternative for Windows users with Intel / AMD CPUs / TPUs.

There are a few options that could be added:
- [oneDNN Execution Provider](https://onnxruntime.ai/docs/execution-providers/oneDNN-ExecutionProvider.html)
- [DirectML Execution Provider](https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html)
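For context, ONNX Runtime picks these backends by execution-provider name (`DmlExecutionProvider` for DirectML, `DnnlExecutionProvider` for oneDNN) from a priority-ordered list passed when a session is created, with `CPUExecutionProvider` as the usual fallback. A minimal sketch of that selection logic, assuming a preference of DirectML, then oneDNN, then plain CPU (the ordering is an assumption, and `pick_providers` and `model.onnx` are hypothetical); this is only an illustration of the ONNX Runtime side, not something whisper.cpp does today.

```python
# Execution provider names as registered by ONNX Runtime, in our assumed priority order.
PREFERRED = ["DmlExecutionProvider", "DnnlExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available):
    """Keep the preferred providers that this build reports, in priority order,
    and always end with the CPU provider as a fallback."""
    chosen = [p for p in PREFERRED if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# Typical use with the onnxruntime package installed (hypothetical model path):
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

Keeping the CPU provider at the end means a build without DirectML or oneDNN still runs, just without acceleration.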

In addition, ctranslate2 uses [ruy](https://github.com/google/ruy) for matrix multiplication.

Related: https://github.com/ggerganov/ggml/issues/406#issuecomment-2241707874
