Still low performance with Core ML model (using ANE) compared to Vosk (Kaldi) on CPU

Hi,

Using the large Core ML encoder provided by Hugging Face, I still get very low performance compared to Vosk with Kaldi, and I don't understand why.

When I run it, I get:

whisper_init_state: Core ML model loaded
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 1 | OPENVINO = 0 | 

So everything appears to be set up correctly, and in theory I should be using the ANE, which should be very fast.

Transcribing 3 h of audio (16 kHz, 1 channel) took 1 h, while Vosk with Kaldi (CPU only) produced the same quality in 11 min. How is that possible? Am I missing something?
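
To put numbers on the gap, here is a small Python sketch that turns the timings above into real-time factors (processing time divided by audio duration). The function and variable names are only illustrative, and the inputs are the rough wall-clock figures quoted above, not a careful benchmark.

```python
def real_time_factor(processing_min: float, audio_min: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return processing_min / audio_min


AUDIO_MIN = 3 * 60  # 3 h of 16 kHz mono audio

# Rough wall-clock timings from the runs described above.
whisper_rtf = real_time_factor(60, AUDIO_MIN)  # Core ML encoder run: 1 h
vosk_rtf = real_time_factor(11, AUDIO_MIN)     # Vosk (Kaldi), CPU only: 11 min

# How many times longer the Core ML run took than the Vosk run.
slowdown = whisper_rtf / vosk_rtf

print(f"Core ML run RTF: {whisper_rtf:.2f}")
print(f"Vosk RTF:        {vosk_rtf:.2f}")
print(f"Slowdown:        {slowdown:.2f}x")
```

So the Core ML run took about 5.5 times longer than Vosk on the same file.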

Thank you
Luca
