Still low performance with Core ML model (using ANE) compared to Vosk (Kaldi) using CPU #1301

Closed
@xcottos

Hi,

Using the large Core ML encoder provided on Hugging Face, I still get very low performance compared to Vosk with Kaldi, and I don't understand why.

When I run it, the log shows:

whisper_init_state: Core ML model loaded
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 1 | OPENVINO = 0 |

So everything appears to be set up correctly, and in theory I should be using the ANE, which should be very fast.
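
For anyone who wants to check these flags in a script rather than by eye, here is a minimal sketch. It assumes the `NAME = value |` layout of the `system_info` line above; `parse_system_info` is a hypothetical helper, not part of whisper.cpp. Note that `COREML = 1` only reports that Core ML support was compiled in; it does not by itself show which compute unit (ANE, GPU, or CPU) actually runs the encoder.

```python
def parse_system_info(line: str) -> dict[str, str]:
    """Split a 'system_info: A = 1 | B = 0 |' style line into a dict of strings."""
    # Drop the 'system_info' prefix; keep everything after the first colon.
    _, _, rest = line.partition(":")
    flags: dict[str, str] = {}
    for field in rest.split("|"):
        if "=" not in field:
            continue  # skip the empty tail after the final '|'
        name, _, value = field.partition("=")
        flags[name.strip()] = value.strip()
    return flags


# The line copied from the log above.
log_line = (
    "system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | "
    "FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | "
    "WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "
    "COREML = 1 | OPENVINO = 0 |"
)

flags = parse_system_info(log_line)
coreml_compiled_in = flags.get("COREML") == "1"
```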

Transcribing 3 h of audio (16 kHz, 1 channel) took 1 h, while Vosk with Kaldi (CPU only) gave the same quality in 11 min. How is that possible? Am I missing something?
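
To put the two timings on the same scale, here is a small sketch that converts them into a speed factor (minutes of audio processed per minute of wall-clock time). The function name `speed_factor` is just an illustrative helper; the only inputs are the figures quoted above.

```python
def speed_factor(audio_minutes: float, processing_minutes: float) -> float:
    """Minutes of audio transcribed per minute of wall-clock time."""
    if processing_minutes <= 0:
        raise ValueError("processing_minutes must be positive")
    return audio_minutes / processing_minutes


audio_minutes = 3 * 60  # 3 h of audio

# Figures reported in this issue.
whisper_coreml = speed_factor(audio_minutes, 60)  # 3x faster than real time
vosk_cpu = speed_factor(audio_minutes, 11)        # roughly 16x faster than real time

# How much faster the Vosk run was than the Core ML run.
gap = vosk_cpu / whisper_coreml                   # 60 / 11, roughly 5.5x

print(f"whisper.cpp + Core ML: {whisper_coreml:.1f}x real time")
print(f"Vosk (CPU):            {vosk_cpu:.1f}x real time")
print(f"Vosk was about {gap:.1f}x faster")
```

So the gap being asked about is roughly a factor of 5.5 in throughput on the same audio.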

Thank you
Luca
