
Performance e-core bug(?) - only 50% CPU utilization when using all threads - (Win11, Intel 13900K) #842

Closed
@cmp-nct

Description


I haven't dug deep into this yet, but my overall CPU utilization is only at 50%.
I compiled it with the current VS build tools, all default settings, in Release mode of course.

It might be related to the modern e-cores in Intel CPUs: they pack quite a punch but are weaker than the performance cores.
In the graph, it looks like 16 cores (the number of e-cores) are much more heavily utilized, while 8 cores (the number of performance cores) are mostly idle, despite running 24 threads. Increasing the thread count worsens performance, and decreasing it lowers token output.
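One possible reading of the graph, assuming each compute step divides its rows evenly among the threads and every thread waits at a barrier before the next step starts (an assumption about the scheduler, not something verified here), is that the performance cores finish their share early and then sit idle while the e-cores catch up. The toy model below illustrates that idea; the function names, the 2:1 P-core to e-core speed ratio, and the row count are all hypothetical.

```python
# Toy model (hypothetical): one compute step whose rows are divided among
# threads, followed by a barrier that every thread must reach before the next
# step starts. Core speeds are assumed relative numbers, not 13900K measurements.


def even_split_times(n_rows, speeds):
    """Per-thread compute time when every thread gets the same number of rows.

    speeds[i] is the assumed throughput (rows per time unit) of the core
    running thread i.
    """
    share = n_rows / len(speeds)
    return [share / s for s in speeds]


def barrier_step_time(per_thread):
    """A barrier-synchronized step lasts as long as its slowest thread."""
    return max(per_thread)


def idle_fraction(per_thread):
    """Per-thread fraction of the step spent waiting at the barrier."""
    step = barrier_step_time(per_thread)
    return [1.0 - t / step for t in per_thread]


def proportional_step_time(n_rows, speeds):
    """Step time if rows were assigned in proportion to each core's speed."""
    return n_rows / sum(speeds)


if __name__ == "__main__":
    # Assumed topology: 8 P-cores, each twice as fast as one of 16 e-cores.
    speeds = [2.0] * 8 + [1.0] * 16
    n_rows = 4096  # hypothetical number of rows per step

    times = even_split_times(n_rows, speeds)
    print("even split, 24 threads:", barrier_step_time(times))
    print("P-core idle fraction:", idle_fraction(times)[0])
    print("e-core idle fraction:", idle_fraction(times)[-1])
    print("even split, 8 P-cores only:",
          barrier_step_time(even_split_times(n_rows, [2.0] * 8)))
    print("proportional split, 24 threads:",
          proportional_step_time(n_rows, speeds))
```

With these made-up numbers, the P-cores spend half of every step waiting, 8 P-core threads alone are slower than 24 threads (256 vs. about 171 time units), and a speed-proportional split would cut the step to 128. The model ignores hyperthreading, memory bandwidth, and spin-waiting, so it says nothing about the slowdown seen with 32 threads.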

I tested the small 7B model in 4-bit and 16-bit.
The only way to get CPU utilization above 50% is to use more threads than there are physical cores (e.g. 32 threads).
In that case I see up to 99% CPU utilization, but token throughput drops below what 2 cores achieve; some hyperthreading issue, I suppose.
I tried various settings (small/large batch size, context size); none of them influence it much.

The CPU was otherwise idle (as seen in the screenshot).
Memory is also neither full nor swapping.

Here is the command line:
.\Release\main.exe -m .\models\7B\ggml-model-f16.bin -p "Below I count from 1 to 100000000: 1 2 3 4 5 6" -c 1024 -t 24 -n 1024 -b 64

system_info: n_threads = 24 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 1024, n_batch = 64, n_predict = 1024, n_keep = 0

[Screenshot: CPU utilization graph]
