Description
I haven't dug deep into this yet, but my overall CPU utilization is only about 50%.
I compiled it with the current Visual Studio build tools, all default settings, in Release mode of course.
It might be related to the efficiency cores (E-cores) in modern Intel CPUs: they pack quite a punch but are weaker than the performance cores (P-cores).
In the CPU graph, 16 cores (the number of E-cores) appear much more heavily utilized, while 8 cores (the number of P-cores) are mostly idle, despite running with 24 threads. Increasing the thread count makes performance worse, and decreasing it reduces token output.
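A toy model of the E-core hypothesis, not llama.cpp's or ggml's actual scheduler: it assumes each compute step is split evenly across threads and that every thread waits at a barrier for the slowest one. The relative core speeds (P-core 1.0, E-core 0.5) and the function names `equal_split` and `proportional_split` are hypothetical, chosen only for illustration.

```python
# Toy model of the E-core hypothesis. ASSUMPTIONS: work is split evenly per step,
# threads sync at a barrier, and the relative core speeds below are made up.

def equal_split(speeds, total_work):
    """Same share per thread; the step ends when the slowest thread finishes."""
    share = total_work / len(speeds)
    times = [share / s for s in speeds]
    step_time = max(times)  # barrier: every thread waits for the slowest one
    busy = [t / step_time for t in times]  # fraction of the step spent working
    return step_time, busy


def proportional_split(speeds, total_work):
    """Work proportional to each thread's speed, so all threads finish together."""
    return total_work / sum(speeds)


# Hypothetical hybrid CPU: 8 P-cores at relative speed 1.0, 16 E-cores at 0.5.
speeds = [1.0] * 8 + [0.5] * 16
step_time, busy = equal_split(speeds, total_work=24.0)
print(f"equal split:        step = {step_time:.2f}, "
      f"P-core busy = {busy[0]:.2f}, E-core busy = {busy[-1]:.2f}")
print(f"proportional split: step = {proportional_split(speeds, 24.0):.2f}")
```

With these made-up numbers, the E-core threads work for the whole step while the P-core threads finish their share in half the time and then wait, which is consistent with the pattern described above; a speed-proportional split would shorten the step from 2.0 to 1.5 time units. Whether the waiting shows up as idle time in Task Manager depends on how the threads wait.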
I tested the small 7B model in both 4-bit and 16-bit precision.
The only way to get CPU utilization above 50% is to use more threads than there are physical cores (e.g. 32 threads).
In that case I see up to 99% CPU utilization, but token throughput drops below what running on just 2 cores achieves; I suppose it is some hyperthreading issue.
I tried various settings (small and large batch sizes, different context sizes); none of them made much difference.
The CPU was otherwise idle (as seen in the screenshot).
Memory is not full, and the system is not swapping either.
Here is the command line:
.\Release\main.exe -m .\models\7B\ggml-model-f16.bin -p "Below I count from 1 to 100000000: 1 2 3 4 5 6" -c 1024 -t 24 -n 1024 -b 64
system_info: n_threads = 24 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 1024, n_batch = 64, n_predict = 1024, n_keep = 0