
Conversation

@kimminsu38oo
Contributor

This PR adds operator-level profiling to the ggml-cpu backend.

Key Changes

  • Compile Option: Added GGML_CPU_OP_PROFILING to enable this feature.

  • Output: Saves operator execution times (in ms) to op_profiling.csv.

  • Thread Safety: Implemented synchronization barriers to ensure accurate timing in multi-threaded environments.

Performance

  • Negligible runtime overhead.

Example Output
[screenshot: example op_profiling.csv contents]
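Since the diff isn't shown in this thread, here is a minimal, self-contained sketch of the barrier-based timing idea, using plain pthreads rather than ggml's internal threadpool; all names are illustrative and this is not the actual patch:

```c
/*
 * Minimal, self-contained illustration of barrier-based per-op timing
 * (plain pthreads; ggml internals omitted -- this is NOT the actual patch).
 * Build: cc -O2 -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define N_THREADS 4

static pthread_barrier_t barrier;
static double t_start_ms;   // written by thread 0 only, between barriers

static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

static void fake_op(int ith) {
    (void) ith;             // each thread would compute its slice here
}

static void * worker(void * arg) {
    const int ith = (int) (long) arg;

    pthread_barrier_wait(&barrier);   // align all threads before timing
    if (ith == 0) {
        t_start_ms = now_ms();
    }

    fake_op(ith);                     // the operator being profiled

    pthread_barrier_wait(&barrier);   // wait for the slowest thread
    if (ith == 0) {
        // in the real backend this row would go to op_profiling.csv
        printf("FAKE_OP,%.3f\n", now_ms() - t_start_ms);
    }
    return NULL;
}

int main(void) {
    pthread_t th[N_THREADS];
    pthread_barrier_init(&barrier, NULL, N_THREADS);
    for (long i = 0; i < N_THREADS; i++) {
        pthread_create(&th[i], NULL, worker, (void *) i);
    }
    for (int i = 0; i < N_THREADS; i++) {
        pthread_join(th[i], NULL);
    }
    pthread_barrier_destroy(&barrier);
    return 0;
}
```

The second barrier is the important design choice: the recorded duration reflects the slowest thread, which is what actually gates graph execution.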

@am17an
Collaborator

am17an commented Dec 1, 2025

Have you tested the effect on overall runtime? Hard to believe that writing to a file and flushing after every op completion has "negligible run-time overhead".

@kimminsu38oo
Contributor Author

kimminsu38oo commented Dec 1, 2025

@am17an

Thanks for the feedback. I just ran a benchmark to verify the impact.

Test Env: Intel Xeon E-2388G (8 threads), Prefill/Decode 256 tokens.

Model: LLaMA3.2-3B_Q4_0
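For reference, a run along these lines should reproduce this setup (model filename is illustrative, and the profiling build assumes the new compile option is enabled at configure time):

```sh
# hypothetical invocation matching the setup above
./llama-bench -m llama-3.2-3b-q4_0.gguf -p 256 -n 256 -t 8
```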

Results

  • Original: [benchmark screenshot]
  • Profiling on: [benchmark screenshot]

I observed a latency increase of about 78 ms for prefill and 742 ms for decode.

You were right that the overhead isn't negligible. That said, relative to the total end-to-end runtime, it may still be acceptable.

@kimminsu38oo
Contributor Author

kimminsu38oo commented Dec 1, 2025

@am17an
I also tested on a mobile device.

Test Env: Galaxy S24 Ultra (Snapdragon 8 Gen 3, 6 threads), Prefill/Decode 256 tokens.

Model: LLaMA3.2-3B_Q4_0

Results

  • Original: [benchmark screenshot]
  • Profiling on: [benchmark screenshot]

Counterintuitively, the profiling overhead was smaller on the mobile device, even with its more constrained memory bandwidth.

I observed a difference of 61ms for prefill and 217ms for decode.

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Dec 1, 2025
@am17an
Copy link
Collaborator

am17an commented Dec 2, 2025

You can see per-function times far better using a proper profiler (like Intel VTune or AMD uProf; for GPUs there is Nsight). Adding an ad-hoc CSV file does not make sense; we already have test-backend-ops, which measures the performance of individual operations in a much more statistically sound way. As such, this change does not make sense.
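For reference, test-backend-ops can already benchmark a single operator on a chosen backend, e.g.:

```sh
# perf mode, filtered to one op on the CPU backend
./test-backend-ops perf -o MUL_MAT -b CPU
```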

@kimminsu38oo
Contributor Author

kimminsu38oo commented Dec 2, 2025

@am17an

Thanks for the feedback, and thank you for taking an interest in this.

As you mentioned, profiling on desktop is possible with tools like VTune.
However, when I initially wrote this code, I wanted an operator-level breakdown on mobile, and profiling in that environment turned out to be quite tricky (possibly due to my own inexperience: Android Profiler didn't work for me, and other profilers required a rooted device).

Also, ggml-opencl provides operator-level profiling, and this motivated me to write the corresponding code as well.
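For context, that OpenCL profiling is built on standard cl_event timestamps; the general pattern (generic names, not the exact ggml-opencl code) looks like this:

```c
#include <CL/cl.h>

// Times one kernel launch via OpenCL event profiling. The queue must be
// created with CL_QUEUE_PROFILING_ENABLE; names here are generic.
static double run_and_time_ms(cl_command_queue queue, cl_kernel kernel,
                              size_t global_size) {
    cl_event evt;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                           0, NULL, &evt);
    clWaitForEvents(1, &evt);

    cl_ulong t0 = 0, t1 = 0;   // device timestamps, in nanoseconds
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START, sizeof(t0), &t0, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,   sizeof(t1), &t1, NULL);
    clReleaseEvent(evt);
    return (t1 - t0) / 1e6;
}
```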

But upon reflection, I realize that such an implementation could clutter the codebase of llama.cpp, which supports multiple backends (such as CUDA and Intel CPU). Thank you for your feedback.
