Question about the design philosophy: Manual Graph Construction vs. DL Compilers #17674

kimminsu38oo · 2025-12-02T03:18:59Z

Hi, I’m relatively new to this project, so please forgive me if I’m asking something obvious.

From what I understand, llama.cpp works by manually constructing the computation graph (e.g., via build_graph in ggml) and invoking operators one by one. This seems different from the approach used by deep learning compilers such as TVM or XLA, which automate graph optimization and operator fusion.
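To make my mental model concrete, here is a toy sketch in plain Python of what I mean by "manual" graph construction: the model author writes the graph by hand, and an executor then runs one operator at a time. Every name here (Node, build_graph_toy, compute) is hypothetical and made up for illustration. None of it is the ggml or llama.cpp API, and the real implementation may differ a lot from this.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One operator in a toy computation graph (hypothetical, not a ggml type)."""
    name: str
    fn: Callable[..., List[float]]
    inputs: List["Node"] = field(default_factory=list)


def constant(name, values):
    # Leaf node that simply returns its stored values.
    return Node(name, lambda: list(values))


def add(a, b):
    return Node("add", lambda x, y: [p + q for p, q in zip(x, y)], [a, b])


def mul(a, b):
    return Node("mul", lambda x, y: [p * q for p, q in zip(x, y)], [a, b])


def build_graph_toy():
    # Hand-written "architecture": out = (x + b) * w
    x = constant("x", [1.0, 2.0, 3.0])
    b = constant("b", [0.5, 0.5, 0.5])
    w = constant("w", [2.0, 2.0, 2.0])
    return mul(add(x, b), w)


def topo_order(root):
    # Depth-first post-order, so every input comes before its consumer.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            visit(parent)
        order.append(node)

    visit(root)
    return order


def compute(root):
    # Run the operators one by one, with no fusion or rewriting in between.
    results, executed = {}, []
    for node in topo_order(root):
        args = [results[id(parent)] for parent in node.inputs]
        results[id(node)] = node.fn(*args)
        executed.append(node.name)
    return results[id(root)], executed


out, trace = compute(build_graph_toy())
print(out, trace)
```

In this toy, supporting a new architecture would mean writing another function like build_graph_toy by hand, which is the situation I am asking about.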

If my understanding is correct, I have a few questions regarding the limitations of this "manual" approach:

1. New Architectures: Does this mean developers have to manually write the graph-building code for every newly supported model architecture?
2. Optimization: Without automatic operator fusion, isn't there a performance overhead? For backends like ggml-opencl, are kernels simply executed sequentially, without being fused?
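The overhead I have in mind in the second question can be sketched like this. It is a plain-Python illustration under my own assumptions, with hypothetical function names, and it says nothing about what ggml-opencl or any other backend actually does. The unfused version runs two separate passes and materializes an intermediate buffer, while the fused version does the same arithmetic in a single pass.

```python
def scale_shift_unfused(x, w, b):
    # "Kernel" 1: elementwise multiply, writes an intermediate buffer.
    tmp = [xi * wi for xi, wi in zip(x, w)]
    # "Kernel" 2: elementwise add, reads the intermediate buffer back.
    out = [ti + bi for ti, bi in zip(tmp, b)]
    launches = 2
    return out, launches


def scale_shift_fused(x, w, b):
    # Single "kernel": multiply and add in one pass, no intermediate buffer.
    out = [xi * wi + bi for xi, wi, bi in zip(x, w, b)]
    launches = 1
    return out, launches


x = [1.0, 2.0, 3.0]
w = [2.0, 2.0, 2.0]
b = [0.5, 0.5, 0.5]

unfused_out, unfused_launches = scale_shift_unfused(x, w, b)
fused_out, fused_launches = scale_shift_fused(x, w, b)
print(unfused_out == fused_out, unfused_launches, fused_launches)
```

Both versions produce the same result. The difference is only in the number of passes and the extra buffer, which is the kind of cost I would expect a fusing compiler to remove automatically.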

I’m curious why llama.cpp chose this manual implementation strategy over a compiler-based approach. Are there any plans to introduce compiler-like optimizations in the future?

Thanks!

Replies: 0 comments
