@loci-dev

Mirrored from ggml-org/llama.cpp#17136

Supporting new models requires implementing several features:

  • Lazy grammar for tool calling (tool_choice = auto)
  • Full grammar for forced tool calls and response_format (reasoning models)
  • Parallel tool calls support
  • Parsing of reasoning and tool call outputs

For reasoning models, the grammar must also cover the reasoning section; otherwise, output quality degrades significantly.

The real challenge is that each model uses a different output format:

  • Harmony response output (gpt-oss)
  • XML with typed parameters (Qwen3-Coder, MiniMax M2)
    • These models expect string arguments as raw content rather than JSON, which requires type awareness at parse time.
  • Pseudo-function calls (LFM2), e.g. [get_weather(location="..."), ...]
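
To illustrate why the XML-style formats need type awareness, here is a minimal sketch of turning a raw parameter value into a JSON fragment based on its declared schema type. The helper names (json_quote, arg_to_json) are hypothetical and not part of this PR; the sketch only assumes that the declared type is available as a string such as "string" or "integer".

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: escape raw text as a JSON string literal.
// Only quotes, backslashes and the most common control characters are
// handled; a complete implementation would escape all control characters.
std::string json_quote(std::string_view raw) {
    std::string out = "\"";
    for (char c : raw) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    out += '"';
    return out;
}

// Hypothetical helper: string-typed arguments arrive as raw content and must
// be quoted; every other type is assumed to already be valid JSON text.
std::string arg_to_json(std::string_view declared_type, std::string_view raw) {
    if (declared_type == "string") {
        return json_quote(raw);
    }
    return std::string(raw);
}
```

With this, the same raw text 42 becomes the JSON string "42" for a string parameter and stays the number 42 for an integer parameter, which is a decision the parser can only make if it knows the schema.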

Currently, the grammar and parsing exist as separate functions, which works but feels a bit fragile. I believe we can unify the two by using parser combinators to compose a PEG parser. That way the grammar definition becomes the parser.

Proposed Solution

This PR introduces a generic PEG (Parsing Expression Grammar) parser to the common library, along with chat-specific extensions and a complete reference implementation for Qwen3-Coder.

I've noticed there's often a lag between when a model is supported by llama.cpp and when proper tool calling is fully implemented. This parser aims to close that gap by letting you define the grammar and parser at the same time, making it easier to add full tool calling support for new models.

Parsing Expression Grammars (PEG)

PEG parsers are straightforward to implement as recursive descent parsers. While recursive descent parsers are known for backtracking, the majority of model output can be parsed with minimal backtracking, making them practical for this use case.

Parser combinators allow us to compose complex parsers from simple, reusable building blocks. This creates a DSL that closely mimics the grammar itself.

Rather than defining both a grammar and parsing function, we can build a PEG parser that generates a compatible GBNF grammar (with exceptions) and parses model output.
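
The idea that one definition can serve as both parser and grammar can be sketched with a toy combinator set. This is not the API introduced by this PR: the peg struct and the lit, seq and choice helpers are hypothetical, and the emitted text is only a GBNF-style expression without rule names or literal escaping.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy combinator: every node knows how to match input and how
// to print itself as a GBNF-style expression.
struct peg {
    // Returns the position after the match, or std::nullopt on failure.
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)> parse;
    std::string gbnf;
};

// Literal match. Assumes `s` contains no characters that need GBNF escaping.
peg lit(const std::string & s) {
    return {
        [s](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
            if (pos <= in.size() && in.substr(pos, s.size()) == s) {
                return pos + s.size();
            }
            return std::nullopt;
        },
        "\"" + s + "\"",
    };
}

// Sequence: every part must match, one after another.
peg seq(std::vector<peg> parts) {
    std::string g;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        g += (i ? " " : "") + parts[i].gbnf;
    }
    return {
        [parts](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
            for (const auto & p : parts) {
                auto next = p.parse(in, pos);
                if (!next) {
                    return std::nullopt;
                }
                pos = *next;
            }
            return pos;
        },
        "(" + g + ")",
    };
}

// Ordered choice: the first alternative that matches wins (PEG semantics).
peg choice(std::vector<peg> alts) {
    std::string g;
    for (std::size_t i = 0; i < alts.size(); ++i) {
        g += (i ? " | " : "") + alts[i].gbnf;
    }
    return {
        [alts](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
            for (const auto & a : alts) {
                if (auto next = a.parse(in, pos)) {
                    return next;
                }
            }
            return std::nullopt;
        },
        "(" + g + ")",
    };
}

// Example: one definition, usable for parsing and for grammar output.
peg example_tool_call_parser() {
    return seq({
        lit("<tool_call>"),
        choice({ lit("get_weather"), lit("get_time") }),
        lit("</tool_call>"),
    });
}
```

Calling example_tool_call_parser().parse(text, 0) matches model output, while the gbnf member of the same object holds the corresponding grammar expression, so the two cannot drift apart.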

Features

  • Partial parsing for streaming input
  • Built-in JSON parsers for common patterns
  • Generation of compatible GBNF grammars
  • AST generation with semantic tags for structured extraction
  • Three common AST shapes covering most model formats:
    • simple - Content with optional reasoning
    • native - Tool arguments as JSON objects
    • constructed - Tool arguments as separate entities (XML or pseudo-functions)

Examples

Parser for models that emit tool arguments as JSON

auto parser = build_chat_peg_native_parser([&](common_chat_peg_native_builder & p) {
    // Build choice of available tools
    auto tool_choice = p.choice();
    for (const auto & tool : tools) {
        const auto & function = tool.at("function");
        std::string name = function.at("name");
        const auto & schema = function.at("parameters");

        auto tool_name = p.json_member("name", "\"" + p.literal(name) + "\"");
        auto tool_args = p.json_member("arguments", p.schema(p.json(), "tool-" + name + "-schema", schema));

        tool_choice |= p.rule("tool-" + name, "{" << tool_name << "," << tool_args << "}");
    }

    // Define tool call structure
    auto tool_call = p.trigger_rule("tool-call",
        p.sequence({
            p.literal("<tool_call>["),
            tool_choice,
            p.literal("]</tool_call>")
        })
    );

    return p.sequence({
        p.content(p.until("<tool_call>")),
        p.optional(tool_call),
        p.end()
    });
});

Parser for models that emit XML tags for each argument

auto parser = build_chat_peg_constructed_parser([&](common_chat_peg_constructed_builder & p) {
    auto location_arg = p.tool_arg(
        p.tool_arg_open("<parameter name=\"" + p.tool_arg_name(p.literal("location")) + "\">"),
        p.tool_arg_string_value(p.until("</parameter>")),
        p.tool_arg_close(p.literal("</parameter>"))
    );

    auto get_weather_tool = p.tool(p.sequence({
        p.tool_open("<function name=\"" + p.tool_name(p.literal("get_weather")) + "\">"),
        location_arg,
        p.tool_close(p.literal("</function>"))
    }));

    return p.sequence({
        p.content(p.until("<tool_call>")),
        p.literal("<tool_call>"),
        get_weather_tool,
        p.literal("</tool_call>"),
        p.end()
    });
});

Grammar generation

data.grammar = build_grammar([&](const common_grammar_builder & builder) {
    foreach_function(params.tools, [&](const json & fn) {
        builder.resolve_refs(fn.at("parameters"));
    });
    parser.build_grammar(builder, data.grammar_lazy);
});

Implementation Details

The PEG parsers are implemented using std::variant rather than traditional inheritance. This reduces boilerplate and leverages std::visit for type-safety. I initially had an OOP implementation, but it started becoming quite cumbersome and this seems like the lesser evil of the two.

using common_peg_parser_variant = std::variant<
    common_peg_epsilon_parser,
    common_peg_start_parser,
    common_peg_end_parser,
    common_peg_literal_parser,
    common_peg_sequence_parser,
    common_peg_choice_parser,
    common_peg_repetition_parser,
    common_peg_and_parser,
    common_peg_not_parser,
    common_peg_any_parser,
    common_peg_space_parser,
    common_peg_chars_parser,
    common_peg_json_string_parser,
    common_peg_until_parser,
    common_peg_schema_parser,
    common_peg_rule_parser,
    common_peg_ref_parser,
    common_peg_atomic_parser,
    common_peg_tag_parser
>;
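
As a rough illustration of the std::variant approach, the sketch below dispatches over three simplified parser kinds with std::visit. The struct names and the match function are hypothetical stand-ins, not the types from this PR; the point is only that a missing case becomes a compile-time error rather than a runtime surprise.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <variant>

// Hypothetical, simplified parser kinds.
struct literal_parser { std::string text; };
struct any_parser {};
struct end_parser {};

using parser_variant = std::variant<literal_parser, any_parser, end_parser>;

// Returns the number of characters consumed at `pos`, or -1 on failure.
int match(const parser_variant & parser, std::string_view input, std::size_t pos) {
    if (pos > input.size()) {
        return -1;
    }
    return std::visit([&](const auto & p) -> int {
        using T = std::decay_t<decltype(p)>;
        if constexpr (std::is_same_v<T, literal_parser>) {
            // Consume the literal if the input continues with it.
            return input.substr(pos, p.text.size()) == p.text
                ? static_cast<int>(p.text.size())
                : -1;
        } else if constexpr (std::is_same_v<T, any_parser>) {
            // Consume exactly one character if one is available.
            return pos < input.size() ? 1 : -1;
        } else {
            // Adding a new alternative without handling it fails to compile.
            static_assert(std::is_same_v<T, end_parser>, "unhandled parser type");
            return pos == input.size() ? 0 : -1;
        }
    }, parser);
}
```

Compared with a virtual base class, no per-node heap allocation or visitor interface is needed, and the full set of parser kinds is visible in one type alias.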

Both parsers and AST nodes are allocated in arena structures to minimize memory allocations.

class common_peg_arena {
    std::vector<common_peg_parser_variant> parsers_;
    std::unordered_map<std::string, common_peg_parser_id> rules_;
    common_peg_parser_id root_ = COMMON_PEG_INVALID_PARSER_ID;
    ...

class common_peg_ast_arena {
    std::vector<common_peg_ast_node> nodes_;
    ...
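
A minimal sketch of the arena idea, assuming parsers refer to each other by index into a single vector rather than by pointer. The names below (parser_id, literal_node, sequence_node, arena, example_arena) are hypothetical and much simpler than the classes in this PR.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical id type: an index into the arena's vector.
using parser_id = std::size_t;

struct literal_node  { std::string text; };
struct sequence_node { std::vector<parser_id> children; };
using node = std::variant<literal_node, sequence_node>;

class arena {
    std::vector<node> nodes_;
  public:
    // Store a node and return its id. Ids stay valid when the vector grows,
    // unlike pointers or references into it.
    parser_id add(node n) {
        nodes_.push_back(std::move(n));
        return nodes_.size() - 1;
    }
    const node & get(parser_id id) const { return nodes_.at(id); }
    std::size_t  size() const { return nodes_.size(); }
};

// Build a tiny parser graph: a sequence of an opening and a closing literal.
arena example_arena() {
    arena a;
    parser_id open  = a.add(literal_node{"<tool_call>"});
    parser_id close = a.add(literal_node{"</tool_call>"});
    a.add(sequence_node{{open, close}});
    return a;
}
```

Because children are plain indices, the whole graph lives in one contiguous allocation and can be copied or serialized without fixing up pointers.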

Each parser variant is wrapped in a common_peg_parser value type to produce a DSL for composing parser combinators.

Parsers return one of three results: FAIL, SUCCESS, or NEED_MORE_INPUT. This is how partial parsing is implemented. Unlike common/chat-parser.cpp, a partial parse does not raise an exception, because a partial parse is still a valid result when streaming.
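
The three-state result can be sketched for a single literal match. The result names come from the description above; the enum type name, the match_literal function and its is_partial flag are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical enum type; the three states mirror the ones named above.
enum class parse_result { FAIL, SUCCESS, NEED_MORE_INPUT };

// Try to match `lit` at `pos`. `is_partial` signals that the input is a
// streamed prefix and more characters may still arrive.
parse_result match_literal(std::string_view input, std::size_t pos,
                           std::string_view lit, bool is_partial) {
    if (pos > input.size()) {
        return parse_result::FAIL;
    }
    std::string_view rest = input.substr(pos);
    std::size_t n = std::min(rest.size(), lit.size());

    // Any mismatch in the overlapping part is a hard failure.
    if (rest.substr(0, n) != lit.substr(0, n)) {
        return parse_result::FAIL;
    }
    // The whole literal was seen.
    if (n == lit.size()) {
        return parse_result::SUCCESS;
    }
    // Input ended in the middle of the literal: only an error if the
    // input is known to be complete.
    return is_partial ? parse_result::NEED_MORE_INPUT : parse_result::FAIL;
}
```

A streaming caller can treat NEED_MORE_INPUT as "keep what has been parsed so far and try again when the next chunk arrives", with no exception handling on the hot path.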

Additional Changes

  • Added common_chat_peg_parse() to common/chat.cpp and chat formats COMMON_CHAT_FORMAT_PEG_(SIMPLE|NATIVE|CONSTRUCTED) to support models parsed by a PEG parser.
    • The parser must be passed from chat param initialization to the parse function. To do this, I currently serialize the parser to JSON and then deserialize to common_chat_syntax.parser. I'm not a fan, but this seems the least intrusive method to integrate. I'll implement any alternative mechanisms if desired.
  • Added common/unicode.{cpp,h} derived from src/unicode.{cpp,h}. As I understand it, we should not include headers from src/, so I had to copy the implementation. It deviates from the original by returning a result rather than raising an exception.

More comprehensive documentation is added in docs/development/parsing.md. The tests in tests/test-chat-peg-parser.cpp are also fairly comprehensive.


I know this is a big PR. I tried to minimize the implementation, while keeping enough to demonstrate value. #15703 shows community desire for something like this, although it doesn't have to be this implementation.

@loci-dev loci-dev force-pushed the main branch 30 times, most recently from c7d40d0 to 9182b13 Compare December 28, 2025 13:15