@DajanaV DajanaV commented Nov 10, 2025

Mirrored from ggml-org/llama.cpp#17136

Putting this out there as a proof-of-concept and to gather feedback. It is still a WIP.

cc @pwilkin

Problem

Each model currently requires a custom parser to handle reasoning and tool calls. XML-based models are particularly challenging to parse. For example, Qwen3-Coder outputs:

<tool_call>
<function={name}>
<parameter={arg-name}>
{arg_value as json or string}
</parameter>
...
</function>
</tool_call>

Supporting this format requires the parser to know the type of each argument based on the provided schema.
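
To make the schema dependency concrete, here is a minimal sketch (not code from this PR) of how a raw parameter value could be turned into a JSON value once the argument's type is known. The arg_schema type and to_json_value function are hypothetical names, the schema is reduced to a name-to-type map, and treating unknown arguments as strings is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified schema: argument name -> JSON type name.
using arg_schema = std::map<std::string, std::string>;

// Convert the raw text found between <parameter=...> and </parameter> into a
// JSON value. String-typed arguments are quoted and escaped; any other type is
// assumed to already be valid JSON and is passed through unchanged.
// Assumption: arguments missing from the schema are treated as strings.
std::string to_json_value(const arg_schema & schema,
                          const std::string & name,
                          const std::string & raw) {
    auto it = schema.find(name);
    const bool is_string = it == schema.end() || it->second == "string";
    if (!is_string) {
        return raw;
    }
    std::string out = "\"";
    for (char c : raw) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            default:   out += c;      break; // other control characters are not handled here
        }
    }
    return out + "\"";
}
```

With a schema of {city: string, days: integer}, the raw text New York becomes the JSON string "New York", while 3 is passed through as the number 3. Without the schema, the two cases cannot be told apart reliably.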

Proposal

I propose using parser combinators to simplify parsing. We can compose parsers suitable for PEG grammars, which should handle model output effectively. This PR implements a proof-of-concept.

Here's an example from test/test-chat-parser-combinator.cpp:

auto parser = build_parser([](parser_builder & p) {
    auto space = p.add_rule("space", p.space());

    auto reasoning = p.add_rule("reasoning",
        p.literal("<think>") + space +
        p.group("reasoning-content",
            p.zero_or_more(~(space + p.literal("</think>")) + p.any())) +
        space + p.literal("</think>"));

    auto content = p.add_rule("content",
        p.group("content",
            p.zero_or_more(~(space + p.literal("<tool_call>")) + p.any())));

    auto ident_chars = p.add_rule("ident-chars", p.char_class("[a-zA-Z\\-_]"));
    auto json = p.add_json_rule("json");

    auto tool_call_name = p.add_rule("tool-call-name",
        p.literal("<name>") + space +
        p.group("tool-name", p.one_or_more(~p.literal("</name>") + ident_chars)) +
        space + p.literal("</name>"));

    auto tool_call_args = p.add_rule("tool-call-args",
        p.literal("<args>") + space +
        p.group("tool-args", json) +
        space + p.literal("</args>"));

    auto tool_call = p.add_rule("tool-call",
        p.literal("<tool_call>") + space +
        tool_call_name + space +
        tool_call_args + space +
        p.literal("</tool_call>"));

    return p.add_rule("root", reasoning + p.optional(content) + p.optional(tool_call));
});

std::string input = R"(<think>I need to call get_weather with city = New York</think><tool_call><name>get_weather</name><args>{"city": "New York"}</args></tool_call>)";
parser_context ctx{input, parse_cache()};

auto result = parser.parse(ctx);

assert_equals(true, result.is_success());
assert_equals(input.size(), result.end);
assert_equals(std::string("I need to call get_weather with city = New York"), *result.group("reasoning-content", ctx.input));
assert_equals(std::string("get_weather"), *result.group("tool-name", ctx.input));
assert_equals(std::string(R"({"city": "New York"})"), *result.group("tool-args", ctx.input));

The parser supports partial parsing for streaming output:

input = R"(<think>I need to call get_weather</think><tool_call><name>get_weather</name><args>{"cit)";
ctx = parser_context{input, parse_cache(), /* .is_input_complete = */ false};
result = parser.parse(ctx);

assert_equals(true, result.is_success());
assert_equals(std::string("I need to call get_weather"), *result.group("reasoning-content", ctx.input));
assert_equals(std::string("get_weather"), *result.group("tool-name", ctx.input));
assert_equals(std::string(R"({"cit)"), *result.group("tool-args", ctx.input));
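
As a rough illustration of what partial parsing means at the level of a single literal, the sketch below distinguishes a real mismatch from input that simply ends too early. It is not the PR's implementation: match_status and match_literal are hypothetical names, and the PR's own example reports success for truncated input rather than a separate status. Only the is_input_complete flag is taken from the example above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical three-way result for this sketch.
enum class match_status { success, fail, need_more_input };

// Match `lit` at offset `pos` of `input`. If the input ends in the middle of
// the literal and more text may still arrive, report need_more_input instead
// of a hard failure.
match_status match_literal(const std::string & input, std::size_t pos,
                           const std::string & lit, bool is_input_complete) {
    if (pos > input.size()) {
        return match_status::fail;
    }
    const std::size_t avail = input.size() - pos;
    const std::size_t n = std::min(avail, lit.size());
    // Compare only the part of the literal that the input can cover.
    if (input.compare(pos, n, lit, 0, n) != 0) {
        return match_status::fail;
    }
    if (n == lit.size()) {
        return match_status::success;
    }
    return is_input_complete ? match_status::fail : match_status::need_more_input;
}
```

For the streamed text </thi, matching </think> yields need_more_input while the stream is open and fail once the input is marked complete.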

The parser itself (the tree of combinators, not the result of a parse) can be used to produce a GBNF grammar. The plan is to build the parser during chat param initialization and derive grammar rules with support for lazy triggers. This should support both tool_choice = auto and tool_choice = required.
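
A minimal sketch of deriving grammar text from a parser tree might look like the following. The node type and the lit, rule, seq, alt, star, and opt helpers are hypothetical and much simpler than the PR's parser types; lookahead nodes are left out entirely, and lazy triggers are not modeled.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much-reduced parser tree for illustration only.
struct node;
using node_ptr = std::shared_ptr<node>;

enum class kind { literal, sequence, choice, zero_or_more, optional, rule_ref };

struct node {
    kind k;
    std::string text;               // literal text or referenced rule name
    std::vector<node_ptr> children;
};

node_ptr make(kind k, std::string text, std::vector<node_ptr> children) {
    return std::make_shared<node>(node{k, std::move(text), std::move(children)});
}
node_ptr lit(std::string s)           { return make(kind::literal, std::move(s), {}); }
node_ptr rule(std::string name)       { return make(kind::rule_ref, std::move(name), {}); }
node_ptr seq(std::vector<node_ptr> c) { return make(kind::sequence, "", std::move(c)); }
node_ptr alt(std::vector<node_ptr> c) { return make(kind::choice, "", std::move(c)); }
node_ptr star(node_ptr c)             { return make(kind::zero_or_more, "", {std::move(c)}); }
node_ptr opt(node_ptr c)              { return make(kind::optional, "", {std::move(c)}); }

// Quote a literal, escaping double quotes and backslashes.
std::string quote(const std::string & s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out + "\"";
}

std::string to_gbnf(const node & n);

std::string join(const std::vector<node_ptr> & cs, const std::string & sep) {
    std::string out;
    for (std::size_t i = 0; i < cs.size(); ++i) {
        if (i > 0) out += sep;
        out += to_gbnf(*cs[i]);
    }
    return out;
}

// Emit the right-hand side of a grammar rule for one node.
std::string to_gbnf(const node & n) {
    switch (n.k) {
        case kind::literal:      return quote(n.text);
        case kind::rule_ref:     return n.text;
        case kind::sequence:     return join(n.children, " ");
        case kind::choice:       return "(" + join(n.children, " | ") + ")";
        case kind::zero_or_more: return "(" + to_gbnf(*n.children[0]) + ")*";
        case kind::optional:     return "(" + to_gbnf(*n.children[0]) + ")?";
    }
    return "";
}
```

For example, seq({lit("<name>"), rule("ident"), lit("</name>")}) renders as "<name>" ident "</name>", which could then be placed after a rule name and ::= to form one grammar line.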

Specifics

This PR implements parser combinators for PEG grammars. It uses caching to implement packrat parsing. The following are implemented:

parser literal(const std::string & literal);
parser sequence(std::initializer_list<parser> parsers);
parser choice(std::initializer_list<parser> parsers);
parser one_or_more(const parser & p);
parser zero_or_more(const parser & p);
parser optional(const parser & p);
parser negate(const parser & p);
parser any();
parser char_class(const std::string & classes);
parser group(const std::string & name, const parser & p);
parser rule(const std::string & name);
parser space();

The operators +, |, and ~ construct sequence, choice, and negate parsers respectively.
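
To show how operator overloading can map onto PEG semantics, here is a small self-contained sketch. It is not the PR's implementation: mini_parser, parse_end, and any_char are hypothetical names, parsers are plain closures, and there are no groups, rules, or caching.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// End offset on success, std::nullopt on failure.
using parse_end = std::optional<std::size_t>;

// Hypothetical minimal parser: a closure from (input, start) to end offset.
struct mini_parser {
    std::function<parse_end(const std::string &, std::size_t)> run;
};

mini_parser literal(const std::string & lit) {
    return {[lit](const std::string & in, std::size_t pos) -> parse_end {
        if (in.compare(pos, lit.size(), lit) == 0) return pos + lit.size();
        return std::nullopt;
    }};
}

// Matches exactly one character.
mini_parser any_char() {
    return {[](const std::string & in, std::size_t pos) -> parse_end {
        if (pos < in.size()) return pos + 1;
        return std::nullopt;
    }};
}

// Sequence: b starts where a ended.
mini_parser operator+(mini_parser a, mini_parser b) {
    return {[a, b](const std::string & in, std::size_t pos) -> parse_end {
        parse_end mid = a.run(in, pos);
        if (!mid) return std::nullopt;
        return b.run(in, *mid);
    }};
}

// Ordered choice: try a first, fall back to b only if a fails.
mini_parser operator|(mini_parser a, mini_parser b) {
    return {[a, b](const std::string & in, std::size_t pos) -> parse_end {
        parse_end r = a.run(in, pos);
        if (r) return r;
        return b.run(in, pos);
    }};
}

// Negative lookahead: succeeds without consuming input iff p fails here.
mini_parser operator~(mini_parser p) {
    return {[p](const std::string & in, std::size_t pos) -> parse_end {
        if (p.run(in, pos)) return std::nullopt;
        return pos;
    }};
}

mini_parser zero_or_more(mini_parser p) {
    return {[p](const std::string & in, std::size_t pos) -> parse_end {
        while (auto next = p.run(in, pos)) {
            if (*next == pos) break; // guard against parsers that consume nothing
            pos = *next;
        }
        return pos;
    }};
}
```

With these definitions, literal("<think>") + zero_or_more(~literal("</think>") + any_char()) + literal("</think>") consumes the whole of <think>hi</think>. Because unary ~ binds tighter than +, the lookahead applies only to the closing tag, the same shape as the reasoning rule in the example above.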

Drawbacks

  • Parsers that match content while excluding certain patterns, such as end tags, have a less obvious syntax. For example, p.zero_or_more(~(space + p.literal("</think>")) + p.any()) consumes one character at a time and stops at the first position where optional whitespace followed by </think> begins. This can be generalized through an excluding() parser.

  • Packrat parsing requires caching all intermediate parse results, which introduces memory overhead proportional to input size and grammar complexity

  • Each model still requires a custom parser, though they share a common framework that simplifies implementation

  • Parser combinators may offer less flexibility for handling malformed model output compared to hand-written parsers, though constrained decoding should prevent malformed tool calls
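
The memory cost of packrat caching can be seen in a minimal memo table sketch. This is an assumption about the general technique, not the PR's parse_cache: memo_cache, memo_result, and apply_rule are hypothetical names. In the worst case the table holds one entry per (rule, position) pair, so its size grows with the number of rules times the input length.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <utility>

// Result of applying one rule at one position: end offset, or nullopt on failure.
using memo_result = std::optional<std::size_t>;

// Hypothetical memo table keyed by (rule id, start position).
struct memo_cache {
    std::map<std::pair<int, std::size_t>, memo_result> table;

    // Returns a pointer to the cached result, or nullptr on a miss.
    const memo_result * find(int rule_id, std::size_t pos) const {
        auto it = table.find(std::make_pair(rule_id, pos));
        return it == table.end() ? nullptr : &it->second;
    }

    void store(int rule_id, std::size_t pos, memo_result r) {
        table[std::make_pair(rule_id, pos)] = r;
    }

    std::size_t size() const { return table.size(); }
};

// Apply `rule` at `pos`, evaluating it at most once per (rule id, position).
// Failures are cached too, so repeated backtracking does not re-run the rule.
template <typename F>
memo_result apply_rule(memo_cache & cache, int rule_id, std::size_t pos, F && rule) {
    if (const memo_result * hit = cache.find(rule_id, pos)) {
        return *hit;
    }
    memo_result r = rule(pos);
    cache.store(rule_id, pos, r);
    return r;
}
```

Calling apply_rule twice with the same rule id and position runs the rule body once; the second call is served from the table, at the cost of keeping that entry alive for the rest of the parse.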

To do

  • Basic implementation
  • Support parsing of partial input for streaming
  • Implement a JSON parser using parser combinators to replace the current healing system
  • Implement content() and reasoning() parsers to populate content/reasoning fields.
  • Implement tool(), tool_name(), tool_args(), as well as tool_arg_name() and tool_arg_value() for models such as Qwen3-Coder.
  • Construct GBNF grammar from the final parser
  • Implement json-schema-to-grammar support. The JSON parser will parse any JSON, but the generated GBNF grammar should still be constructed from the user-provided schema.
  • Allow building of the parser during chat param initialization.

@DajanaV DajanaV force-pushed the main branch 24 times, most recently from 930eefd to db9060f Compare November 12, 2025 23:09
@DajanaV DajanaV force-pushed the main branch 7 times, most recently from 24733fb to 4b4bb7c Compare November 13, 2025 12:15
@DajanaV DajanaV closed this Nov 13, 2025