UPSTREAM PR #17136: common : implement parser combinators for chat parsing [WIP] #153
Mirrored from ggml-org/llama.cpp#17136
Putting this out there as a proof-of-concept and to gather feedback. It is still a WIP.
cc @pwilkin
Problem
Each model currently requires a custom parser to handle reasoning and tool calls. XML-based models are particularly challenging to parse. For example, Qwen3-Coder outputs:
Supporting this format requires the parser to know the type of each argument based on the provided schema.
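To make the typing problem concrete, here is a minimal sketch assuming argument values arrive as raw text between XML-style tags. The `arg_types` map and the `arg_to_json` helper are hypothetical names used only for illustration, not part of this PR. The point is that the raw text `42` could be either a string or an integer, and only the schema can tell which.

```cpp
#include <map>
#include <string>

// Hypothetical: maps an argument name to the JSON Schema "type" declared
// for it in the tool definition.
using arg_types = std::map<std::string, std::string>;

// Escape raw text as a JSON string literal. Only quotes, backslashes,
// newlines, and tabs are escaped; other control characters are not handled.
std::string json_quote(const std::string & raw) {
    std::string out = "\"";
    for (char c : raw) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    out += '"';
    return out;
}

// Turn the raw text of one argument into JSON text.
std::string arg_to_json(const arg_types & types, const std::string & name, const std::string & raw) {
    const auto it = types.find(name);
    // Assumption: arguments missing from the schema are treated as strings.
    if (it == types.end() || it->second == "string") {
        return json_quote(raw);
    }
    // integer, number, boolean, object, array: assumed to be JSON text already.
    return raw;
}
```

With a schema that declares `path` as `string` and `limit` as `integer`, the same raw text `42` becomes `"42"` for `path` and stays `42` for `limit`.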
Proposal
I propose using parser combinators to simplify parsing. We can compose parsers suitable for PEG grammars, which should handle model output effectively. This PR implements a proof-of-concept.
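For readers unfamiliar with the technique, the following is a minimal sketch of PEG-style combinators, not the implementation in this PR. The `parser` struct and the free functions `literal`, `any`, and `zero_or_more` are simplified stand-ins. The operators `+`, `|`, and `~` are overloaded for sequence, ordered choice, and negative lookahead, in the spirit of what this PR describes.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Simplified stand-in: a parser takes the input and a start offset and
// returns the end offset on success, or std::nullopt on failure.
struct parser {
    std::function<std::optional<size_t>(std::string_view, size_t)> fn;
    std::optional<size_t> operator()(std::string_view in, size_t pos = 0) const { return fn(in, pos); }
};

// Match an exact string.
parser literal(std::string s) {
    return parser{[s = std::move(s)](std::string_view in, size_t pos) -> std::optional<size_t> {
        if (in.substr(pos, s.size()) == s) { return pos + s.size(); }
        return std::nullopt;
    }};
}

// Match any single character.
parser any() {
    return parser{[](std::string_view in, size_t pos) -> std::optional<size_t> {
        if (pos < in.size()) { return pos + 1; }
        return std::nullopt;
    }};
}

// Sequence: a, then b starting where a ended.
parser operator+(parser a, parser b) {
    return parser{[a, b](std::string_view in, size_t pos) -> std::optional<size_t> {
        auto mid = a(in, pos);
        if (!mid) { return std::nullopt; }
        return b(in, *mid);
    }};
}

// Ordered choice: try a first, fall back to b at the same offset.
parser operator|(parser a, parser b) {
    return parser{[a, b](std::string_view in, size_t pos) -> std::optional<size_t> {
        if (auto r = a(in, pos)) { return r; }
        return b(in, pos);
    }};
}

// Negative lookahead: succeed without consuming input if a fails here.
parser operator~(parser a) {
    return parser{[a](std::string_view in, size_t pos) -> std::optional<size_t> {
        if (a(in, pos)) { return std::nullopt; }
        return pos;
    }};
}

// Greedy repetition; always succeeds, possibly consuming nothing.
parser zero_or_more(parser a) {
    return parser{[a](std::string_view in, size_t pos) -> std::optional<size_t> {
        while (auto next = a(in, pos)) {
            if (*next == pos) { break; } // guard against non-consuming matches
            pos = *next;
        }
        return pos;
    }};
}
```

For instance, `literal("hello") + (literal(" world") | literal(" there"))` accepts both `hello world` and `hello there`, and `zero_or_more(~literal("</think>") + any())` consumes input up to, but not including, the first `</think>`.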
Here's an example from `test/test-chat-parser-combinator.cpp`:
The parser supports partial parsing for streaming output:
The generated parse tree can be used to produce a GBNF grammar. The plan is to build the parser during chat param initialization and derive grammar rules with support for lazy triggers. This should support both `tool_choice = auto` and `tool_choice = required`.
Specifics
This PR implements parser combinators for PEG grammars. It uses caching to implement packrat parsing. The following are implemented:
The operators `+`, `|`, and `~` construct `sequence`, `choice`, and `negate` parsers respectively.
Drawbacks
Parsers that match content while excluding certain patterns, such as end tags, have a less obvious syntax. For example, `p.zero_or_more(~(space + p.literal("</think>")) + p.any())` consumes characters one at a time until the input at the current position matches `space` followed by `</think>`. This can be generalized through an `excluding()` parser
Packrat parsing requires caching all intermediate parse results, which introduces memory overhead proportional to input size and grammar complexity
Each model still requires a custom parser, though they share a common framework that simplifies implementation
Parser combinators may offer less flexibility for handling malformed model output compared to hand-written parsers, though constrained decoding should prevent malformed tool calls
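The memory cost of packrat parsing mentioned above can be seen in a small sketch of the memoization idea. This is an illustration under simplifying assumptions, not the cache used in this PR: `packrat_cache`, `rule_fn`, and the integer rule ids are hypothetical, and one cache instance is assumed to serve a single input string.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string_view>
#include <utility>

// End offset on success, std::nullopt on failure.
using result  = std::optional<size_t>;
using rule_fn = std::function<result(std::string_view, size_t)>;

// Hypothetical cache: results are keyed by (rule id, start offset), so a
// rule body runs at most once per position. The map can therefore grow to
// (number of rules) x (input length + 1) entries.
struct packrat_cache {
    std::map<std::pair<int, size_t>, result> memo;
    size_t misses = 0; // how many times a rule body actually ran

    result apply(int rule_id, const rule_fn & rule, std::string_view in, size_t pos) {
        const auto key = std::make_pair(rule_id, pos);
        if (auto it = memo.find(key); it != memo.end()) {
            return it->second; // reuse the earlier result, success or failure
        }
        ++misses;
        result r = rule(in, pos);
        memo.emplace(key, r);
        return r;
    }
};
```

Failures are cached as well as successes, which is what keeps backtracking from re-running a rule at the same offset. The trade-off is that entries stay alive for the whole parse unless the cache is pruned.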
To do
`content()` and `reasoning()` parsers to populate content/reasoning fields.
`tool()`, `tool_name()`, `tool_args()`, as well as `tool_arg_name()` and `tool_arg_value()` for models such as Qwen3-Coder.
`json-schema-to-grammar` support. The JSON parser will parse any JSON, but the generated GBNF grammar should still be constructed from the user-provided schema.