common : introduce composable PEG parser combinators for chat parsing #17136
Yes! This is exactly what I was thinking about :) Can you give me push rights to your repo so I can contribute without doing PRs to PRs?
Sure. I've never managed permissions on a GitHub repo, but let me know if you can't push.
The interface isn't solidified, so hammer away. I do want to clean up the header and move stuff into the source file. Figured I'd handle that as I get further along.
The partial parsing works, but does require careful attention if editing. The idea is to "succeed" if the parse tree is partially traversed and the input is marked as incomplete. With some caveats: if a literal is partially matched, it will propagate a result indicating we need more input. I intend to add a
I need to clean up the caching. Initially, I thought maybe we could reuse the cache as we get more and more input. I'm finding it very difficult to find the correct time to cache, so I'm thinking about nixing that idea and just providing a cache per parsing run, as the packrat algorithm originally intended. Then we can profile whether caching is beneficial on a real example. I suspect there shouldn't be a whole lot of backtracking, so the memory cost might not be worth it if the gains are minuscule.
|
Aight, let me bounce my original idea - what if we just created a GBNF parser builder and used that to parse the messages? Then we have both problems (tool call / reasoning and compatibility with normal parsing) done in one go. Unless (haven't looked into it) it would just be too inefficient for normal content parsing? Because right now it feels like we're adding another intermediate abstraction while GBNF is already implemented in GGML - so maybe just use a builder as an abstraction layer to create all the needed objects and add any missing partial parse support? This is just an idea, not very fixated on it, just thought I'd share it.
Regarding memory costs and the packrat parser, I think O(n) with typical LLM inputs is negligible; even with super long contexts we're looking at a few MB of overhead at most.
|
Sounds like you're thinking of a parser generator. Something like yacc, bison, or ANTLR. The problem I see with those solutions is they require building a parse table upfront, which is less intuitive than building a parse tree such as in this PR. You could create a recursive descent parser but that would have to be done at compile time. If you did it at runtime, I think the solution would look a lot like this!
I haven't examined the GBNF code with a scalpel, but taking a brief look it seems like it uses a pushdown automaton and may be challenging to extract content from. Not that we would want to, since it is part of the core and not common. I believe there is a desire to keep the chat parsing isolated in common.
I also think you lose the expressiveness of being able to define the grammar in C++. For example, with this solution we could add a
The solutions I mentioned above do this by defining their own language to insert code, which is not pretty in my experience.
That said, I am open to ideas. If you have a clearer picture of what that looks like, I'm happy to review. I understand inserting a new abstraction is a tough ask. I wanted to roll out a PoC to hopefully show value.
|
@aldehir Nah, you're probably right. I looked at the GBNF code and in fact it would take too much effort to extract the parsed content from there. We're better off just doing it your way. I'll try to code some of the missing pieces. |
|
@pwilkin great! If you have any questions, feel free to ask. |
|
Aight, I'm done with the hybrid ops and convert_hf_to_gguf refactoring cleanup, so I'll probably finally look at this tomorrow :> |
|
No rush. I am getting closer to a set of parsing functions that I'm happy with. The unfortunate part is I had to roll specialized parsers to maintain comparable performance with the existing parsing. A lexer would likely help, but optimized parsers for certain use cases are enough for now.
I added a benchmark in the test that implements the Command R7B parser and compares it to the existing one. It seemed like a good one to illustrate. The existing parsing has a leg up with JSON. That said, it's still a fraction of a millisecond for a full prompt. I think most of the cost will go into the constrained decoding anyway. I'll have to benchmark larger JSON documents. Worst case, we can fall back to the implementation in
|
I have pushed the Qwen3-Coder implementation to a separate branch in my repo: https://github.com/aldehir/llama.cpp/tree/qwen3-coder-peg-parser
The helpers and any testing recommendations should probably go in a separate PR with the first usage of this new parsing mechanism (if approved). Those tests would most likely impact
I don't view the existing mechanism as a buggy mess, but I do believe there is an opportunity for improvement by aligning the grammar and parsing.
|
@ggerganov @CISC @ngxson Please take a look. Curious whether you think something like this would be a good fit. |
|
@aldehir Nah, I agree that this has become large enough as it is that any implementations and helpers are best saved for another PR. If you don't mind, I can actually open it after this is merged, will be easier to show you my vision that way ;) |
|
I really like the idea overall. Actually, before minja was introduced to the project, I envisioned a parser with a syntax like this:
parser << "<" << name << "=" << value << ">";
Which is pretty close to the example in your documentation:
tool_choice |= p.rule("tool-" + name, "{" << tool_name << "," << tool_args << "}");
Honestly this system is more readable and more robust than the regex-based approach currently used by most parsers in
One question though: Do you think we can potentially replace the recently-merged XML parser with this system? IMO the current XML parser is over-complicated because (unfortunately) it has to rely on the GBNF grammar system
Sure thing.
The XML parsing is what inspired this PR. As model outputs become more complex and different, I think we'll need more sophisticated parsing than regex can provide. So, yes, I envision this replacing the recently merged XML parsing. This system still relies on GBNF grammar for constrained decoding, but it produces compatible GBNF grammars from the parser instead of handwriting one. I am exploring an alternative sampler that uses this parser implementation during sampling instead (albeit, with a more optimized underlying implementation for incremental parsing). I'll need to roll out a prototype to better understand its performance characteristics, but it feels like a viable, alternative, option. |
Replacing all regex-based parsers with PEG seems to be a logical next step. What's currently holding it back?
Nothing! I have already demonstrated use with Qwen3-Coder and it works really well. Where @pwilkin and I have differing opinions is in what should be a utility function, but we'll iron that out as we go. There are a few edge cases in the |
ggerganov
left a comment
@aldehir Looks like nice work!
Thank you! I added myself to CODEOWNERS for the related files. I also updated the description in light of the recent AI disclosure policy. |
ggerganov
left a comment
ngxson
left a comment
LGTM, good job @aldehir!
|
Let's merge so we can start building upon it :) Kudos to @aldehir for all his hard work here. |
…ggml-org#17136) * common : implement parser combinators to simplify chat parsing * add virtual destructor to parser_base * fix memory leak from circular references of rules * implement gbnf grammar building * remove unused private variable * create a base visitor and implement id assignment as a visitor * fix const ref for grammar builder * clean up types, friend classes, and class declarations * remove builder usage from until_parser * Use a counter class to help assign rule ids * cache everything * add short description for each parser * create a type for the root parser * implement repetition parser * Make optional, one_or_more, and zero_or_more subclasses of repetition * improve context constructor * improve until parsing and add benchmarks * remove cached() pattern, cache in parser_base with specialized parsing functions for each parser * improve json parsing performance to better match legacy parsing * fix const auto * it for windows * move id assignment to classes instead of using a visitor * create named rules in the command r7b example * use '.' 
for any in GBNF * fix parens around choices in gbnf grammar * add convenience operators to turn strings to literals * add free-form operators for const char * to simplify defining literals * simplify test case parser * implement semantic actions * remove groups in favor of actions and a scratchpad * add built in actions for common operations * add actions to command r7b example * use std::default_searcher for platforms that don't have bm * improve parser_type handling and add cast helper * add partial result type to better control when to run actions * fix bug in until() * run actions on partial results by default * use common_chat_msg for result * add qwen3 example wip * trash partial idea and simplify * move action arguments to a struct * implement aho-corasick matcher for until_parser and to build exclusion grammars * use std::string for input, since std::string_view is incompatible with std::regex * Refactor tests * improve qwen3 example * implement sax-style parsing and refactor * fix json string in test * rename classes to use common_chat_ prefix * remove is_ suffix from functions * rename from id_counter to just counter * Final refactored tests * Fix executable name and editorconfig-checker * Third time's the charm... 
* add trigger parser to begin lazy grammar rule generation * working lazy grammar * refactor json rules now that we check for reachability * reduce pointer usage * print out grammars in example * rename to chat-peg-parser* and common_chat_peg_parser* * Revert unrelated changes * New macros for CMakeLists to enable multi-file compilations * starting unicode support * add unicode support to char_parser * use unparsed args as additional sources * Refactor tests to new harness * Fix CMakeLists * fix rate calculation * add unicode tests * fix trailing whitespace and line endings skip-checks: true * Helpers + rewrite qwen3 with helpers * Fix whitespace * extract unicode functions to separate file * refactor parse unicode function * fix compiler error * improve construction of sequence/choice parsers * be less clever * add make_parser helper function * expand usage of make_parser, alias common_chat_msg_peg_parser_builder to builder in source * lower bench iterations * add unicode support to until_parser * add unicode support to json_string_parser * clean up unicode tests * reduce unicode details to match src/unicode.cpp * simplify even further * remove unused functions * fix type * reformat char class parsing * clean up json string parser * clean up + fix diagnostics * reorder includes * compact builder functions * replace action_parser with capture_parser, rename env to semantics * rename env to semantics * clean up common_chat_parse_context * move type() to below constant * use default constructor for common_chat_peg_parser * make all operators functions for consistency * fix compilation errors in test-optional.cpp * simplify result values * rename json_string_unquoted to json_string_content * Move helper to separate class, add separate explicit and helper classes * Whitespace * Change + to append() * Reformat * Add extra helpers, tests and Minimax example * Add some extra optional debugging prints + real example of how to use them * fix bug in repetitions when 
min_count = 0 reports failures * dump rule in debug * fix token accumulation and assert parsing never fails * indent debug by depth * use LOG_* in tests so logs sync up with test logs * - Add selective testing - Refactor all messaging to use LOG_ERR - Fix lack of argument / tool name capturing - Temporary fix for double event capture * refactor rule() and introduce ref() * clean up visitor * clean up indirection in root parser w.r.t rules * store shared ptr directly in parser classes * replace aho-corasick automation with a simple trie * Reset prev for qwen3 helper example variant * refactor to use value semantics with std::variant/std::visit * simplify trie_matcher result * fix linting issues * add annotations to rules * revert test workaround * implement serializing the parser * remove redundant parsers * remove tests * gbnf generation fixes * remove LOG_* use in tests * update gbnf tests to test entire grammar * clean up gbnf generation and fix a few bugs * fix typo in test output * remove implicit conversion rules * improve test output * rename trie_matcher to trie * simplify trie to just know if a node is the end of a word * remove common_chat_ prefix and ensure a common_peg_ prefix to all types * rename chat-peg-parser -> peg-parser * promote chat-peg-parser-helper to chat-peg-parser * checkpoint * use a static_assert to ensure we handle every branch * inline trivial peg parser builders * use json strings for now * implement basic and native chat peg parser builders/extractors * resolve refs to their rules * remove packrat caching (for now) * update tests * compare parsers with incremental input * benchmark both complete and incremental parsing * add raw string generation from json schema * add support for string schemas in gbnf generation * fix qwen example to include \n * tidy up example * rename extractor to mapper * rename ast_arena to ast * place basic tests into one * use gbnf_format_literal from json-schema-to-grammar * integrate parser with 
common/chat and server * clean up schema and serialization * add json-schema raw string tests * clean up json creation and remove capture parser * trim spaces from reasoning and content * clean up redundant rules and comments * rename input_is_complete to is_partial to match rest of project * simplify json rules * remove extraneous file * remove comment * implement += and |= operators * add comments to qwen3 implementation * reorder arguments to common_chat_peg_parse * remove commented outdated tests * add explicit copy constructor * fix operators and constness * wip: update test-chat for qwen3-coder * bring json parser closer to json-schema-to-grammar rules * trim trailing space for most things * fix qwen3 coder rules w.r.t. trailing spaces * group rules * do not trim trailing space from string args * tweak spacing of qwen3 grammar * update qwen3-coder tests * qwen3-coder small fixes * place parser in common_chat_syntax to simplify invocation * use std::set to collect rules to keep order predictable for tests * initialize parser to make certain platforms happy * revert back to std::unordered_set, sort rule names at the end instead * uncomment rest of chat tests * define explicit default constructor * improve arena init and server integration * fix chat test * add json_member() * add a comprehensive native example * clean up example qwen test and add response_format example to native test * make build_peg_parser accept std::function instead of template * change peg parser parameters into const ref * push tool call on tool open for constructed parser * add parsing documentation * clean up some comments * add json schema support to qwen3-coder * add id initializer in tests * remove grammar debug line from qwen3-coder * refactor qwen3-coder to use sequence over operators * only call common_chat_peg_parse if appropriate format * simplify qwen3-coder space handling * revert qwen3-coder implementation * revert json-schema-to-grammar changes * remove unnecessary forward 
declaration * small adjustment to until_parser * rename C/C++ files to use dashes * codeowners : add aldehir to peg-parser and related files --------- Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
Supporting new models requires implementing several features:
tool_choice = auto)
response_format
(reasoning models)
For reasoning models, the grammar must include reasoning or performance degrades significantly.
The real challenge is that each model uses a different output format:
[get_weather(location="..."), ...])
Currently, the grammar and parsing exist as separate functions, which works but feels a bit fragile. I believe we can unify the two by using parser combinators to compose a PEG parser. That way the grammar definition becomes the parser.
Proposed Solution
This PR introduces a generic PEG (Parsing Expression Grammar) parser to the common library, along with chat-specific extensions and a complete reference implementation for Qwen3-Coder.
I've noticed there's often a lag between when a model is supported by llama.cpp and when proper tool calling is fully implemented. This parser aims to close that gap by letting you define the grammar and parser at the same time, making it easier to add full tool calling support for new models.
Parsing Expression Grammars (PEG)
PEG parsers are straightforward to implement as recursive descent parsers. While recursive descent parsers are known for backtracking, the majority of model output can be parsed with minimal backtracking, making them practical for this use case.
Parser combinators allow us to compose complex parsers from simple, reusable building blocks. This creates a DSL that closely mimics the grammar itself.
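As a rough illustration of the building-block idea, here is a minimal sketch of parser combinators in C++17. Every name in it (parser, literal, not_char, sequence, choice, zero_or_more) is hypothetical and chosen for this example; it is not the API introduced by this PR, only a toy showing how small parsers compose and how ordered choice backtracks by retrying from the same offset.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch, not the PR's API: a parser takes the input and a start
// offset and returns the offset just past the match, or nullopt on failure.
using parser = std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

// Match an exact string.
inline parser literal(std::string text) {
    return [text = std::move(text)](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (in.substr(pos, text.size()) == text) {
            return pos + text.size();
        }
        return std::nullopt;
    };
}

// Match any single character except `c`.
inline parser not_char(char c) {
    return [c](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size() && in[pos] != c) {
            return pos + 1;
        }
        return std::nullopt;
    };
}

// Run every parser in order; fail as soon as one of them fails.
inline parser sequence(std::vector<parser> parts) {
    return [parts = std::move(parts)](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        for (const auto & p : parts) {
            auto next = p(in, pos);
            if (!next) {
                return std::nullopt;
            }
            pos = *next;
        }
        return pos;
    };
}

// PEG ordered choice: try each alternative from the same offset, first match wins.
inline parser choice(std::vector<parser> alts) {
    return [alts = std::move(alts)](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        for (const auto & p : alts) {
            if (auto next = p(in, pos)) {
                return next;
            }
        }
        return std::nullopt;
    };
}

// Greedy repetition; succeeds even when there are zero matches.
inline parser zero_or_more(parser p) {
    return [p = std::move(p)](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        while (auto next = p(in, pos)) {
            if (*next == pos) {
                break; // no progress, stop to avoid an infinite loop
            }
            pos = *next;
        }
        return pos;
    };
}
```

With these pieces, something like sequence({literal("<"), zero_or_more(not_char('>')), literal(">")}) reads almost like the grammar rule it implements, which is the DSL property described above.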
Rather than defining both a grammar and parsing function, we can build a PEG parser that generates a compatible GBNF grammar (with exceptions) and parses model output.
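One way to picture a single definition serving both purposes is to have each combinator carry a parse function and the grammar text for the same construct. The sketch below assumes hypothetical names (combinator, lit, seq, alt) and does no escaping of literals; it is not how this PR generates grammars, only a small demonstration that the two outputs can be derived from one composition. The emitted text uses GBNF's quoted literals, juxtaposition for sequences, and parenthesized `|` alternation.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch: each combinator carries a parse function and the
// GBNF text that describes the same construct.
struct combinator {
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)> parse;
    std::string gbnf;
};

// Literal. Assumes `text` contains no quotes or backslashes, so nothing is escaped.
inline combinator lit(const std::string & text) {
    return {
        [text](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
            if (in.substr(pos, text.size()) == text) {
                return pos + text.size();
            }
            return std::nullopt;
        },
        "\"" + text + "\"",
    };
}

// Sequence: parts are parsed in order and joined with spaces in the grammar.
inline combinator seq(std::vector<combinator> parts) {
    std::string gbnf;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            gbnf += " ";
        }
        gbnf += parts[i].gbnf;
    }
    return {
        [parts = std::move(parts)](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
            for (const auto & p : parts) {
                auto next = p.parse(in, pos);
                if (!next) {
                    return std::nullopt;
                }
                pos = *next;
            }
            return pos;
        },
        std::move(gbnf),
    };
}

// Ordered choice: first matching alternative wins; grammar is "(a | b | ...)".
inline combinator alt(std::vector<combinator> alts) {
    std::string gbnf = "(";
    for (std::size_t i = 0; i < alts.size(); ++i) {
        if (i > 0) {
            gbnf += " | ";
        }
        gbnf += alts[i].gbnf;
    }
    gbnf += ")";
    return {
        [alts = std::move(alts)](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
            for (const auto & p : alts) {
                if (auto next = p.parse(in, pos)) {
                    return next;
                }
            }
            return std::nullopt;
        },
        std::move(gbnf),
    };
}
```

Composing seq({lit("x="), alt({lit("true"), lit("false")})}) yields a value whose parse member accepts "x=false" and whose gbnf member holds the matching rule body, so the grammar and the parser cannot drift apart.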
Features
simple - Content with optional reasoning
native - Tool arguments as JSON objects
constructed - Tool arguments as separate entities (XML or pseudo-functions)
Examples
Parser for models that emit tool arguments as JSON
Parser for models that emit XML tags for each argument
Grammar generation
Implementation Details
The PEG parsers are implemented using std::variant rather than traditional inheritance. This reduces boilerplate and leverages std::visit for type-safety. I initially had an OOP implementation, but it started becoming quite cumbersome and this seems like the lesser evil of the two.
using common_peg_parser_variant = std::variant<
    common_peg_epsilon_parser,
    common_peg_start_parser,
    common_peg_end_parser,
    common_peg_literal_parser,
    common_peg_sequence_parser,
    common_peg_choice_parser,
    common_peg_repetition_parser,
    common_peg_and_parser,
    common_peg_not_parser,
    common_peg_any_parser,
    common_peg_space_parser,
    common_peg_chars_parser,
    common_peg_json_string_parser,
    common_peg_until_parser,
    common_peg_schema_parser,
    common_peg_rule_parser,
    common_peg_ref_parser,
    common_peg_atomic_parser,
    common_peg_tag_parser
>;
Both parsers and AST nodes are allocated in arena structures to minimize memory allocations.
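For readers unfamiliar with the pattern, here is a small sketch of variant-based dispatch combined with index-based arena storage. The node types, the arena struct, and the parse function are hypothetical stand-ins with far fewer cases than the variant listed above; the actual layout in this PR may differ. The static_assert in the final branch makes the compiler reject any node type that the visitor forgets to handle.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical node types; children are indices into the arena below.
struct literal_node  { std::string text; };
struct sequence_node { std::vector<std::size_t> children; };
struct choice_node   { std::vector<std::size_t> children; };

using node = std::variant<literal_node, sequence_node, choice_node>;

// All nodes live in one vector, so composing parsers does not need one heap
// object (and one owning pointer) per node.
struct arena {
    std::vector<node> nodes;

    std::size_t add(node n) {
        nodes.push_back(std::move(n));
        return nodes.size() - 1;
    }
};

template <class T> inline constexpr bool always_false_v = false;

// Returns the offset just past the match, or nullopt on failure.
inline std::optional<std::size_t> parse(const arena & a, std::size_t id, std::string_view in, std::size_t pos) {
    return std::visit([&](const auto & n) -> std::optional<std::size_t> {
        using T = std::decay_t<decltype(n)>;
        if constexpr (std::is_same_v<T, literal_node>) {
            if (in.substr(pos, n.text.size()) == n.text) {
                return pos + n.text.size();
            }
            return std::nullopt;
        } else if constexpr (std::is_same_v<T, sequence_node>) {
            std::size_t cur = pos;
            for (std::size_t child : n.children) {
                auto next = parse(a, child, in, cur);
                if (!next) {
                    return std::nullopt;
                }
                cur = *next;
            }
            return cur;
        } else if constexpr (std::is_same_v<T, choice_node>) {
            for (std::size_t child : n.children) {
                if (auto next = parse(a, child, in, pos)) {
                    return next;
                }
            }
            return std::nullopt;
        } else {
            // Compile-time guard: fires if a new node type is added but not handled.
            static_assert(always_false_v<T>, "unhandled node type");
        }
    }, a.nodes[id]);
}
```

Compared with a virtual parse() method on a base class, adding a new operation (for example, grammar generation) is another std::visit rather than another virtual function on every node class.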
Each parser variant is wrapped in a common_peg_parser value type to produce a DSL for composing parser combinators.
Parsers can return results FAIL, SUCCESS, or NEED_MORE_INPUT. This is how the partial parsing is implemented. It does not raise an exception on partial parse like common/chat-parser.cpp, because partial parses are still valid for streaming.
Additional Changes
common_chat_peg_parse() to common/chat.cpp and chat formats COMMON_CHAT_FORMAT_PEG_(SIMPLE|NATIVE|CONSTRUCTED) to support models parsed by a PEG parser.
common_chat_syntax.parser. I'm not a fan, but this seems the least intrusive method to integrate. I'll implement any alternative mechanisms if desired.
common/unicode.{cpp,h} derived from src/unicode.{cpp,h}. As I understand, we should not include headers from src/, so I had to copy the implementation. It does deviate by returning a result rather than raising an exception.
More comprehensive documentation is added in docs/development/parsing.md. The tests are also fairly comprehensive, tests/test-chat-peg-parser.cpp.
I know this is a big PR. I tried to minimize the implementation, while keeping enough to demonstrate value. #15703 shows community desire for something like this, although it doesn't have to be this implementation.
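To make the FAIL, SUCCESS, and NEED_MORE_INPUT results from the implementation details more concrete, here is a sketch of how a literal matcher might distinguish them while streaming. The three result names and the is_partial flag come from this PR; the parse_result struct and match_literal function are hypothetical and written only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

enum class result_type { FAIL, SUCCESS, NEED_MORE_INPUT };

// Hypothetical result shape: the outcome plus the offset just past the
// consumed input.
struct parse_result {
    result_type type;
    std::size_t end;
};

// Match `text` at offset `pos` (which must not exceed in.size()).
// `is_partial` says the input may still grow, as it does during streaming.
inline parse_result match_literal(std::string_view in, std::size_t pos, std::string_view text, bool is_partial) {
    const std::size_t avail = in.size() - pos;
    const std::size_t n     = std::min(avail, text.size());

    // A mismatch in the characters we do have can never be fixed by more input.
    if (in.substr(pos, n) != text.substr(0, n)) {
        return {result_type::FAIL, pos};
    }
    if (n == text.size()) {
        return {result_type::SUCCESS, pos + n};
    }
    // The input ran out in the middle of the literal: ask for more only if
    // more can actually arrive.
    return {is_partial ? result_type::NEED_MORE_INPUT : result_type::FAIL, pos + n};
}
```

For instance, matching "<tool_call>" against the streamed prefix "<tool_" reports NEED_MORE_INPUT when is_partial is true, while the same prefix in a complete message is a FAIL. A caller can treat NEED_MORE_INPUT as a valid intermediate state and emit whatever was parsed so far.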
Note
I wrote the majority of the parsing myself, as well as the complete examples in test-chat-peg-parser.cpp. AI assistance was used in some portions, notably: