Description
What happened?
I ran `./llama-gbnf-validator mygrammar.txt mytestprogram.txt`. After checking the grammar itself, it started to parse the test file, went into an infinite loop calling `static void llama_grammar_advance_stack()`, and eventually blew up in `tiny_malloc_from_free_list()`.
mygrammar.txt
mytestprogram.txt
llama-grammar.cpp.txt
I modified llama-grammar.cpp to add some console debug statements, so the line numbers in the stack trace may be off a bit from the version I used. See the attached file llama-grammar.cpp.txt for my minor changes.
I found numerous bugs and problems with the validator, including these:
- The infinite loop noted above for the grammar and test file provided above. This is the most serious.
- If I use the construct `nts?` or `nts*` or `nts+` on the RHS of a rule in the grammar, where `nts` is a non-terminal symbol defined elsewhere, I get the error "Undefined rule" with no indication of which rule it is, or how or why it was created as undefined. To fix it, I have to parenthesize the nts, e.g., `(nts)?`, on the right-hand side of the rule being defined. Nowhere is it documented that `nts?` is not valid gbnf, and if it is invalid, the validator should complain, rather than just producing an invalid grammar representation.
- The documentation for gbnf states that non-terminal symbols may consist of lower case letters and "-", e.g., "if-statement". If I use a camelCase nts, e.g., "ifStatement", it is accepted by the validator but the test file parser does not work properly (at least in some cases). So is a camelCase nts allowed? If it isn't, the validator should complain and state the bad nts it sees, rather than just producing an invalid grammar representation.
- The gbnf grammar seems to require that I sprinkle `ws` (whitespace) non-terminal symbols in seemingly random places throughout the grammar rules because there is apparently no notion of a lexical analyzer. This requires a very tedious trial-and-error process, because if I put unnecessary `ws` tokens in a rule, it prevents the rule from firing, and if I leave a necessary one out, it also prevents the rule from firing! If you cannot make it work like other bnf grammars, please at least document when one must add `ws` and when one must not.
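
To illustrate the kind of diagnostic requested in the list above, here is a minimal sketch of an undefined-rule check that names both the referencing rule and the missing symbol. It uses a hypothetical toy grammar table (`ToyGrammar`, `find_undefined_rules`); these names are assumptions for illustration and are not llama.cpp's actual data structures or error paths.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy representation (NOT llama.cpp's internal format):
// rule name -> names of the non-terminals referenced on its right-hand side.
using ToyGrammar = std::map<std::string, std::vector<std::string>>;

// Returns one message per reference to a rule that is never defined,
// naming both the referencing rule and the missing symbol.
std::vector<std::string> find_undefined_rules(const ToyGrammar & grammar) {
    std::vector<std::string> errors;
    for (const auto & entry : grammar) {
        for (const std::string & ref : entry.second) {
            if (grammar.find(ref) == grammar.end()) {
                errors.push_back("rule '" + entry.first +
                                 "' references undefined rule '" + ref + "'");
            }
        }
    }
    return errors;
}

// Usage example: a camelCase reference that was never defined is reported by name.
void example_undefined_rule_report() {
    ToyGrammar g;
    g["root"] = std::vector<std::string>{"stmt", "ws"};
    g["stmt"] = std::vector<std::string>{"ifStatement"};
    g["ws"]   = std::vector<std::string>{};

    const std::vector<std::string> errors = find_undefined_rules(g);
    assert(errors.size() == 1);
    assert(errors[0] == "rule 'stmt' references undefined rule 'ifStatement'");
}
```

A message of this shape would tell the grammar author which rule to look at, instead of a bare "Undefined rule".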
Name and Version
% ./llama-cli --version
version: 4075 (fb4a0ec)
built with Apple clang version 16.0.0 (clang-1600.0.26.4) for arm64-apple-darwin23.6.0
What operating system are you seeing the problem on?
Mac
Relevant log output
tiny_malloc_from_free_list 0x00000001838043f0
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:729
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
................ 65,000 lines removed ...............
llama_grammar_advance_stack(const std::vector<…> &, const std::vector<…> &, std::vector<…> &) llama-grammar.cpp:731
llama_grammar_accept(const std::vector<…> &, const std::vector<…> &, unsigned int, std::vector<…> &) llama-grammar.cpp:864
[Inlined] llama_grammar_validate(llama_grammar *, const std::string &, unsigned long &, std::string &) gbnf-validator.cpp:21
main gbnf-validator.cpp:101
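
The trace above shows `llama_grammar_advance_stack` re-entering itself until the process fails inside `tiny_malloc_from_free_list`. One common cause of this pattern in recursive grammar expansion is a left-recursive rule. Whether the attached grammar contains one is an assumption, not something confirmed here. The following is a minimal sketch of an up-front check, using a hypothetical toy representation (`ToyRules`, `is_left_recursive`) rather than llama.cpp's internals, and ignoring leading symbols that can match the empty string.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy representation (NOT llama.cpp's internal format):
// each rule maps to its alternatives; each alternative is a sequence of symbols.
// A symbol that is a key of the map is a non-terminal; anything else is a terminal.
using Alternative = std::vector<std::string>;
using ToyRules    = std::map<std::string, std::vector<Alternative>>;

// Depth-first search over the leftmost symbol of every alternative.
// Returns true if `target` can appear as the leftmost symbol again
// before any terminal is consumed.
// Simplification: leading symbols that may be empty are not handled.
static bool reaches_leftmost(const ToyRules & rules,
                             const std::string & current,
                             const std::string & target,
                             std::set<std::string> & visited) {
    const auto it = rules.find(current);
    if (it == rules.end()) {
        return false; // terminal (or unknown symbol): input would be consumed here
    }
    for (const Alternative & alt : it->second) {
        if (alt.empty()) {
            continue;
        }
        const std::string & first = alt.front();
        if (first == target) {
            return true;
        }
        // Only descend into symbols not seen yet, so the search itself terminates.
        if (visited.insert(first).second &&
            reaches_leftmost(rules, first, target, visited)) {
            return true;
        }
    }
    return false;
}

// True if `rule` is directly or indirectly left-recursive in the toy grammar.
bool is_left_recursive(const ToyRules & rules, const std::string & rule) {
    std::set<std::string> visited;
    visited.insert(rule);
    return reaches_leftmost(rules, rule, rule, visited);
}

// Usage example: expr ::= expr "+" term | term ; term ::= "x"
void example_left_recursion_check() {
    ToyRules rules;
    rules["expr"] = std::vector<Alternative>{
        Alternative{"expr", "+", "term"},
        Alternative{"term"},
    };
    rules["term"] = std::vector<Alternative>{ Alternative{"x"} };

    assert(is_left_recursive(rules, "expr"));
    assert(!is_left_recursive(rules, "term"));
}
```

If a check along these lines ran during grammar validation, a grammar of this kind could be rejected with an error message before any test input is parsed, instead of recursing until memory runs out.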