Bug: llama-gbnf-validator parses grammar but gets a seg fault when validating an input string against the grammar #10321

Open
@nissenbenyitskhak

What happened?

I ran ./llama-gbnf-validator mygrammar.txt mytestprogram.txt. After checking the grammar itself, it started to parse the test file, went into an infinite loop calling static void llama_grammar_advance_stack(), and eventually blew up in tiny_malloc_from_free_list().

mygrammar.txt
mytestprogram.txt
llama-grammar.cpp.txt

I modified llama-grammar.cpp to add some console debug statements, so the line numbers in the stack trace may be off a bit from the version I used. See the attached file llama-grammar.cpp.txt for my minor changes.

I found numerous bugs and problems with the validator, including these:

  1. The infinite loop noted above for the grammar and test file provided above. This is the most serious.
  2. If I use the construct nts?, nts*, or nts+ on the right-hand side of a rule in the grammar, where nts is a non-terminal symbol defined elsewhere, I get the error "Undefined rule" with no indication of which rule it is, or how or why it was created as undefined. To work around it, I have to parenthesize the nts, e.g., (nts)?, on the right-hand side of the rule being defined. Nowhere is it documented that nts? is not valid gbnf, and if it is not valid, the validator should complain, rather than just producing an invalid grammar representation.
  3. The documentation for gbnf states that non-terminal symbols may consist of lower case letters and "-", e.g., "if-statement". If I use a camelCase nts, e.g., "ifStatement", it is accepted by the validator but the test file parser does not work properly (at least in some cases). So is a camelCase nts allowed? If it isn't, the validator should complain and state the bad nts it sees, rather than just producing an invalid grammar representation.
  4. The gbnf grammar seems to require that I sprinkle ws (whitespace) non-terminal symbols in seemingly random places throughout the grammar rules, because there is apparently no notion of a lexical analyzer. This requires a very tedious trial-and-error process: if I put an unnecessary ws token in a rule, it prevents the rule from firing, and if I leave a necessary one out, it also prevents the rule from firing! If you cannot make it work like other bnf grammars, please at least document when one must add ws and when one must not.
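
As an illustration of the kind of diagnostic item 3 asks for, here is a minimal sketch of a pre-check that lists rule names not made up of lowercase letters and "-". It is not llama.cpp code: is_documented_rule_name and find_suspect_rule_names are hypothetical helpers, and the check only looks at lines shaped like `name ::= ...`, so it ignores continuation lines and would be confused by a `::=` inside a string literal.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): true if `name` consists only of
// lowercase ASCII letters and '-', the naming convention described in item 3.
static bool is_documented_rule_name(const std::string & name) {
    if (name.empty()) {
        return false;
    }
    for (char c : name) {
        const bool lower = c >= 'a' && c <= 'z';
        if (!lower && c != '-') {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: scans grammar text line by line and returns the
// left-hand-side names of `name ::= ...` lines that break the convention.
static std::vector<std::string> find_suspect_rule_names(const std::string & grammar_text) {
    std::vector<std::string> suspects;
    std::istringstream in(grammar_text);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t");
        if (first == std::string::npos || line[first] == '#') {
            continue; // blank line or comment
        }
        const std::size_t def = line.find("::=");
        if (def == std::string::npos || def <= first) {
            continue; // not a rule definition line, or no name before "::="
        }
        // Trim whitespace between the name and "::=".
        const std::size_t last = line.find_last_not_of(" \t", def - 1);
        const std::string name = line.substr(first, last - first + 1);
        if (!is_documented_rule_name(name)) {
            suspects.push_back(name);
        }
    }
    return suspects;
}
```

Running find_suspect_rule_names over the contents of a grammar file before handing it to the validator would at least name the rules worth renaming (e.g., ifStatement to if-statement) when the test file fails to parse.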

Name and Version

% ./llama-cli --version
version: 4075 (fb4a0ec)
built with Apple clang version 16.0.0 (clang-1600.0.26.4) for arm64-apple-darwin23.6.0

What operating system are you seeing the problem on?

Mac

Relevant log output

tiny_malloc_from_free_list 0x00000001838043f0
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:729
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
................ 65,000 lines removed ...............
llama_grammar_advance_stack(const std::vector<> &, const std::vector<> &, std::vector<> &) llama-grammar.cpp:731
llama_grammar_accept(const std::vector<> &, const std::vector<> &, unsigned int, std::vector<> &) llama-grammar.cpp:864
[Inlined] llama_grammar_validate(llama_grammar *, const std::string &, unsigned long &, std::string &) gbnf-validator.cpp:21
main gbnf-validator.cpp:101
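
The trace above shows llama_grammar_advance_stack() calling itself until an allocation fails. The root cause is not established in this report. Purely as an illustration of how a recursive expansion could fail with a message instead of crashing, here is a minimal sketch on a toy grammar model. The toy_rule type, expand_first_symbol, and the max_depth parameter are all hypothetical and do not reflect llama.cpp's data structures or API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model (assumption, not llama.cpp's representation): a rule either starts
// with a reference to another rule or starts with a terminal.
struct toy_rule {
    bool        starts_with_rule_ref; // true if the first element references a rule
    std::size_t target;               // index of the referenced rule (unused otherwise)
};

// Follows leading rule references until a terminal is reached and returns the
// number of references followed. Throws instead of recursing forever when the
// chain is longer than max_depth (for example, when the rules form a cycle).
static std::size_t expand_first_symbol(const std::vector<toy_rule> & rules,
                                       std::size_t rule_id,
                                       std::size_t depth,
                                       std::size_t max_depth) {
    if (depth > max_depth) {
        throw std::runtime_error(
            "grammar expansion exceeded depth " + std::to_string(max_depth) +
            " at rule " + std::to_string(rule_id));
    }
    const toy_rule & rule = rules.at(rule_id);
    if (!rule.starts_with_rule_ref) {
        return depth; // reached a terminal
    }
    return expand_first_symbol(rules, rule.target, depth + 1, max_depth);
}
```

With a guard of this kind, a grammar whose rules keep expanding into one another would produce an error naming a rule index rather than a 65,000-frame stack trace.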

Metadata

Labels

bug (Something isn't working)
critical severity (Used to report critical severity bugs in llama.cpp, e.g. Crashing, Corrupted, Dataloss)
