Skip to content

Bug: JSON Schema-to-GBNF additionalProperties bugs (and other minor quirks) #7789

Closed
@HanClinto

Description

What happened?

While debugging json-schema-to-gbnf grammars, I noticed a few bugs / quirks and wanted to write them down somewhere.

additionalProperties seems to default to false (not matching spec).

By default, additional properties should be permitted. However, providing a schema like:

{
  "type": "object",
  "properties": {
    "number": { "type": "number" },
    "street_name": { "type": "string" },
    "street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
  }
}

Then it correctly passes on these strings:

{ "number": 1600, "street_name": "Pennsylvania", "street_type":"Avenue"}
{ "street_name": "Pennsylvania" }
{ "number": 1600, "street_name": "Pennsylvania" }
{}

But then it improperly fails on the string:

{ "number": 1600, "street_name": "Pennsylvania", "street_type":"Avenue", "direction":"NW"}

This is clearly given in the json-schema docs as an example of a string that should match this schema, so we're doing something wrong.

Explicit "additionalProperties"=true behavior is even worse.

If we change the above grammar to:

{
  "type": "object",
  "properties": {
    "number": { "type": "number" },
    "street_name": { "type": "string" },
    "street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
  },
  "additionalProperties": true
}

Then things really start to go awry. These strings should all pass (indeed, they passed before when we didn't explicitly set anything for additionalProperties, but instead are failing now:

{"number":1600,"street_name":"Pennsylvania","street_type":"Avenue"}
{ "street_name": "Pennsylvania" }
{ "number": 1600, "street_name": "Pennsylvania" }

And our sample with an additional property still doesn't match:

{ "number": 1600, "street_name": "Pennsylvania", "street_type":"Avenue", "direction":"NW"}

The only string that matches out of the original is the empty object ({}).

Looking at the generated GBNF, there is some weird stuff going on. Here is the GBNF with additionalProperties set implicitly:

char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
decimal-part ::= [0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
integral-part ::= [0-9] | [1-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
number-kv ::= "\"number\"" space ":" space number
number-rest ::= ( "," space street-name-kv )? street-name-rest
root ::= "{" space  (number-kv number-rest | street-name-kv street-name-rest | street-type-kv )? "}" space
space ::= " "?
street-name-kv ::= "\"street_name\"" space ":" space string
street-name-rest ::= ( "," space street-type-kv )?
street-type ::= "\"Street\"" | "\"Avenue\"" | "\"Boulevard\""
street-type-kv ::= "\"street_type\"" space ":" space street-type
string ::= "\"" char* "\"" space

And here is the GBNF from additionalProperties set explicitly to true:

additional-kv ::= string ":" space additional-value
additional-kvs ::= additional-kv ( "," space additional-kv )*
additional-value ::= object
array ::= "[" space ( value ("," space value)* )? "]" space
boolean ::= ("true" | "false") space
char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
decimal-part ::= [0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
integral-part ::= [0-9] | [1-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
null ::= "null" space
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
number-kv ::= "\"number\"" space ":" space number
number-rest ::= ( "," space street-name-kv )? street-name-rest
object ::= "{" space ( string ":" space value ("," space string ":" space value)* )? "}" space
root ::= "{" space  (number-kv number-rest | street-name-kv street-name-rest | street-type-kv street-type-rest | additional-kvs )? "}" space
space ::= " "?
street-name-kv ::= "\"street_name\"" space ":" space string
street-name-rest ::= ( "," space street-type-kv )? street-type-rest
street-type ::= "\"Street\"" | "\"Avenue\"" | "\"Boulevard\""
street-type-kv ::= "\"street_type\"" space ":" space street-type
street-type-rest ::= additional-kvs
string ::= "\"" char* "\"" space
value ::= object | array | string | number | boolean | null

The key differences to note here are how street-type-rest is now being defined (even though it was never defined in the original), and additional-kvs seems to be getting appended to each property without a comma in between (nor an optional flag).

I haven't yet wrapped my brain around what all is going on with that, but I wanted to lay out how far I'd gotten on my own.

Unlike strings, enums don't support spaces between properties and values.

This is definitely in the "quirk" more than "bug" category, but when using a schema like:

{
  "type": "object",
  "properties": {
    "number": { "type": "number" },
    "street_name": { "type": "string" },
    "street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
  }
}

Then validating against it means that:

{ "number": 1600, "street_type":"Avenue"}

is a valid string, but adding spaces around the enum value causes either of the following to fail:

{ "number": 1600, "street_type": "Avenue"}
{ "number": 1600, "street_type":"Avenue" }

Interestingly, adding spaces around a string value works fine, and these match the generated grammar just fine:

{ "number": 1600, "street_name": "Pennsylvania" }

Unsupported Attributes

We should probably build a list of unsupported attributes and note them in the documentation -- some that I've noticed thus far:

  • exclusiveMinimum (probably can't be handled for anything except for special cases of 0, requiring either the presence of a - or not)
  • uniqueItems -- not sure how we could support this without a regex engine that supports capture groups and lookbehinds and whatnot.

Name and Version

version: 3093 (7672ade)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.4.0

What operating system are you seeing the problem on?

Mac

Relevant log output

No response

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    bug-unconfirmedlow severityUsed to report low severity bugs in llama.cpp (e.g. cosmetic issues, non critical UI glitches)stale

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions