Type specific validating formats (stringFormat, numberFormat) #1391
Description
The "format" keyword has changed behavior several times: it has gone from being a validation keyword, to an annotation, and back to a validation keyword, but only if you specify (out-of-band) that it's a validation keyword. Because this is specified out-of-band, it is impossible to determine the right way to "upgrade" the keyword between dialects of a schema.
This ambiguous behavior doesn't make a lot of sense. It seems to me there ought to be one keyword for annotation and a separate keyword for validation.
Validation keywords usually apply only when the instance is of a particular type (e.g. "minimum" doesn't do anything if the input is a boolean). If I'm using "format" as a validation keyword, and I want it to apply only to strings, I have to use more sophisticated logic. This won't work as expected:
{ "type": ["number", "string"], "format": "date" }
Number inputs will always fail. Instead, I have to write:
{ "oneOf": [ {"type": "number"}, { "type": "string", "format": "date" }] }
But this is complicated. I should be able to do something like:
{ "type": ["number", "string"], "stringFormat": "date" }
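To make the intended behavior concrete, here is a minimal Python sketch of a type-gated "stringFormat" keyword. Everything in it is an assumption for illustration: `validate`, `is_date`, and `STRING_FORMATS` are hypothetical names, not part of any existing validator, and only the "type" values used in the example above are handled.

```python
import re
from datetime import date

def is_date(value: str) -> bool:
    """Return True for a calendar date written as YYYY-MM-DD."""
    # Check the shape first, then let the standard library reject
    # impossible dates such as 2024-02-30.
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

# Hypothetical registry of string formats known to this sketch.
STRING_FORMATS = {"date": is_date}

def matches_type(name: str, instance) -> bool:
    if name == "string":
        return isinstance(instance, str)
    if name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(instance, (int, float)) and not isinstance(instance, bool)
    raise ValueError(f"type {name!r} is not handled in this sketch")

def validate(schema: dict, instance) -> bool:
    types = schema.get("type")
    if types is not None:
        if isinstance(types, str):
            types = [types]
        if not any(matches_type(t, instance) for t in types):
            return False

    fmt = schema.get("stringFormat")
    if fmt is not None:
        if fmt not in STRING_FORMATS:
            raise ValueError(f"unknown stringFormat {fmt!r}")
        # Type-gated: the check is skipped entirely for non-strings.
        if isinstance(instance, str) and not STRING_FORMATS[fmt](instance):
            return False
    return True

SCHEMA = {"type": ["number", "string"], "stringFormat": "date"}
```

With this sketch, `validate(SCHEMA, 42)` and `validate(SCHEMA, "2024-02-29")` both pass, while `validate(SCHEMA, "2024-02-30")` fails, without needing the "oneOf" wrapper.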
Then, these new type-specific formats could be validation keywords, leaving "format" to be an annotation-only keyword:
- There would be "stringFormat". I believe all of the existing formats are string formats.
- "numberFormat" would accept "integer" (no decimal point allowed, even 1.0), "float", or "exponential" (e.g. 1e5)
- Potentially "objectFormat" and "arrayFormat" because it might be useful for some niche applications
- (null/boolean don't need formats, being very small value spaces)
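One consequence of the "numberFormat" idea is that the validator needs the number as it was written, since 1 and 1.0 are the same value once parsed. The sketch below shows one way that could work in Python, using the `parse_int` and `parse_float` hooks of the standard `json` module to capture the raw tokens. The function names are hypothetical, and the exact meaning of "float" (a decimal point and no exponent) is my assumption, since the proposal above doesn't define it.

```python
import json
import re

# Patterns follow the JSON number grammar, split by lexical form.
_INTEGER = re.compile(r"-?(0|[1-9][0-9]*)")
_FLOAT = re.compile(r"-?(0|[1-9][0-9]*)\.[0-9]+")
_EXPONENTIAL = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?[eE][+-]?[0-9]+")

def classify_number_token(token: str) -> str:
    """Classify a raw JSON number token by how it is written."""
    if _INTEGER.fullmatch(token):
        return "integer"
    if _FLOAT.fullmatch(token):
        return "float"  # assumed: decimal point, no exponent
    if _EXPONENTIAL.fullmatch(token):
        return "exponential"
    raise ValueError(f"not a JSON number token: {token!r}")

def number_tokens(text: str) -> list:
    """Return the raw number tokens of a JSON document, in document order."""
    tokens = []

    def record(token: str) -> str:
        # The json module passes the original text of each number here.
        tokens.append(token)
        return token

    json.loads(text, parse_int=record, parse_float=record)
    return tokens

# After ordinary parsing the distinction is gone:
assert json.loads("1.0") == json.loads("1")

# With the hooks, the lexical form is still available:
forms = [classify_number_token(t) for t in number_tokens("[1, 1.0, 1e5]")]
# forms == ["integer", "float", "exponential"]
```

A validator that only sees already-parsed values couldn't implement "integer" as described (rejecting 1.0), so this keyword would probably have to be evaluated at or near the parsing step.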
Some additional features:
- The keyword would only be defined for values that the validator knows. That is, if an unknown value for the typed-format keywords was provided, it would fail the same way an unknown keyword would.
- A URI could be provided, to allow for one-off, user-defined formats that bypass standardization. For example, if I want to represent an ISO 8601 period, I could write down
  {"stringFormat": "http://example.com/format/period"}
- Formats could refer not just to standard syntaxes, but also to outside validators or nonstandard sets. e.g. I could write
  {"numberFormat": "https://example.org/numberFormat/A000045"}
  to refer to all numbers that are in the Fibonacci sequence.
- (as an idea) In the event a format is renamed, or a URI format is standardized, the typed-format keywords could accept a space-delimited list of format names or URI names; this would mean "all of these formats are the same, use any one that you understand." e.g. if the above period format gets standardized as "period", then you could write
  { "stringFormat": "period http://example.com/format/period" }
  to indicate using either definition is OK, they're the same thing.
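The lookup rules in the list above could be sketched roughly as follows. This is only an illustration: `resolve_format`, `UnknownFormatError`, and the registries are hypothetical, and `is_period` is a deliberately simplified stand-in, not a complete ISO 8601 implementation.

```python
import re

class UnknownFormatError(ValueError):
    """Raised when the validator knows none of the listed format names."""

# Simplified stand-in for a period checker (e.g. P1Y2M, PT5M).
_PERIOD = re.compile(
    r"P(?=.)([0-9]+Y)?([0-9]+M)?([0-9]+D)?"
    r"(T(?=[0-9])([0-9]+H)?([0-9]+M)?([0-9]+S)?)?"
)

def is_period(value: str) -> bool:
    return _PERIOD.fullmatch(value) is not None

def resolve_format(keyword_value: str, registry: dict):
    """Pick the first listed format name (or URI) that the registry knows.

    All listed names are taken to mean the same format, so any known
    one will do. If none is known, fail like an unknown keyword would.
    """
    for name in keyword_value.split():
        checker = registry.get(name)
        if checker is not None:
            return checker
    raise UnknownFormatError(f"no known format in {keyword_value!r}")

# An older validator that only knows the URI form...
old_registry = {"http://example.com/format/period": is_period}
# ...and a newer one that only knows the standardized short name.
new_registry = {"period": is_period}

value = "period http://example.com/format/period"
check_old = resolve_format(value, old_registry)
check_new = resolve_format(value, new_registry)
```

Both validators resolve the same keyword value to a checker they understand, while a value such as "no-such-format" raises `UnknownFormatError` in either one instead of being silently ignored.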
Blockers: This depends on "unknown keywords prohibited" being a feature; otherwise these proposed keywords will just be annotation keywords.
Status: Awaiting PR