
Type specific validating formats (stringFormat, numberFormat) #1391

Open
@awwright

Description

The "format" keyword has historically changed functionality: it has gone back and forth from being a validation keyword, to an annotation, and back to a validation keyword if you specify (out-of-band) that it is one. Because this is specified out-of-band, it is impossible to determine the right way to "upgrade" the keyword between dialects of a schema.

This ambiguous behavior doesn't make a lot of sense. It seems to me there ought to be one keyword for annotation and a separate keyword for validation.

Validation keywords usually only apply when the instance is of a particular type (e.g. "minimum" does nothing if the instance is a boolean). If I'm using "format" as a validation keyword and I want it to apply only to strings, I have to use more sophisticated logic. This won't work as expected:

{ "type": ["number", "string"], "format": "date" }

Number inputs will always fail. Instead, I have to write:

{ "oneOf": [ {"type": "number"}, { "type": "string", "format": "date" }] }

But this is complicated. I should be able to do something like:

{ "type": ["number", "string"], "stringFormat": "date" }
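A minimal Python sketch of the intended semantics, assuming "stringFormat" simply ignores non-string instances the way "minimum" ignores non-numbers. The evaluator below handles only "type" (for "number" and "string") and "stringFormat"; the `validate` and `is_date` names and the single-entry format registry are hypothetical, and "date" is taken to mean an RFC 3339 full-date (YYYY-MM-DD).

```python
import datetime
import re

# Hypothetical validator for "date", read as an RFC 3339 full-date (YYYY-MM-DD).
_FULL_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_date(s: str) -> bool:
    if not _FULL_DATE.fullmatch(s):
        return False
    try:
        datetime.date.fromisoformat(s)  # rejects impossible dates such as 2023-02-29
    except ValueError:
        return False
    return True


STRING_FORMATS = {"date": is_date}

# Only the two JSON types used in the example are modeled here.
JSON_TYPES = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def validate(schema: dict, instance) -> bool:
    """Evaluate only "type" and the proposed "stringFormat" keyword."""
    types = schema.get("type")
    if types is not None:
        if isinstance(types, str):
            types = [types]
        if not any(JSON_TYPES[t](instance) for t in types):
            return False
    fmt = schema.get("stringFormat")
    # Type-specific: the keyword constrains strings and ignores every other type.
    if fmt is not None and isinstance(instance, str):
        return STRING_FORMATS[fmt](instance)
    return True
```

With `schema = {"type": ["number", "string"], "stringFormat": "date"}`, the number `42` and the string `"2024-02-29"` both pass, while `"2023-02-29"` fails, which is the behavior the "oneOf" workaround spells out by hand.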

Then, these new type-specific formats could be validation keywords, leaving "format" to be an annotation-only keyword:

  • There would be "stringFormat". I believe all of the existing formats are string formats.
  • "numberFormat" would accept "integer" (no decimal point allowed, even 1.0), "float", or "exponential" (e.g. 1e5)
  • Potentially "objectFormat" and "arrayFormat", since they might be useful for some niche applications
  • (null/boolean don't need formats, being very small value spaces)
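Because "integer" is meant to reject 1.0, a "numberFormat" check has to look at how the number is written, not just its value. One way to sketch this in Python is to keep each JSON number as its source text via the `parse_int`/`parse_float` hooks of `json.loads`. The exact syntaxes below are assumptions (the proposal names the formats but does not define them): "float" is taken as a decimal point with no exponent, and "exponential" as any number with an exponent. `NumberLiteral` and `matches_number_format` are hypothetical names.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberLiteral:
    """A JSON number kept exactly as written, so 1 and 1.0 stay distinguishable."""
    text: str


# Assumed syntaxes; the proposal does not define them precisely.
_INT = r"-?(?:0|[1-9][0-9]*)"
NUMBER_SYNTAXES = {
    "integer": re.compile(_INT),                                         # 1, -42 (not 1.0)
    "float": re.compile(_INT + r"\.[0-9]+"),                             # 1.0, -0.5
    "exponential": re.compile(_INT + r"(?:\.[0-9]+)?[eE][-+]?[0-9]+"),   # 1e5, 2.5E-3
}


def load_preserving_numbers(text: str):
    # json.loads passes the literal text of each number to these hooks.
    return json.loads(text, parse_int=NumberLiteral, parse_float=NumberLiteral)


def matches_number_format(instance, fmt: str) -> bool:
    if not isinstance(instance, NumberLiteral):
        return True  # type-specific: non-numbers are not constrained
    return NUMBER_SYNTAXES[fmt].fullmatch(instance.text) is not None
```

Parsing `[1, 1.0, 1e5]` this way, only the first element satisfies "integer", the second satisfies "float", and the third satisfies "exponential". A validator that converts numbers to floats before evaluating keywords could not make these distinctions.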

Some additional features:

  1. The keyword would only be defined for values that the validator knows. That is, if an unknown value for a typed-format keyword were provided, the schema would fail the same way an unknown keyword would.

  2. A URI could be provided to allow for one-off, user-defined formats that bypass standardization. For example, to represent an ISO 8601 period, I could write {"stringFormat": "http://example.com/format/period"}.

  3. Formats could refer not just to standard syntaxes, but also to outside validators or nonstandard sets. For example, I could write {"numberFormat": "https://example.org/numberFormat/A000045"} to refer to all numbers in the Fibonacci sequence.

  4. (As an idea) If a format is renamed, or a URI format is standardized, the typed-format keywords could accept a space-delimited list of format names or URIs, meaning "all of these formats are the same; use any one that you understand." For example, if the above period format gets standardized as "period", you could write { "stringFormat": "period http://example.com/format/period" } to indicate that either definition is acceptable because they are the same thing.
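The four features above can be sketched together in Python, under a few assumptions. The registry is keyed by both names and URIs. A keyword value is split on whitespace, and the first entry the validator recognizes is used. If none is recognized, the schema itself is rejected (raised as an error) rather than the instance failing. The Fibonacci checker stands in for the A000045 URI and only accepts non-negative JSON integers. `UnknownFormatError`, `resolve_format`, `check_number_format`, and the alias name "fibonacci" are hypothetical; here the validator is assumed to know only the URI form.

```python
import math


class UnknownFormatError(Exception):
    """No listed format is known; the schema is rejected like an unknown keyword."""


def is_fibonacci(n) -> bool:
    # Only non-negative JSON integers are considered (so 13.0 is rejected).
    if isinstance(n, bool) or not isinstance(n, int) or n < 0:
        return False
    # n is a Fibonacci number iff 5n^2 + 4 or 5n^2 - 4 is a perfect square.
    return any(m >= 0 and math.isqrt(m) ** 2 == m for m in (5 * n * n + 4, 5 * n * n - 4))


# Hypothetical registry: this validator knows the URI form only.
NUMBER_FORMATS = {
    "https://example.org/numberFormat/A000045": is_fibonacci,
}


def resolve_format(value: str, registry: dict):
    """Treat value as a space-delimited list of equivalent names/URIs; use the first known one."""
    for name in value.split():
        if name in registry:
            return registry[name]
    raise UnknownFormatError(f"no known format in {value!r}")


def check_number_format(instance, value: str) -> bool:
    checker = resolve_format(value, NUMBER_FORMATS)  # fails even if the instance is not a number
    if isinstance(instance, bool) or not isinstance(instance, (int, float)):
        return True  # type-specific: non-numbers are not constrained
    return checker(instance)
```

Here `"fibonacci https://example.org/numberFormat/A000045"` still works because the URI is recognized, whereas `"fibonacci"` on its own raises `UnknownFormatError`. Resolving the format before looking at the instance means an unrecognized value is caught whatever the instance's type, mirroring how an unknown keyword would be treated.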


Blockers: This depends on "unknown keywords prohibited" being a feature; otherwise, these proposed keywords will just be annotation keywords.

Related: #1383, #1284

Labels: proposal (Initial discussion of a new idea. A project will be created once a proposal document is created.)
Project status: Awaiting PR