Hostname format validation problem #208

Open
soslanco opened this issue Jul 29, 2018 · 15 comments

@soslanco

soslanco commented Jul 29, 2018

The regular expression used for the hostname format doesn't comply with the specification from json-schema.org.

The following example from json-schema.org doesn't work because IPv4 addresses are also detected as valid hostnames.

"oneOf": [
    { "format": "hostname" },
    { "format": "ipv4" },
    { "format": "ipv6" }
]

@johandorland
Collaborator

So the JSON schema spec references RFC 1034, which does not allow (sub)domains to start with a digit. In practice, however, there are numerous domains that start with a digit, and RFC 1123 actually allows leading digits in (sub)domains, but that leads to the problem that IP addresses are suddenly also valid domain names. The regex currently used allows (sub)domains to start with a digit and is therefore, strictly speaking, not RFC 1034 compliant.

I'm not sure how to tackle this problem. It's a bit embarrassing that an example from the json-schema.org site doesn't work as expected, but I think breaking the validation of domain names that use RFC 1123 is also less than ideal as they're quite common. Another workaround is to explicitly check for an ipv4 address and fail the hostname format validation if it's also an ipv4 address.

@handrews I could really use your opinion on this one.
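
The difference between the two RFCs described in this comment can be sketched in Go. This is a minimal illustration, not gojsonschema's actual regex: the two label patterns, the `validHostname` helper, and the 253-character cap are assumptions made for the example. The only difference between the patterns is whether the first character of a label may be a digit.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// RFC 1034 preferred syntax: a label starts with a letter and ends with a
// letter or digit; hyphens are allowed only in the interior.
var rfc1034Label = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$`)

// RFC 1123 relaxation: the first character may also be a digit.
var rfc1123Label = regexp.MustCompile(`^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$`)

// validHostname reports whether every dot-separated label of s matches label.
// Hypothetical helper for illustration; the 253-character cap is the commonly
// cited limit for the textual form and is an assumption here.
func validHostname(s string, label *regexp.Regexp) bool {
	if len(s) == 0 || len(s) > 253 {
		return false
	}
	for _, l := range strings.Split(s, ".") {
		if !label.MatchString(l) {
			return false
		}
	}
	return true
}

func main() {
	for _, h := range []string{"example.com", "3com.com", "127.0.0.1"} {
		fmt.Printf("%s: rfc1034=%t rfc1123=%t\n", h,
			validHostname(h, rfc1034Label), validHostname(h, rfc1123Label))
	}
}
```

Under the stricter pattern both `3com.com` and `127.0.0.1` are rejected; under the relaxed one both are accepted, which is the trade-off discussed above.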

@handrews

@johandorland yeah, JSON Schema can't really fix ambiguities and imperfect real-world conformance with regards to other RFCs. This is part of why format is essentially best-effort, a situation that I hope to clarify in the next draft so fewer people expect it to work perfectly. For whatever definition of "work" they are using, which may not be the same as the next person's expectation of "work" for exactly this sort of reason.

@soslanco I would use anyOf instead of oneOf for that specific example. In general, to get non-ip-address hostnames, I would do something like:

{
   "format": "hostname",
   "not": {"format": "ipv4"}
}

which I suppose you could also use in a oneOf if you really need oneOf.
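
To see why the original `oneOf` rejects an IPv4 address while the adjusted schema accepts it, here is a rough Go sketch of the keyword's "exactly one branch matches" rule. It does not use gojsonschema; `isHostname`, `isIPv4`, `isIPv6`, `oneOf`, and `hostnameNotIPv4` are hypothetical helpers, and the lenient hostname pattern is an assumption standing in for whatever a real validator uses.

```go
package main

import (
	"fmt"
	"net/netip"
	"regexp"
)

// Lenient hostname pattern in the spirit of RFC 1123 (labels may start with a
// digit). An assumption for illustration, not gojsonschema's actual regex.
var hostnameRe = regexp.MustCompile(
	`^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$`)

func isHostname(s string) bool { return len(s) <= 253 && hostnameRe.MatchString(s) }

func isIPv4(s string) bool {
	a, err := netip.ParseAddr(s)
	return err == nil && a.Is4()
}

func isIPv6(s string) bool {
	a, err := netip.ParseAddr(s)
	return err == nil && a.Is6()
}

// oneOf mirrors the JSON Schema keyword: exactly one branch must accept s.
func oneOf(s string, branches ...func(string) bool) bool {
	n := 0
	for _, b := range branches {
		if b(s) {
			n++
		}
	}
	return n == 1
}

// hostnameNotIPv4 mirrors {"format": "hostname", "not": {"format": "ipv4"}}.
func hostnameNotIPv4(s string) bool { return isHostname(s) && !isIPv4(s) }

func main() {
	for _, s := range []string{"example.com", "127.0.0.1", "::1"} {
		fmt.Printf("%s: original=%t adjusted=%t\n", s,
			oneOf(s, isHostname, isIPv4, isIPv6),
			oneOf(s, hostnameNotIPv4, isIPv4, isIPv6))
	}
}
```

With the lenient pattern, `127.0.0.1` matches both the hostname and ipv4 branches, so the original `oneOf` fails; once the hostname branch excludes IPv4 addresses, exactly one branch matches.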

@soslanco if you want to file an issue on that example at https://github.com/json-schema-org/json-schema-org.github.io/issues that would be great.

@soslanco
Author

anyOf works fine, but the JSON Schema specification says:

hostname:
As defined by RFC 1034, section 3.1 [RFC1034], including host names produced using the Punycode algorithm specified in RFC 5891, section 4.4 [RFC5891].

IMHO the behavior of the hostname format must be as specified.

@soslanco
Author

For example, if the specification said:

foobar:
As defined by RFC 1034, section 3.1 [RFC1034], including host names produced using the Punycode algorithm specified in RFC 5891, section 4.4 [RFC5891].

then how should foobar work?
;-)

@handrews

@soslanco neither the JSON Schema specification nor JSON Schema implementations can fix ambiguous and conflicting specifications. The problem here is that the internet does not entirely follow RFC 1034, so implementations need to decide what the right trade-off is.

See also the section on implementation requirements for format, which acknowledges that it is often difficult or even impossible (see: email) to guarantee correct validation. I should probably add to that section in the next draft, because people keep expecting format to provide guarantees. It does not. It defers the amount of support to the implementation due to the very complicated tradeoffs involved.

@soslanco
Author

soslanco commented Aug 1, 2018

Why is an IPv4 address a hostname, but an IPv6 address is not?

@handrews

handrews commented Aug 1, 2018

It's a coincidence of syntax. IPv4 addresses and hostnames are both dot-separated, but IPv6 addresses are colon-separated.

None of this is specific to JSON Schema at all.

@soslanco
Author

soslanco commented Aug 1, 2018

RFC 3986:

A host identified by an IPv6 literal address is represented inside the square brackets without a preceding version flag.

A host identified by an IPv4 literal address is represented in dotted-decimal notation (a sequence of four decimal numbers in the range 0 to 255, separated by ".")

A host identified by a registered name is a sequence of characters usually intended for lookup within a locally defined host or service name registry
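
The three host forms quoted here can be told apart mechanically. The sketch below is one possible Go reading of them, with a hypothetical `classifyHost` helper: bracketed IPv6 literal first, then dotted-decimal IPv4, and everything else treated as a registered name. It assumes IPvFuture literals and zone identifiers can be ignored, and it does not validate the characters of a registered name.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// classifyHost returns which of the quoted RFC 3986 host forms s takes.
// Hypothetical helper; IPvFuture literals are treated as invalid and the
// registered-name branch is a catch-all that performs no character checks.
func classifyHost(s string) string {
	if len(s) >= 2 && strings.HasPrefix(s, "[") && strings.HasSuffix(s, "]") {
		a, err := netip.ParseAddr(s[1 : len(s)-1])
		if err == nil && a.Is6() && a.Zone() == "" {
			return "IPv6 literal"
		}
		return "invalid"
	}
	if a, err := netip.ParseAddr(s); err == nil && a.Is4() {
		// Checked before the registered-name fallback, so a dotted-decimal
		// address is never reported as a name.
		return "IPv4 literal"
	}
	return "registered name"
}

func main() {
	for _, h := range []string{"[::1]", "127.0.0.1", "example.com"} {
		fmt.Printf("%s -> %s\n", h, classifyHost(h))
	}
}
```

Checking the IPv4 form before falling back to a registered name is what removes the ambiguity for dotted-decimal input in this sketch.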

@soslanco
Author

soslanco commented Aug 1, 2018

One solution is to add a strict mode for full compatibility with the specification.

@handrews

handrews commented Aug 1, 2018

As one of the two primary editors of the JSON Schema specification, I am going to state this unambiguously: This Go implementation is in conformance with the JSON Schema specification as long as it allows every valid hostname. It is expected that formats with ambiguous, difficult, or conflicting rules will not necessarily catch every semantically invalid value. The implementation should document all known limitations with each supported format, but when it comes to format (and in draft-07, contentMediaType and contentEncoding), there is no such thing as strict mode or full compatibility. The implementation burden would be too high.

And with that I will leave the rest of this issue's resolution to @johandorland . If we need to improve the examples on the web site, please file an issue there.

@johandorland
Collaborator

After sleeping on it for a night, I'm going to mark this as won't fix. RFC 1034 is outdated and does not reflect real-world use. Clearly I'm not alone in this: with one exception, every other JSON schema implementation I checked also does not follow RFC 1034 strictly and also allows ipv4 addresses as hostnames.

I'm also not a big fan of a strict mode, @handrews hit the nail on the head quite well on that issue.

johandorland added wontfix and removed bug labels Aug 1, 2018

@soslanco
Author

soslanco commented Aug 1, 2018

The best solution is to change the specification :-)

@handrews

handrews commented Aug 1, 2018

@soslanco I invite you to go to the IETF and have fun trying to change RFC 1034. Please stop asking us to change things that are not under our control.

@soslanco
Author

soslanco commented Aug 2, 2018

Thanks, but I am talking about the JSON Schema specification.

hostname:
As defined by RFC 1034, section 3.1 [RFC1034], including host names produced using the Punycode algorithm specified in RFC 5891, section 4.4 [RFC5891].

P.S. One of the two (specification or software) is wrong.

@soslanco
Author

soslanco commented Aug 3, 2018

But this validator (gojsonschema) is the best one I have used! ;-)
