-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
uri format regexp is validating invalid URI #408
Comments
The regexp in question is this one
|
@vmaurin The regexp definitely has room to improve. The reason I didn't use the Java URI class to verify the URL is due to performance. Most users are using this library to validate the HTTP request and it needs to be as fast as possible. I would suggest improving the regexp to make it cover more validation scenarios. The other option is to provide both options but default to regexp and with a configuration flag switch to Java URL class. What do you think? |
I totally agree with the performance issue on relying on try/catch for validation, and also after a quick research, it sounds that URI java class is not perfect regarding validation either. My personal pick would be as you suggested, to enrich a bit the regexp. There are several example online, with this one as the most complete validation http://jmrware.com/articles/2009/uri_regexp/URI_regex.html but it might be also overkill. There are also some middle ground, like this one |
It looks good to me. Would you like to submit a PR to replace the old one? Thanks. |
The json schema uri format should follow rfc3986 for uri validation. I have replaced the current regex with a more restrictive one in order to have a stronger validation according the RFC refs networknt#408
I just did but I ran in an other use case. So far, the protocol relative URL were considered valid, that is not the case anymore with the new regex. According json schema, we should follow the URI rfc, then I guess forbid protocol relative URL, but not sure it is something you desire |
What is the protocol-relative URL? Could you please provide an example? Thanks. We want to make sure it is backward compatible if possible. Otherwise, some users will have a broken application after upgrade the library. |
It is actually a breaking change, see in my PR here https://github.com/networknt/json-schema-validator/pull/409/files#diff-cd18df339efd8831f215e37b5248aeb7dede17bac8013db8830dee30cd59b593L200 . The value asserted "valid" is not actually a valid URI, so there are two options :
|
@stevehu I am waiting on your call. Any more restrictive regexp won't be backward compatible by design, it will invalidate value that were considered valid before. Either you want the library to become stricter along time, ensuring good validation but no backward compatibility regarding validation, or you want to keep the library as lax as today, but give options to be more strict if needed. Depending on your decision, I will adapt the PR |
We want to maintain backward compatibility; however, we cannot stop enhancing the library. Please submit a PR with your updated regexp. Thanks. |
Ok so the PR is open here #409 |
The json schema uri format should follow rfc3986 for uri validation. I have replaced the current regex with a more restrictive one in order to have a stronger validation according the RFC refs #408 Co-authored-by: Vincent Maurin <vincent.maurin@vectaury.io>
Because I just ran into the issue, that google font URLs are considered invalid: The RFC says it is invalid, but seems like google didn't get the memo. |
If you get the address in the above URL, it is already encoded as a valid URL.
|
I right-clicked the URL and copied the link address. |
as stated: this is github encoding the URL. Use the mouse to mark the URL and copy it. |
I don't know who encoded the URL. What I mean is only the valid URL should be used in JSON schema. |
The regexp solution for validating a URI is not perfect here, but for sure a URI with |
yes, i have already created my own format, but what I am saying is: browsers AND the java implementation are not agreeing with the regex (which is RFC compliant from what I can see). As always when it comes to the web, the rules are not always applied. |
The regexp used by the "uri" format validation doesn't really validate that the string is a proper URI, i.e there is a chance passing a validated string to the java URI class will result into an exception
An example invalid URI that passes the validation is
https://gitlab.com/?args=invalid|value
Should the regexp be more complex ? Should the validation be made with the java URI class (then catching the exception to validate) ?
The text was updated successfully, but these errors were encountered: