-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate url.parse API in favor of URL constructor #11
base: master
Are you sure you want to change the base?
Conversation
index.js
Outdated
if (!containsProtocol) result.protocol = '' | ||
const urlInstance = new URL(containsProtocol ? url : 'invalid://' + url) | ||
const result = {} | ||
for (const prop in urlInstance) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preferred to use for..in
loop to not have to maintain a list of URL properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't run tests, but I believe you can just pass back the urlInstance
here. No need to copy properties over to another object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see that the URL
instance is resistant to having the .protocol
and .href
changed.
> result.href = result.href.replace('invalid://', '')
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL
at __node_internal_captureLargerStackTrace (node:internal/errors:478:5)
at new NodeError (node:internal/errors:387:5)
at URL.onParseError (node:internal/url:565:9)
at URL.set href [as href] (node:internal/url:755:5) {
input: '//my/path',
code: 'ERR_INVALID_URL'
}
What about pulling the invalid://
string out to a const INVALID_PROTO, and then in the calling code change this:
if (url.protocol) result.protocol = url.protocol
to this:
if (url.protocol !== INVALID_PROTO) result.protocol = url.protocol
Then one should be able to avoid copying the URL properties to a new object.
This could be improved upon a little bit to not be susceptible to user code actually using "invalid://" as a proto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised other parts of the logic rely on the mutability of the URL object properties to resolve some values like the protocol.
So either we complicate the main logic or we keep the property copy. Performance wise I guess 1st option would be the one to pick.
This could be improved upon a little bit to not be susceptible to user code actually using "invalid://" as a proto.
Do you mean to use a value which is longer and more unlikely to be used by consumers? Something like original-url-protocol:
or something of the sorts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some changes following your advice @trentm
- changed logic to detect have a placeholder proto and detect it into the main logic. No need to copy properties
- add a try/catch logic to fallback to legacy parse if URL throws
Something I've noticed in the tests is that parse
method return more properties in the result object like slashes
and auth
. IMO this is a breaking change and needs to be reflected in the version with a major bump.
index.js
Outdated
if (!containsProtocol) result.protocol = '' | ||
const urlInstance = new URL(containsProtocol ? url : 'invalid://' + url) | ||
const result = {} | ||
for (const prop in urlInstance) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't run tests, but I believe you can just pass back the urlInstance
here. No need to copy properties over to another object.
index.js
Outdated
@@ -92,7 +92,14 @@ function getFirstHeader (req, header) { | |||
|
|||
function parsePartialURL (url) { | |||
const containsProtocol = url.indexOf('://') !== -1 | |||
const result = parseUrl(containsProtocol ? url : 'invalid://' + url) | |||
if (!containsProtocol) result.protocol = '' | |||
const urlInstance = new URL(containsProtocol ? url : 'invalid://' + url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could throw where url.parse
would not throw before, e.g.:
> new URL('invalid://@@')
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL
at __node_internal_captureLargerStackTrace (node:internal/errors:478:5)
at new NodeError (node:internal/errors:387:5)
at URL.onParseError (node:internal/url:565:9)
at new URL (node:internal/url:641:5) {
input: 'invalid://@@',
code: 'ERR_INVALID_URL'
}
> url.parse('invalid://@@')
Url {
protocol: 'invalid:',
slashes: true,
auth: '@',
host: '',
port: null,
hostname: '',
hash: null,
search: null,
query: null,
pathname: null,
path: null,
href: 'invalid://'
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
I think the legacy parse
method is much more permissive since it consists on a loop trying to figure out as much as possible from the input string. You can look at the implementation
This means there will be more cases where the WHATWG URL API would simply throw since the input does not conform to its specs.
I guess the best approach to do not drop support is to catch the error and fallback to the legacy so we ensure support to package consumers which make use of this edge cases.
index.js
Outdated
if (!containsProtocol) result.protocol = '' | ||
const urlInstance = new URL(containsProtocol ? url : 'invalid://' + url) | ||
const result = {} | ||
for (const prop in urlInstance) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see that the URL
instance is resistant to having the .protocol
and .href
changed.
> result.href = result.href.replace('invalid://', '')
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL
at __node_internal_captureLargerStackTrace (node:internal/errors:478:5)
at new NodeError (node:internal/errors:387:5)
at URL.onParseError (node:internal/url:565:9)
at URL.set href [as href] (node:internal/url:755:5) {
input: '//my/path',
code: 'ERR_INVALID_URL'
}
What about pulling the invalid://
string out to a const INVALID_PROTO, and then in the calling code change this:
if (url.protocol) result.protocol = url.protocol
to this:
if (url.protocol !== INVALID_PROTO) result.protocol = url.protocol
Then one should be able to avoid copying the URL properties to a new object.
This could be improved upon a little bit to not be susceptible to user code actually using "invalid://" as a proto.
For the record I did run the benchmarks and compare between current implementation and the new one. It is noticeable that WHATWG URL is slower in parsing. Current implementation with
New implementation with
|
Which version of Node.js did you benchmark? There were some recent performance improvements that landed 3 weeks ago in v19.7.0: |
Node version used for benchmarks was v16.13.1. It's clear the improvements in v19.7.0 but still a bit higher than
|
[I'm pulling this out of the review comment so the conversation doesn't get broken up and more easily lost.]
Yes, that is what I had meant. It isn't very satisfying, however. :) In the nodejs-land issue that @styfle referred to, there was mention of a separate https://www.npmjs.com/package/request-target package for handling parsing a The parsing of the other headers (
Yes, agreed. I'm not opposed to a new major rev for this package. It would then be nice to then get specific about what fields of the returned object are promised -- as opposed to the currently looser "The result object will also contain any other property normally returned by the Node.js core url.parse function." |
I've checked https://www.npmjs.com/package/request-target package and it's using a set of
|
Created a gist to compare different scenarios and how different parsers (url.parse, new URL and request-target) behave. The most restrictive is request-target and url.parse seems to accept anything. Parse results for /
Parse results for /some/path
Parse results for //some/path
Parse results for /user@pass
Parse results for //user@pass
Parse results for /some/path?key=value
Parse results for @@
Parse results for ///
Urls parsed by method
|
Using a non standard protocol in Parse results for /
Parse results for /some/path
Parse results for //some/path
Parse results for /user@pass
Parse results for //user@pass
Parse results for /some/path?key=value
Parse results for @@
Parse results for ///
Urls parsed by method
|
The batches show that there are edge cases where
|
@watson since this a breaking change I would like to have your feedback on this. As a summary This means if we replace |
Replace the legacy
parse
API fromurl
module and use theURL
constructor instead. Reasons for this change are://some/path
are not parsed correctly//user@foo
have the same problemThe solution proposed in this PR considers the raw URL as partial like the ones seek on the headers and adding a fake protocol if necessary (to remove later). The function iterates over the URL instance properties to fill a result object since some properties of URL do not accept empty values like protocol.
Tests have been added in http/https requests to test these specific scenarios.