Description
- Version: v9.x+
- Platform: all
- Subsystem: url
We are on the track to slowly deprecate the non-standard url.parse()
(#12168 (comment)) in favor of the new WHATWG standard-based URL API. One use case that currently cannot be migrated over from url.parse()
is the handling of relative URLs.
Background
url.parse()
accepts incomplete, relative URLs by filling unavailable components of a URL with null
.
> url.parse('#hash')
Url {
protocol: null,
slashes: null,
auth: null,
host: null,
port: null,
hostname: null,
hash: '#hash',
search: null,
query: null,
pathname: null,
path: null,
href: '#hash' }
On the other hand, the URL
constructor guarantees that all URL objects are fully complete and valid URLs, which means that it throws an exception in case of relative URLs:
> new URL('#hash')
TypeError [ERR_INVALID_URL]: Invalid URL: #hash
at Object.onParseError (internal/url.js:92:17)
at parse (internal/url.js:101:11)
at new URL (internal/url.js:184:5)
at repl:1:1
WHATWG URL API does have the algorithms necessary to parse relative URLs, however, and that is activated if a base
argument is provided:
> new URL('#hash', 'http://complete-url/')
URL {
href: 'http://complete-url/#hash',
origin: 'http://complete-url',
protocol: 'http:',
username: '',
password: '',
host: 'complete-url',
hostname: 'complete-url',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: '#hash' }
It is not always the case that a base URL is available, though.
Possible solutions
Do nothing
What this entails is that the currently supported ability to parse relative URLs will die as url.parse()
becomes deprecated.
Do not deprecate url.parse()
; otherwise do nothing
This is the most obvious actual solution, but from tickets like #12168, I don't see this as a good idea.
Add a non-standard TolerantURL
class
This could work if we trick the parser into believing we have a legitimate URL, except there are many conditionals in the URL parser algorithm that provide ad-hoc compatibility fixes with legacy implementations. We would have to make a set of opinionated assumptions about the nature of the URL, such as the URL's scheme.
In addition to parsing, the setters will have awkward semantics. Consider the following:
// Case 1
const relativeURL = new TolerantURL('#hash');
console.log(relativeURL.href);
// Prints #hash
relativeURL.protocol = 'http:';
console.log(relativeURL.href);
// Should this print http:#hash (what url.format() does)?
// http://#hash?
// Or make the setter a noop and therefore just #hash?
// Case 2
// Assuming we have decided to use http scheme semantics
// for TolerantURL if one is not supplied.
// The URL parser does not allow changing special-ness of
// scheme through the protocol setter.
const relativeURL = new TolerantURL('//username:password@host/');
relativeURL.protocol = 'abc';
console.log(relativeURL.href);
// Should this print abc://username:password@host/?
// Or //username:password@host/?
// Compare:
const absoluteURL = new URL('http://username:password@host/');
absoluteURL.protocol = 'abc';
console.log(absoluteURL.href);
// Prints http://username:password@host/