-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
Description
- Version: v7.x and master
- Platform: all
- Subsystem: url
-
We should be stripping leading and trailing C0 control or space if the call comes from any place other than the setters (step 1.3 in basic URL parser), but we are not.
const assert = require('assert'); const { URL } = require('url'); assert.strictEqual(new URL('\x1fhttp://abc\x1f').href, 'http://abc/'); // Currently: TypeError: Invalid URL: ...
-
We should be stripping ASCII tab or newline (step 3 in basic URL parser) before entering the state machine (step 11). Instead of following the spec strictly, we are currently using a "clever" scheme that allows us to strip them without an additional loop. However, this scheme is already somewhat not elegant, but can actually break completely when the remaining magical variable is used:
const assert = require('assert'); const { URL } = require('url'); assert.strictEqual(new URL('C|/', 'file://host/dir/file').href, 'file:///C:/'); // No errors assert.strictEqual(new URL('C|\n/', 'file://host/dir/file').href, 'file:///C:/'); // AssertionError: 'file://host/dir/C|/' === 'file:///C:/'
When implementing issue 1 above, one should first add an appropriate CHAR_TEST
for C0 control or space, like so:
// https://infra.spec.whatwg.org/#c0-control-or-space
CHAR_TEST(8, IsC0ControlOrSpace, (ch >= '\0' && ch <= ' '))
Then we should add a new has_url argument to URL::Parse() much like the existing has_base to signify whether we should be stripping leading and trailing C0 control or space, or not.
As an optimization, if !has_url (i.e. if we are stripping leading and trailing C0 control or space) it should be possible to first find the appropriate start and end of the input without creating a new string, and then only create a new string in the ASCII tab or newline-removal step later.
@watilde Are you interested in taking a stab at this? #12846