Skip to content

url: erroneous WHATWG whitespace handling #12825

@TimothyGu

Description

@TimothyGu
  • Version: v7.x and master
  • Platform: all
  • Subsystem: url
  1. We should be stripping leading and trailing C0 control or space if the call comes from any place other than the setters (step 1.3 in basic URL parser), but we are not.

    const assert = require('assert');
    const { URL } = require('url');
    assert.strictEqual(new URL('\x1fhttp://abc\x1f').href, 'http://abc/');
      // Currently: TypeError: Invalid URL: ...
  2. We should be stripping ASCII tab or newline (step 3 in basic URL parser) before entering the state machine (step 11). Instead of following the spec strictly, we are currently using a "clever" scheme that allows us to strip them without an additional loop. However, this scheme is already somewhat not elegant, but can actually break completely when the remaining magical variable is used:

    const assert = require('assert');
    const { URL } = require('url');
    assert.strictEqual(new URL('C|/', 'file://host/dir/file').href, 'file:///C:/');
      // No errors
    assert.strictEqual(new URL('C|\n/', 'file://host/dir/file').href, 'file:///C:/');
      // AssertionError: 'file://host/dir/C|/' === 'file:///C:/'

When implementing issue 1 above, one should first add an appropriate CHAR_TEST for C0 control or space, like so:

// https://infra.spec.whatwg.org/#c0-control-or-space
CHAR_TEST(8, IsC0ControlOrSpace, (ch >= '\0' && ch <= ' '))

Then we should add a new has_url argument to URL::Parse() much like the existing has_base to signify whether we should be stripping leading and trailing C0 control or space, or not.

As an optimization, if !has_url (i.e. if we are stripping leading and trailing C0 control or space) it should be possible to first find the appropriate start and end of the input without creating a new string, and then only create a new string in the ASCII tab or newline-removal step later.

@watilde Are you interested in taking a stab at this? #12846

Metadata

Metadata

Assignees

No one assigned

    Labels

    whatwg-urlIssues and PRs related to the WHATWG URL implementation.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions