Skip to content

url.format() encodes '#' inconsistently #8064

Closed
@georgecrawford

Description

Version: affects at least Node 4.4.7, 5.7.1, 6.3.0
Platform: Darwin Kernel Version 15.5.0; root:xnu-3248.50.21~8/RELEASE_X86_64 x86_64
OS: Mac OS X 10.11.5

For a reason I can't understand, a # symbol in the search parameter of a URL is formatted inconsistently. The first occurrence (whatever the position) seems to be url-encoded, and subsequent occurrences are not:

$ node
> require('url').format({search:'foo=1#2#3#'})
'?foo=1%232#3#'
> require('url').format({search:'foo=###'})
'?foo=%23##'
> require('url').format({search:'###'})
'?%23##'

This seems to affect Node 4-6 at least, in my testing. It's causing problems in sindresorhus/normalize-url#26.

Activity

imyller

imyller commented on Aug 11, 2016

@imyller
Member

Yes, because at https://github.com/nodejs/node/blob/master/lib/url.js#L627

search = search.replace('#', '%23');

and String.prototype.replace() documentation states that:

Only the first occurrence will be replaced.
imyller

imyller commented on Aug 11, 2016

@imyller
Member

If all occurrences of # must be replaced, the line should be:

search = search.replace(/#/g, '%23');
imyller

imyller commented on Aug 11, 2016

@imyller
Member

I'd be happy to submit a PR if someone can confirm that all occurrences of # should be url-encoded in URL search part.

georgecrawford

georgecrawford commented on Aug 11, 2016

@georgecrawford
Author

Looks like this has already been noticed: https://github.com/nodejs/node/pull/2303/files#diff-bbc5176adff1bbc01acd61bfc85d049fR973. I'm not sure what the status of the various URL improvements is. There seem to be some merges, reverts, and open PRs with conflicts.

Any chance of getting a quick fix in for this?

georgecrawford

georgecrawford commented on Aug 11, 2016

@georgecrawford
Author

Right - I also don't know what's desired. I just see that a number of different PRs to refactor this code seem to add the /g fix.

imyller

imyller commented on Aug 11, 2016

@imyller
Member

Yes, hash-URIs are tricky.

Anything After the First # is a considered Fragment Identifier, and it is not immediately clear what should be done with the #'s inside Fragment Identifier.

RFC3986 states that:

A fragment identifier component is indicated by the presence of a
number sign ("#") character and terminated by the end of the URI.

And formatting of fragment is context dependent:

   The fragment's format and resolution is therefore
   dependent on the media type [RFC2046] of a potentially retrieved
   representation, even though such a retrieval is only performed if the
   URI is dereferenced.  If no such representation exists, then the
   semantics of the fragment are considered unknown and are effectively
   unconstrained. 

So, encoding only the first # might or might not be considered a bug.

georgecrawford

georgecrawford commented on Aug 11, 2016

@georgecrawford
Author

But I'm explicitly setting this string as the search property of the URL. So I'd expect this to be treated as a 'query string', and for all hashes (#) to be url-encoded so they're not mistaken for a Fragment identifier.

MylesBorins

MylesBorins commented on Aug 11, 2016

@MylesBorins
Contributor
added
urlIssues and PRs related to the legacy built-in url module.
on Aug 11, 2016
jasnell

jasnell commented on Aug 11, 2016

@jasnell
Member

Definitely a bug, in my opinion. I'll look at this a bit later today and see about getting a PR opened.

self-assigned this
on Aug 11, 2016
jasnell

jasnell commented on Aug 11, 2016

@jasnell
Member

(unless you want to go ahead with a PR @imyller :-) ..)

imyller

imyller commented on Aug 11, 2016

@imyller
Member

I'll open PR soon.

modified the milestone: 7.0.0 on Aug 11, 2016

12 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

Labels

urlIssues and PRs related to the legacy built-in url module.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions

    `url.format()` encodes '#' inconsistently · Issue #8064 · nodejs/node