Description
Version
v18.16.1
Platform
Linux foo 6.2.0-25-generic #25-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 16 17:05:07 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
url
What steps will reproduce the bug?
> const { format } = require('node:url')
undefined
> format(new URL('tel:123'))
'tel:123'
> format(new URL('tel:123'), { unicode: true })
'tel://123'
> format(new URL('doi:10.123/456'))
'doi:10.123/456'
> format(new URL('doi:10.123/456'), { unicode: true })
'doi://10.123/456'
How often does it reproduce? Is there a required condition?
100% reproducible
What is the expected behavior? Why is that the expected behavior?
The // characters should not be inserted.
Three reasons why:
- Obviously, the unicode flag should not be changing the output of the URLs I show.
- RFC 3986 section 3 says: 'When authority is not present, the path cannot begin with two slash characters ("//")'
- RFC 3966 clearly shows that the tel: URL syntax (for example) never starts with //
What do you see instead?
URLs whose protocol is not in the slashedProtocol set are having // added to them, which should not happen.
Additional information
From manual inspection of the code, I am guessing that node/src/node_url.cc line 198 onwards is the issue:
if (unicode) {
  out->host = ada::idna::to_unicode(out->get_hostname());
}
I guess that ada::idna::to_unicode is transforming the hostname from a falsy value to an empty but truthy value, and hence get_href is adding in the //. Perhaps the above lines should check that there actually is a hostname before updating out->host.
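Until this is fixed, a possible workaround on the JavaScript side is to pass the unicode option only when the URL actually has a host. The following is a minimal sketch of that idea, not a fix for the underlying C++ code; formatUnicode is a hypothetical helper name and not part of node:url.

```javascript
const { format } = require('node:url');

// Hypothetical helper (not part of node:url): request Unicode output
// only when there is a host that could need converting.
function formatUnicode(url) {
  if (url.host === '') {
    // No authority component (e.g. tel:, doi:), so there is nothing to
    // convert. Return the serialized URL unchanged.
    return url.href;
  }
  return format(url, { unicode: true });
}

console.log(formatUnicode(new URL('tel:123')));        // 'tel:123'
console.log(formatUnicode(new URL('doi:10.123/456'))); // 'doi:10.123/456'
```

This sketch only covers the unicode option. If the other format options (auth, fragment, search) are needed for host-less URLs, it may be enough to call format with those options and leave unicode out.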