Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nturl2path.pathname2url() doesn't handle forward slashes #120423

Open
nineteendo opened this issue Jun 12, 2024 · 5 comments
Open

nturl2path.pathname2url() doesn't handle forward slashes #120423

nineteendo opened this issue Jun 12, 2024 · 5 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@nineteendo
Copy link
Contributor

nineteendo commented Jun 12, 2024

Bug report

Bug description:

>>> from nturl2path import pathname2url
>>> pathname2url('//unc/as/path')
'//unc/as/path' # NOK
>>> pathname2url(r'\\unc\as\path')
'////unc/as/path' # OK

CPython versions tested on:

3.13

Operating systems tested on:

Windows

@nineteendo nineteendo added the type-bug An unexpected behavior, bug, or error label Jun 12, 2024
@nineteendo
Copy link
Contributor Author

nineteendo commented Jun 12, 2024

Quoting @eryksun:

The standard library doesn't use urllib.request.pathname2url(), so it isn't currently an internal issue. However, the documentation claims to support "the local syntax for a path", which would include using forward slash as the path separator. The Windows implementation of pathname2url() is tested in "Lib/test/test_urllib.py", but AFAICT only with backslash as the path separator. I think the input path should be normalized via normpath() and then split via splitroot() to process it sanely.

Also, currently nothing else in the standard library uses the nturl2path module, and its existence isn't documented. The two functions could be defined directly in urllib.request. The way some of the tests are implemented would have to be updated as well.

@eryksun
Copy link
Contributor

eryksun commented Jun 12, 2024

While using an empty authority (i.e. hostname or namespace) for a UNC path is valid according to RFC8089 (e.g. "file:////spam/eggs/foo"), the two-slash form with an authority is preferred on Windows (e.g. "file://spam/eggs/foo"). For DOS drive paths, the form with an empty authority is preferred (e.g. "file:///C:/eggs/foo"). But using an explicit authority is also supported (e.g. "file://localhost/C:/eggs/foo", or sometimes "file://./C:/eggs/foo").

@barneygale
Copy link
Contributor

barneygale commented Jun 12, 2024

On POSIX, urllib.request.pathname2url() simply calls quote(), so you might expect to use it like this:

>>> 'file://' + urllib.request.pathname2url('/etc/hosts')
'file:///etc/hosts'

But if we try the same thing using the Windows version we get unusual results:

>>> 'file://' + nturl2path.pathname2url(r'c:\foo')
'file://///C:/foo'
>>> 'file://' + nturl2path.pathname2url('c:/foo')
'file://///C://foo'
>>> 'file://' + nturl2path.pathname2url(r'\\server\share\file')
'file://////server/share/file'
>>> 'file://' + nturl2path.pathname2url('//server/share/file')
'file:////server/share/file'

I don't think there's a way to use it correctly. IMHO we should deprecate + remove it.

@barneygale
Copy link
Contributor

barneygale commented Jun 12, 2024

We can recommend Path.as_uri() to users as a replacement. It's been available since pathlib was added to the standard library.

It's tempting to also deprecate and remove url2pathname(), but:

  1. It's used a couple of times internally (unlike pathname2url()), and
  2. The potential replacement (Path.from_uri() was only added in 3.13, and if we want to be kind to package maintainers, I think we ought to wait until 3.12 drops out of support.

@pygeek
Copy link
Contributor

pygeek commented Jun 13, 2024

We can recommend Path.as_uri() to users as a replacement. It's been available since pathlib was added to the standard library.

It's tempting to also deprecate and remove url2pathname(), but:

  1. It's used a couple of times internally (unlike pathname2url()), and

  2. The potential replacement (Path.from_uri() was only added in 3.13, and if we want to be kind to package maintainers, I think we ought to wait until 3.12 drops out of support.

If we want to encourage usage of Path.from_uri over url2pathname wouldn't that justify a depreciation notice, at least a recommendation in the docs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants