Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong paths from nt._getfinalpathname with DOS device selectors in the CWD #112563

Open
CarlitoGil opened this issue Nov 30, 2023 · 5 comments
Open
Labels
OS-windows type-bug An unexpected behavior, bug, or error

Comments

@CarlitoGil
Copy link

CarlitoGil commented Nov 30, 2023

Bug report

Bug description:

In Windows if the CWD has a device selector \\.\ or \\?\ any function dependent on nt._getfullpathname may change the meaning of the path.

import os
import ntpath
os.chdir('\\\\?\\C:\\') # any path with a device selector
print(ntpath.abspath('\\')) # NOTE: single slash character

That code will print \\
and I mean 2 slashes, not an escaped one.

So if you use:

print(ntpath.abspath('\\Some\\Path')) # single slash before each component

The result is \\Some\Path.
That is an UNC network share, not a file named Path a couple of levels down from the root of the volume.
Perhaps there are security concerns there.

And if the CWD is a volume root, or drive device root, then:

os.chdir('\\\\.\\C:\\') # or \\?\Volume{GUID}\
print(ntpath.abspath('.')) # current directory

That will show \\.\C:
which is a device block, which means os.scandir would raise an error with that path.

CPython versions tested on:

3.12

Operating systems tested on:

Windows

@CarlitoGil CarlitoGil added the type-bug An unexpected behavior, bug, or error label Nov 30, 2023
@finnagin
Copy link
Contributor

Looking at the definition of _getfullpathname it doesn't look like python is doing much outside of calling GetFullPathNameW so I'm wondering if this may be an issue with the Windows api's handling of the prefixes being used here. Something to note is that in the Windows api namespace documentation under the win32 file namespace section it states that some apis do not support using "\\?\" and under the win32 device namespace section it states that most apis will not support using "\\.\". However, looking at the GetFullPathNameW documentation it looks as though it does at least support using "\\?\". I wonder if the problem may be coming from changing the current directory, as from looking at the SetCurrentDirectory documentation it does not appear to support "\\.\" or "\\?\" (at least from my read of it) and that may be causing the unexpected behavior we are seeing here.

It does look as though os.getcwd() still returns the correct path to the current directory so one potential solution for this specific case (now that #113829 has been merged) could be to call ntpath._abspath_fallback() (as this is just calling join(getcwd(),path)) instead of _getfullpathname as this will return the correct results for these cases. i.e.

>>> os.chdir('\\\\?\\C:\\')
>>> print(ntpath._abspath_fallback('\\'))
\\?\C:\
>>> print(ntpath._abspath_fallback('\\Some\\Path'))
\\?\C:\Some\Path
>>> os.chdir('\\\\.\\C:\\')
>>> print(ntpath._abspath_fallback('.'))
\\.\C:\

However, just blanketly always running ntpath._abspath_fallback() does seem to change some additional functionality. If I change abspath to always run _abspath_fallback instead of trying _getfullpathname first then I see the following tests fail:

======================================================================
ERROR: test_abspath (test.test_ntpath.TestNtpath.test_abspath)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "cpython\Lib\test\test_ntpath.py", line 800, in test_abspath
    tester('ntpath.abspath("\\\\?\\C:////spam////eggs. . .")', "\\\\?\\C:\\spam\\eggs")
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cpython\Lib\test\test_ntpath.py", line 60, in tester
    raise TestFailed("%s should return: %s but returned: %s" \
          %(str(fn), str(wantResult), str(gotResult)))
test.support.TestFailed: ntpath.abspath("\\\\?\\C:////spam////eggs. . .") should return: \\?\C:\spam\eggs but returned: \\?\C:\spam\eggs. . .
FAIL: test_absolute (test.test_pathlib.test_pathlib.WindowsPathTest.test_absolute)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "cpython\Lib\test\test_pathlib\test_pathlib.py", line 2086, in test_absolute
    self.assertEqual(str(P(other_drive).absolute()), other_cwd)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'Z:' != 'Z:\\dirA'
- Z:
+ Z:\dirA
FAIL: test_stat_inaccessible_file (test.test_os.Win32NtTests.test_stat_inaccessible_file)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "cpython\Lib\test\test_os.py", line 3134, in test_stat_inaccessible_file
    self.assertEqual(0, stat2.st_dev)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
AssertionError: 0 != 16743285523683558314

Lastly, _abspath_fallback seems to be slower than abspath. To test the difference for a few different path lengths I ran the following:

import timeit

paths = ["foo" + (x>0)*10**(x)*"/o" for x in range(5)]
results = []
for path in paths:
    a = timeit.timeit(f"import ntpath;ntpath.abspath('{path}')", number=10000)
    b = timeit.timeit(f"import ntpath;ntpath._abspath_fallback('{path}')", number=10000)
    results.append((a,b,b/a))

I get these results:

subdirs 0 10 100 1000 10000
path length 3 23 203 2003 20003
abspath 0.051 0.055 0.086 0.377 3.158
fallback 0.205 0.217 0.251 0.523 3.076
multiples slower (fallback/abspath) 3.980 3.910 2.922 1.387 0.973

It does look as though _abspath_fallback is slower with smaller path lengths/fewer subdirectories but faster with a longer path length/more subdirectories. Though it looks as though the crossover point is slightly less than a path length of 20000 which is much larger than the default MAX_PATH limit of the windows api but could potentially be relevant if using the expanded MAX_PATH limit of 32,767 characters. Even in that case though it's pretty close to the performance of the current abspath implementation.

@CarlitoGil
Copy link
Author

Hi, @finnagin

It does look as though os.getcwd() still returns the correct path to the current directory

That is correct, but I should point out that other os module functions behave like GetFullPathNameW.

import os

os.chdir('C:\\')
os.stat('.') # works
os.stat('\\') # works

os.chdir('\\\\?\\C:\\')
os.stat('.') # works
os.stat('\\') # fails; cannot determine the correct root with a device selector in the CWD

The thing with the UNC \\?\ and \\.\ schemes is that only the selector itself is immune to . and .. components.

import nt
print(nt._getfullpathname('C:\\..')) # "C:\"  <-- root remains
print(nt._getfullpathname('\\\\SERVER\\SHARE\\..')) # "\\SERVER\SHARE"  <-- second component remains
print(nt._getfullpathname('\\\\?\\DEVICE\\..')) # "\\?\"  <-- device removed

Used to be that you could not use file or device selectors in the CWD, and still can't on the Command Prompt and other apps, but PowerShell and Python do allow it, which is nice for unmounted volumes.

Something to note is that in the Windows api namespace documentation under the win32 file namespace section it states that some apis do not support using "\\?\"

I wonder if the problem may be coming from changing the current directory, as from looking at the SetCurrentDirectory documentation it does not appear to support "\\.\" or "\\?\" (at least from my read of it) and that may be causing the unexpected behavior we are seeing here.

Since CPython does support the file and device selectors in the CWD, it seems that in Windows built-in functions like os.stat
the single-leading-slash (root-relative) paths are just kinda broken.

import os
os.chdir(os.listvolumes()[0]) # first volume with GUID
os.scandir('\\Some\\Path') # fails; correct root not found, but tries to connect to a network share

For argument's sake, if you were to deal with it at the C level, it seems to me that something would need to be sacrificed, like unmounted volumes. You could just use GetFinalPathNameByHandle to format as DOS path when changing the CWD by removing/replacing the selector. Of course, that would bring another set of problems, but DOS path arguments would behave as documented, whereas the Win32 path documentation does have caveats. That would deal with (ugh) the root of the problem.

That would also be more consistent with POSIX; because, I think, it resolves the symlinks when setting the CWD.

...And there's more:

import os
import ntpath

os.chdir('C:\\')
ntpath.join(ntpath.abspath('.'), '..') # "C:\.."  <-- this is OK

os.chdir('\\\\?\\C:\\')
ntpath.join(ntpath.abspath('.'), '..') # "\\?\C:.."  <-- this is very wrong

os.chdir('\\\\?\\C:\\Path')
ntpath.join(ntpath.abspath('.'), 'BAD_FILENAME.') # "\\?\C:\Path\BAD_FILENAME."  <-- this is hell on Earth if you use this path to create a file, because the dot will not be trimmed and Windows Explorer is going to show the file to you but then when you try to open or delete the file you'll get an error saying that it ain't know wacha talking 'bout because there's no file there even though you're seeing the file with your own two eyes... so...
# relative segments must not indirectly inherit
# a file selector, which prevents normalization 

@zooba
Copy link
Member

zooba commented Jan 24, 2024

Since CPython does support the file and device selectors in the CWD

This isn't strictly true - CPython doesn't care about your CWD. It just passes the string through to the operating system.

It seems that the only sensible thing we can do here is to raise in chdir if a path with a prefix is passed. It would be preferable if the operating system would reject the path itself, which we would convert into an error, since the places where the path fails are due to the OS not handling the CWD from its own internal uses (we're not passing it in).

I don't particularly want to add new errors like this, as I think you should be able to set the CWD to weird values if you need to. But we don't have to guarantee that the OS can handle it.

It's worth noting that os.path.join(os.getcwd(), PATH) is a suitable alternative to os.path.abspath(PATH) if you need to handle this case, and you can use it anywhere you expect to pass a special path to an API that knows how to use it.

If we had an easy way to detect that the fast abspath implementation had joined incorrectly, we could automatically fall back to the slow join, but considering you've already put the OS into an unsupported state, I don't think we have to be responsible for making everything else work around that.

I'm inclined to close this issue, as I don't think there's anything we can do. If someone comes up with a good proposal to document the oddity I'd consider it, but we don't want to make Python's behaviour dependant on something this low level.

@CarlitoGil
Copy link
Author

Hi, @zooba

This isn't strictly true - CPython doesn't care about your CWD. It just passes the string through to the operating system.

I think I get what you mean, please forgive if the term "support" was incorrect, but this is how I look at it: the currently supported Windows platforms and the Win32 APIs used do seem to all accept such paths, and CPython already has os.listvolumes which specifically uses the file selector, and os.readlink may also use it for absolute paths, and perhaps more importantly, os.scandir accepts and returns them, which is great.

It seems that the only sensible thing we can do here is to raise in chdir if a path with a prefix is passed.

From the SetCurrentDirectory documentation:

The current directory is shared by all threads of the process: If one thread changes the current directory, it affects all threads in the process

Just curious. Would that affect os.getcwd anyway?

I don't particularly want to add new errors like this, as I think you should be able to set the CWD to weird values if you need to.

Please keep the ability to use \\?\ in current working directories, or at least, try to resolve the DOS path before raising errors.

@zooba
Copy link
Member

zooba commented Jan 25, 2024

Please keep the ability to use \\?\ in current working directories, or at least, try to resolve the DOS path before raising errors.

We don't offer this ability - Windows does. You'll need to ask Microsoft to support it. All we offer is access to the OS APIs, and the OS APIs are the ones that are failing to handle the path correctly.

Just curious. Would that affect os.getcwd anyway?

Yes, 100%. getcwd simply calls GetCurrentDirectoryW, and chdir simply calls SetCurrentDirectoryW. CPython doesn't do any handling of it whatsoever, it just passes straight through to the OS.

Python already has os.listvolumes which specifically uses the file selector, and os.readlink may also use it for absolute paths, and perhaps more importantly, os.scandir accepts and returns them, which is great.

As I mentioned, any OS API that supports the path will work with the path, and our join() function works with them. The API that seems to not work correctly is SetCurrentDirectoryW, which considering that its documentation does not say it supports them unlike the other APIs that do say it, would seem to be by (Microsoft's) design.1 There seems to be other APIs that handle device paths when provided as absolute paths, but not when inferred from the current working directory, which also suggests that SetCurrentDirectoryW is at fault.

For your chdir(listvolumes()[0]) example, you ought to be using listdrives (or possibly listmounts(listvolumes()[0])[0]). Clarifying the documentation on this point would be fine if that's helpful (e.g. "a volume identifier is not necessarily a file system path and may not work in all the places a regular path would - consider using listdrives() instead").

Footnotes

  1. More likely it's just impossible to change for compatibility reasons. The "current directory" concept has been known to be a bad idea for decades, and the fact that it's still around just shows how hard it is to change existing code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-windows type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants