Performance improvement for os.scandir on Windows #122885

Open
Michael-K-Stein opened this issue Aug 10, 2024 · 1 comment
Labels
OS-windows performance Performance or resource usage type-feature A feature request or enhancement

Comments

Michael-K-Stein commented Aug 10, 2024

Feature or enhancement

Proposal:

As has been mentioned in numerous issues (see #119169 for example), there are quite a few performance issues with os.scandir and os.walk.


Looking at the implementation (see os_scandir_impl in posixmodule.c), I noticed that we currently list directories through the Windows API. The relevant functions (FindFirstFileW, FindNextFileW) are themselves wrappers, so layering our Python wrapper on top of them seems redundant. I propose calling the native NT functions, for example NtQueryDirectoryFile, directly, since we are already implementing a wrapper ourselves.


After quite a bit of reading of the Microsoft docs and looking at Kernel32.dll and NtDll.dll, I have reached a stable implementation of this proposal. Unfortunately, I could not significantly reduce the number of syscalls performed; however, I did reduce the amount of memory allocation and copying by a ratio of 1:6.
Additionally, this new implementation should help future work on Windows file system operations (see #99454 for an easy example).


This improves the performance of both os.scandir and os.walk, which is built on top of it.
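
To illustrate why a faster os.scandir also speeds up os.walk, here is a minimal sketch of a walk built on os.scandir, with a small timing helper. It is a simplified stand-in, not CPython's actual os.walk: it ignores error handling, top-down/bottom-up ordering, and directory symlinks (which it lists as files). The names walk_with_scandir, build_tree and time_listing are hypothetical helpers made up for this example.

```python
import os
import tempfile
import time


def walk_with_scandir(top):
    """Yield (dirpath, dirnames, filenames), like a simplified os.walk."""
    dirnames, filenames = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # is_dir() can usually be answered from data the directory
            # listing already returned, so no extra stat call is needed.
            if entry.is_dir(follow_symlinks=False):
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    yield top, dirnames, filenames
    for name in dirnames:
        yield from walk_with_scandir(os.path.join(top, name))


def build_tree(root, n_dirs=20, n_files=50):
    """Create n_dirs subdirectories, each holding n_files empty files."""
    for d in range(n_dirs):
        sub = os.path.join(root, f"dir{d}")
        os.mkdir(sub)
        for f in range(n_files):
            open(os.path.join(sub, f"file{f}.txt"), "w").close()


def time_listing(root):
    """Return (number of entries seen, elapsed seconds) for one full walk."""
    start = time.perf_counter()
    count = sum(len(d) + len(f) for _, d, f in walk_with_scandir(root))
    return count, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_tree(root)
        count, elapsed = time_listing(root)
        print(f"{count} entries in {elapsed:.4f}s")
```

Every directory visited costs one os.scandir call, so any per-directory saving in os.scandir is multiplied across the whole tree.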

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

Contributor

eryksun commented Aug 10, 2024

Python has always limited itself to the Windows API. Back in the 1990s, Microsoft partially documented the user-mode NT API for use by third-party subsystems, and by services associated with drivers. They've documented more of the NT API over the past 30 years. However, it has never been intended for direct use by applications. That doesn't stop some developers, but the fact that Microsoft hasn't aggressively discouraged the practice doesn't mean it's encouraged.

The Windows API has alternatives to FindFirstFileW() and FindNextFileW(). You can use GetFileInformationByHandleEx() with FileIdBothDirectory[Restart]Info, FileFullDirectory[Restart]Info, or FileIdExtdDirectory[Restart]Info.


3 participants