As has been mentioned in numerous issues (see #119169 for example), there are quite a few performance issues regarding os.scandir and os.walk.
Diving into the implementation - see os_scandir_impl in posixmodule.c - I noticed that we are currently using WinAPI to list a directory. Looking into the relevant WinAPI functions (FindNextFileW, FindFirstFileW) it seems redundant to implement our Python wrapper around these wrappers. I propose to use the native NT functions - for example NtQueryDirectoryFile - directly, as we are already implementing a wrapper ourselves.
After quite a bit of reading MS-Docs and looking at Kernel32.dll & NtDll.dll, I have reached a stable implementation of this proposal. Unfortunately, I could not significantly reduce the number of syscalls being performed; however, I did reduce the amount of memory allocation and copying by a ratio of 1:6.
Additionally, this new implementation should aid in future implementations around Windows file system operations (see #99454 for an easy example).
This improves the performance of both os.scandir and os.walk, which is implemented on top of it.
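To illustrate why a faster os.scandir also speeds up os.walk, here is a minimal sketch of a top-down walk built on os.scandir. The name `walk_sketch` is hypothetical, and this is a simplification rather than CPython's actual implementation: it ignores error handling, the `topdown` and `onerror` options, and it classifies symlinks to directories as files.

```python
import os

def walk_sketch(top):
    """Yield (dirpath, dirnames, filenames) top-down, like a simplified os.walk."""
    dirnames, filenames = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # DirEntry.is_dir() can usually be answered from data gathered
            # during the directory listing, avoiding an extra stat call.
            if entry.is_dir(follow_symlinks=False):
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    yield top, dirnames, filenames
    # Every subdirectory triggers another directory listing, so the cost of
    # one listing is multiplied by the number of directories in the tree.
    for name in dirnames:
        yield from walk_sketch(os.path.join(top, name))
```

Each directory in the tree is listed exactly once, so any per-listing saving in os.scandir carries over to the whole walk.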
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Python has always limited itself to the Windows API. Back in the 1990s, Microsoft partially documented the user-mode NT API for use by third-party subsystems, and by services associated with drivers. They've documented more of the NT API over the past 30 years. However, it has never been intended for direct use by applications. That doesn't stop some developers, but the fact that Microsoft hasn't aggressively discouraged this practice doesn't mean it is encouraged.
The Windows API has alternatives to FindFirstFileW() and FindNextFileW(). You can use GetFileInformationByHandleEx() with FileIdBothDirectory[Restart]Info, FileFullDirectory[Restart]Info, or FileIdExtdDirectory[Restart]Info.
Links to previous discussion of this feature:
No response