
Performance improvement for os.scandir on Windows #122885

Open
@Michael-K-Stein


Feature or enhancement

Proposal:

As mentioned in numerous issues (see #119169, for example), os.scandir and os.walk have quite a few performance issues.


Diving into the implementation (see os_scandir_impl in posixmodule.c), I noticed that we currently list a directory through the WinAPI functions FindFirstFileW and FindNextFileW. These are themselves wrappers, so building our Python wrapper on top of them seems redundant. I propose calling the native NT functions directly, for example NtQueryDirectoryFile, since we are already implementing a wrapper ourselves.


After quite a bit of reading the Microsoft docs and looking at Kernel32.dll and NtDll.dll, I have reached a stable implementation of this proposal. Unfortunately, I could not significantly reduce the number of syscalls being performed; however, I did reduce the amount of memory allocation and copying by a ratio of 1:6.
Additionally, this new implementation should help future work on Windows file system operations (see #99454 for an easy example).
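
For anyone who wants to compare the current and proposed implementations, a rough timing harness along these lines may be useful. It is only a sketch: the helper names `make_flat_dir` and `time_scandir` are made up for illustration, the directory size is arbitrary, and it measures wall-clock time only, so it will not show the allocation and copying reduction directly.

```python
import os
import tempfile
import time


def make_flat_dir(root, n_files):
    """Create n_files empty files directly under root (hypothetical helper)."""
    for i in range(n_files):
        with open(os.path.join(root, f"file_{i:05d}.txt"), "w"):
            pass


def time_scandir(path, repeats=5):
    """Return (entry_count, best_seconds) for fully consuming os.scandir(path)."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        # Exhaust the iterator so every directory entry is actually fetched.
        with os.scandir(path) as entries:
            count = sum(1 for _ in entries)
        best = min(best, time.perf_counter() - start)
    return count, best


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_flat_dir(tmp, 2000)  # arbitrary size, chosen only for illustration
        count, seconds = time_scandir(tmp)
        print(f"{count} entries, best of 5: {seconds * 1e3:.3f} ms")
```

Running the same script against two interpreter builds (before and after the change) on the same directory should give a first impression; file system caching can dominate, so the best-of-N figure is likely more stable than a single run.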


This improves the performance of both os.scandir and os.walk, which is implemented on top of it.
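
To illustrate why a faster os.scandir carries over to os.walk, here is a stripped-down walker built only on os.scandir. It is a simplified sketch, not the actual os.walk code: `simple_walk` is a hypothetical name, it is top-down only, it has no error handling, and it treats symlinks to directories as plain files.

```python
import os


def simple_walk(top):
    """Yield (dirpath, dirnames, filenames) top-down, like a minimal os.walk."""
    dirnames, filenames = [], []
    # One os.scandir call per directory does all the listing work.
    with os.scandir(top) as entries:
        for entry in entries:
            # Symlinks are not followed here, so a link to a directory
            # ends up in filenames (a simplification for this sketch).
            if entry.is_dir(follow_symlinks=False):
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    yield top, dirnames, filenames
    # Recurse into each subdirectory found in this listing.
    for name in dirnames:
        yield from simple_walk(os.path.join(top, name))
```

Since every directory visited costs exactly one os.scandir call, any per-call saving in the listing code is paid back once per directory in the tree.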

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response
