pathlib: Path.iterdir() is surprisingly not streaming #136059

Open
@nh2

Bug report

Surprisingly (contrary to its name and to being a generator), Path.iterdir() does not stream directory entries:

It reads all directory entries into memory before yielding the first entry.

This can cause excessive memory usage when "iterating" over very large directories.

Users would expect .iterdir() to stream by default, like UNIX find or readdir(), passing along the results of the underlying system calls (such as getdents64() on Linux) as they arrive.

Instead, it calls entries = list(scandir_it), which consumes the whole scandir iterator up front:

def iterdir(self):
    """Yield path objects of the directory contents.
    The children are yielded in arbitrary order, and the
    special entries '.' and '..' are not included.
    """
    root_dir = str(self)
    with os.scandir(root_dir) as scandir_it:
        entries = list(scandir_it)

This should be documented, and can hopefully be fixed without too much breakage.
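
For illustration, here is a minimal sketch of what streaming iteration could look like when built directly on os.scandir(). The helper name iterdir_streaming is hypothetical and not part of pathlib; this is a workaround under stated assumptions, not a proposed patch. One trade-off is visible right away: the directory handle stays open until the generator is exhausted or closed, which may be part of why a fix risks breakage.

```python
import os
from pathlib import Path
from typing import Iterator


def iterdir_streaming(directory: Path) -> Iterator[Path]:
    """Yield children of `directory` one at a time.

    Hypothetical helper, not part of pathlib.
    """
    # os.scandir() returns a lazy iterator, so entries are pulled from
    # the OS as the loop advances instead of being collected in a list.
    with os.scandir(directory) as scandir_it:
        for entry in scandir_it:
            # The scandir handle stays open while the generator is suspended here.
            yield directory / entry.name
```

A caller that stops early should call .close() on the generator (or let it be garbage collected) so the underlying directory handle is released.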

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux


Labels: stdlib, topic-pathlib, type-bug
