Bug report
Surprisingly (contrary to its name and to the fact that it is a generator), Path.iterdir() does not stream directory entries: it reads all of them into memory before yielding the first one. This can cause excessive memory usage when "iterating" over very large directories.
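To make the difference concrete, here is a minimal sketch that does not touch the filesystem. A hypothetical CountingSource stands in for a directory-entry iterator such as os.scandir() and records how many entries have been pulled by the time the first one is yielded. The eager and lazy functions are illustrative only, not pathlib's actual code; eager mirrors the buffer-then-yield pattern described above.

```python
from collections.abc import Iterator


class CountingSource:
    """Hypothetical stand-in for a directory-entry iterator (e.g. os.scandir()).

    Counts how many entries have been pulled from it so far.
    """

    def __init__(self, n: int) -> None:
        self._it = iter(range(n))
        self.pulled = 0

    def __iter__(self) -> "CountingSource":
        return self

    def __next__(self) -> int:
        value = next(self._it)  # raises StopIteration when exhausted
        self.pulled += 1
        return value


def eager(source: CountingSource) -> Iterator[int]:
    # Buffers every entry before yielding the first one,
    # like `entries = list(scandir_it)`.
    entries = list(source)
    yield from entries


def lazy(source: CountingSource) -> Iterator[int]:
    # Pulls entries one at a time as the caller asks for them.
    yield from source


src_eager = CountingSource(1000)
next(eager(src_eager))
print(src_eager.pulled)  # 1000

src_lazy = CountingSource(1000)
next(lazy(src_lazy))
print(src_lazy.pulled)  # 1
```

Both functions are generators, yet the eager one has already consumed (and stored) all 1000 entries before the caller receives the first, while the lazy one has consumed only one.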
Users would expect .iterdir() to operate in a streaming way by default, like UNIX find or readdir(), e.g. streaming the results of the underlying system calls such as getdents64() on Linux. Instead, it collects everything up front with entries = list(scandir_it):
cpython/Lib/pathlib/__init__.py
Lines 835 to 843 in c419af9
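For comparison, a streaming variant can be sketched directly on top of os.scandir(), which on POSIX systems wraps readdir() (and thus getdents64() on Linux). iterdir_streaming is a hypothetical helper, not part of pathlib; the sketch assumes the caller either exhausts the generator or closes it explicitly.

```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import closing
from pathlib import Path


def iterdir_streaming(path: Path) -> Iterator[Path]:
    """Hypothetical helper: yield the children of `path` lazily.

    The os.scandir() iterator stays open while the caller consumes entries,
    so entries are read from the OS as they are needed instead of being
    collected into a list up front.
    """
    # The `with` block closes the directory handle once the generator is
    # exhausted or closed (explicitly, or when it is garbage-collected).
    with os.scandir(path) as scandir_it:
        for entry in scandir_it:
            # os.scandir() never yields the '.' and '..' entries.
            yield path / entry.name


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("a.txt", "b.txt", "c.txt"):
        (root / name).touch()

    # Full iteration: entries arrive in arbitrary order, so sort them.
    names = sorted(p.name for p in iterdir_streaming(root))
    print(names)  # ['a.txt', 'b.txt', 'c.txt']

    # Early exit: closing() releases the directory handle even though
    # the generator is not exhausted.
    with closing(iterdir_streaming(root)) as it:
        first = next(it)
    print(first.parent == root)  # True
```

One trade-off of this design: a streaming generator keeps a directory file descriptor open for as long as iteration is paused, so many abandoned, half-consumed iterators (for example in a recursive walk) could exhaust file descriptors. That may be one motivation for the eager behavior, which is a further reason to document it.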
This behavior should be documented, and can hopefully be fixed without too much breakage.
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux