Description
The docs for fs::read_dir() and fs::ReadDir need to clearly state that iteration order is implementation-defined and may vary not just between OSes but between identical copies of a directory on different filesystems.
Currently the warning on fs::read_dir() is uselessly vague:
Platform-specific behavior
This function currently corresponds to the opendir function on Unix and the FindFirstFile function on Windows. Note that, this may change in the future.
Meanwhile, even this warning is completely missing from ReadDir.
Finding the semantics on ordering requires going through the docs for the target platform:
opendir() does not specify order but links to readdir(), which states:
The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion.
FindFirstFileA does not specify order but links to FindNextFile, which states:
The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system. If the data must be sorted, the application must do the ordering after obtaining all the results.
The order in which this function returns the file names is dependent on the file system type. With the NTFS file system and CDFS file systems, the names are usually returned in alphabetical order. With FAT file systems, the names are usually returned in the order the files were written to the disk, which may or may not be in alphabetical order. However, as stated previously, these behaviors are not guaranteed.
In both cases the relevant information is two links deep, which isn't really acceptable for stdlib documentation.
I want to note that I just spent 20 minutes trying to figure this out from a unit test that depended on this ordering and was passing locally but failing on our CI server, even though both machines are running Linux x64. The problem was that I'm running btrfs (which apparently returns files in lexicographical order, or just happened to return them that way), while I'm not entirely sure what filesystem the CI server is using (build jobs run inside Docker anyway). Either way, it turns out they iterated identical copies of the same directory in different orders. Frustratingly, a very similar test that also depended on the ordering of read_dir(), but for a different directory, was passing on both machines, which led me to initially discount iteration order as the cause.
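For anyone hitting the same problem in a test, here is a minimal sketch of one way to avoid depending on iteration order: collect the entries and sort them before use. The helper name sorted_entries is hypothetical (it is not part of std), and sorting by PathBuf compares path components, which may not match whatever order a given filesystem happens to produce.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of std): collect the entries of `dir`
/// and sort them by path, so callers never depend on the order in which
/// the underlying filesystem happens to return them.
fn sorted_entries(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = fs::read_dir(dir)?
        // Each item is an io::Result<DirEntry>; keep errors, map entries to paths.
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<PathBuf>>>()?;
    // PathBuf implements Ord, comparing component by component.
    paths.sort();
    Ok(paths)
}

fn main() -> io::Result<()> {
    // List the current directory in a stable, sorted order.
    for path in sorted_entries(Path::new("."))? {
        println!("{}", path.display());
    }
    Ok(())
}
```

This costs an allocation and a sort per call, so it is probably only worth doing where the order actually matters, such as in test assertions.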
This issue has been assigned to @ali-raheem via this comment.