fs::read_dir() and ReadDir need clear warnings about not relying on iteration order #63183

Closed
@abonander


The docs for fs::read_dir() and fs::ReadDir need to clearly state that iteration order is implementation-defined and may vary not just between OSes but also between identical copies of a directory on different filesystems.

Currently the warning on fs::read_dir() is uselessly vague:

Platform-specific behavior

This function currently corresponds to the opendir function on Unix and the FindFirstFile function on Windows. Note that, this may change in the future.

Meanwhile even this warning is completely missing from ReadDir.

Finding the semantics on ordering requires going through the docs for the target platform:

  • opendir() does not specify order but links to readdir() which states:

The order in which filenames are read by successive calls to
readdir() depends on the filesystem implementation; it is unlikely
that the names will be sorted in any fashion.

  • FindFirstFileA does not specify order but links to FindNextFile which states:

The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system. If the data must be sorted, the application must do the ordering after obtaining all the results.

The order in which this function returns the file names is dependent on the file system type. With the NTFS file system and CDFS file systems, the names are usually returned in alphabetical order. With FAT file systems, the names are usually returned in the order the files were written to the disk, which may or may not be in alphabetical order. However, as stated previously, these behaviors are not guaranteed.

In both cases the relevant information is two links deep, which isn't really acceptable for stdlib documentation.
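For reference, the "do the ordering after obtaining all the results" advice quoted above could look roughly like this in Rust. This is only a sketch: `sorted_dir_paths` is a hypothetical helper name, not something in std, and sorting by full path is just one possible choice of ordering.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of std): collects every entry of `dir`
/// and sorts the resulting paths, so callers never see the
/// filesystem-dependent order that `fs::read_dir()` yields.
fn sorted_dir_paths(dir: &Path) -> io::Result<Vec<PathBuf>> {
    // Each item from ReadDir is an io::Result<DirEntry>; collecting into
    // io::Result<Vec<_>> stops at the first error and returns it.
    let mut paths = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<PathBuf>>>()?;

    // Impose a deterministic order ourselves instead of trusting the OS.
    paths.sort();
    Ok(paths)
}
```

The cost is that the whole listing is buffered in memory before anything is returned, which is presumably why read_dir() doesn't do this on its own.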

I want to note that I just spent 20 minutes trying to figure this out from a unit test that depended on this ordering and was passing locally but failing on our CI server, even though both machines are running Linux x64. The problem was that I'm running btrfs (which apparently returns files in lexicographical order, or just happened to return them that way), while I'm not entirely sure what filesystem the CI server is using (build jobs are running inside Docker anyway). Either way, it turns out they iterated identical copies of the same directory in different orders. Frustratingly, a very similar test that also depended on the ordering of read_dir(), but for a different directory, was passing on both machines, which led me to initially discount the possibility that iteration order was the issue.
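A test like the one described can be made independent of iteration order by comparing the directory contents as a set rather than as a sequence. A minimal sketch, assuming the test only cares about which names are present; `dir_file_names` is a hypothetical helper, not the code from the failing test:

```rust
use std::collections::BTreeSet;
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns the file names found in `dir` as a set.
/// Two directories with the same names compare equal regardless of the
/// order in which `fs::read_dir()` happened to yield the entries.
fn dir_file_names(dir: &Path) -> io::Result<BTreeSet<OsString>> {
    fs::read_dir(dir)?
        // Keep only the final path component of each entry.
        .map(|entry| entry.map(|e| e.file_name()))
        // Collecting into io::Result<BTreeSet<_>> propagates the first error.
        .collect()
}
```

A test would then build the expected BTreeSet by hand and assert_eq! it against `dir_file_names(dir)?`, which passes on btrfs and on whatever the CI container uses alike.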

This issue has been assigned to @ali-raheem via this comment.

Metadata


Labels

  • A-docs (Area: Documentation for any part of the project, including the compiler, standard library, and tools)
  • C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
  • E-easy (Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.)
  • E-mentor (Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.)
