
_get_files doesn't return files in a deterministic order across OSes #239

Open
@dcato98

Description

_get_files in local.data.transforms.py doesn't return files in a deterministic order across OSes.
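Python's directory-listing functions (`os.listdir`, `os.scandir`, `os.walk`) return entries in arbitrary, filesystem-dependent order. Any file collector built on them inherits that order unless it sorts explicitly. The sketch below is a hypothetical `get_files_sorted` helper, not fastai's actual `_get_files`. It assumes a plain `os.walk` traversal and sorts both subdirectory names and filenames, so the result depends only on which names are present.

```python
import os
from pathlib import Path

def get_files_sorted(path, extensions=None):
    """Hypothetical helper: collect files under `path` in a platform-independent order.

    os.walk yields directory entries in whatever order the filesystem returns
    them, so directories and filenames are sorted explicitly here.
    """
    # Optional suffix filter, compared case-insensitively (e.g. {".jpg", ".png"}).
    exts = {e.lower() for e in extensions} if extensions else None
    result = []
    for root, dirnames, filenames in os.walk(path):
        # Sorting dirnames in place makes os.walk descend into subdirectories in sorted order.
        dirnames.sort()
        for name in sorted(filenames):
            p = Path(root) / name
            if exts is None or p.suffix.lower() in exts:
                result.append(p)
    return result
```

Because `dirnames` is sorted in place during a top-down walk, files in a directory come before those in its subdirectories, and subdirectories are visited alphabetically. Plain string sorting compares code points, so the order is the same on every OS.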

This becomes a problem when you collect the files and then split them with a fixed seed. For example, in 08_pets_tutorial.ipynb (I added the seed parameter):

```python
items = get_image_files(source)
split_idx = RandomSplitter(seed=42)(items)
```

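In this case, two users on different OSes would get the same split_idx, but because items is ordered differently on each OS, those indices would select different files, giving each user a different train/validation set.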
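A minimal illustration of the mismatch, assuming a hypothetical `seeded_split_idx` stand-in that shuffles indices with Python's `random` module (it is not fastai's `RandomSplitter`). The point is only that the indices depend on the list length and the seed, never on which file sits at each position. The second listing is simply a rotation of the first, standing in for a different OS's directory order.

```python
import random

def seeded_split_idx(n, valid_pct=0.2, seed=42):
    """Hypothetical stand-in for a seeded splitter: (train_idx, valid_idx) depend only on n and seed."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(valid_pct * n)
    return idx[cut:], idx[:cut]

# The same five files, listed in two different orders (e.g. by two different OSes).
files_a = ["cat1.jpg", "cat2.jpg", "dog1.jpg", "dog2.jpg", "dog3.jpg"]
files_b = files_a[1:] + files_a[:1]

train_idx, valid_idx = seeded_split_idx(len(files_a))
# Identical indices on both "machines"...
assert (train_idx, valid_idx) == seeded_split_idx(len(files_b))

# ...but they select different files, so the validation sets differ.
valid_a = {files_a[i] for i in valid_idx}
valid_b = {files_b[i] for i in valid_idx}
print(valid_a, valid_b)
```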

A user can work around this by sorting items before passing the list to the splitter, but I wouldn't expect many people to know to do this.
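A sketch of the sorting workaround, again using a hypothetical seeded splitter built on Python's `random` module rather than fastai's `RandomSplitter`. Sorting first puts the items in a canonical order, so the same seed yields the same train/validation files regardless of how the OS listed them. The names `seeded_split_idx` and `split_items` are illustrative only.

```python
import random

def seeded_split_idx(n, valid_pct=0.2, seed=42):
    # Hypothetical stand-in for a seeded splitter; indices depend only on n and seed.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(valid_pct * n)
    return idx[cut:], idx[:cut]

def split_items(items, valid_pct=0.2, seed=42):
    """Sort first, so the split depends only on which files exist, not on listing order."""
    items = sorted(items)
    train_idx, valid_idx = seeded_split_idx(len(items), valid_pct, seed)
    return [items[i] for i in train_idx], [items[i] for i in valid_idx]

files_a = ["cat1.jpg", "cat2.jpg", "dog1.jpg", "dog2.jpg", "dog3.jpg"]
files_b = files_a[1:] + files_a[:1]  # same files, different listing order
print(split_items(files_a) == split_items(files_b))  # True: both orders give the same split
```

The design choice here is to normalize the input rather than the random state: the seed already makes the index permutation reproducible, so the only remaining source of variation is the item order.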

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
