Skip to content

Add DirEntry-based readdir methods #55358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

IanButterworth
Copy link
Member

Closes #55333

Renames and publicizes Base.Filesystem._readdirx as readdir(DirEntry, path).

Introducing the pattern of a type first arg allows further extension like readdir(S3Entry, path::S3Path) (where readdir(path::S3Path)::Vector{String} already exists).

@IanButterworth IanButterworth added the filesystem Underlying file system and functions that use it label Aug 3, 2024
@fredrikekre
Copy link
Member

Perhaps an alternative could be to implement this as readdir(::DirEntry) instead? Then you don't have to unwrap the string if you want to again call readdir on the results?

@IanButterworth
Copy link
Member Author

So

for e1 in readdir(DirEntry(path)) # not guaranteed that `path` is a dir but that's ok
    for e2 in readdir(e1)
        ...

That seems good, but as an additional method? I can see merit in both.
DirEntry would also need a constructor that takes a single path.

@LilithHafner LilithHafner added the design Design of APIs or of the language itself label Aug 3, 2024
@LilithHafner

This comment was marked as resolved.

@LilithHafner
Copy link
Member

LilithHafner commented Aug 3, 2024

DirEntry would also need a constructor that takes a single path.

Would that constructor involve a filesystem call? If so, readdir(DirEntry(path)) would be slower than readdir(DirEntry, path). If not, then DirEntry(path) would not be a full DirEntry.

Input type and output type are not necessarily coupled here: I might want the file names contained in a directiry pointed to by a DirEntry or I might want the DirEntries at a path which I only have a string representation of. Consequently, I do not like the proposed readdir(path::DirEntry) -> Vector{DirEntry} alternative.

The API in this PR looked good to me as of e4d5be0.

@IanButterworth

This comment was marked as resolved.

@IanButterworth

This comment was marked as resolved.

@IanButterworth

This comment was marked as resolved.

@IanButterworth
Copy link
Member Author

@LilithHafner what do you think of the way I implemented it? Note the DirEntry constructor and docstring that specifies the unknown filetype when given a path.

@IanButterworth
Copy link
Member Author

IanButterworth commented Aug 3, 2024

One option is to remove the DirEntry(path::String) constructor so require the user to always enter through readdir(DirEntry, path::String) and just make DirEntry machine generated.

for e1 in readdir(DirEntry, "some/dir")
    isdir(e1) || continue
    for e2 in readdir(e1)
        ...

@LilithHafner
Copy link
Member

LilithHafner commented Aug 3, 2024

@LilithHafner what do you think of the way I implemented it?

I think there should only be one way, not both readdir(DirEntry, path) and readdir(DirEntry(path)).

I still prefer readdir(DirEntry, path) because

  • The semantics of readdir(d::DirEntry) could be _readdirx(d.dir) (will always succeed if d is a valid DirEntry) or _readdirx(d.path) (possibly more useful, but could fail if d is not a directory).
  • The input type and output type are not necessarily coupled here: I might want the file names contained in a directory pointed to by a DirEntry or I might want the DirEntrys at a path which I only have a string representation of. This is especially true with generalizations like S3Path and S3Entry.

@IanButterworth
Copy link
Member Author

Ok. I think that means you agree with #55358 (comment) ?

@IanButterworth IanButterworth changed the title Add readdir(DirEntry, path) Add DirEntry-based readdir methods Aug 5, 2024
Co-Authored-By: Ian Butterworth <i.r.butterworth@gmail.com>
Co-Authored-By: Simeon David Schaub <simeon@schaub.rocks>
@IanButterworth
Copy link
Member Author

I think this has reached a happy design? @LilithHafner would you mind a final review?

@LilithHafner
Copy link
Member

I still like the PR as of e4d5be0 when it had the API that @oscardssmith approved. I don't like it as is for the reasons I stated in #55358 (comment). Open to reconsideration, but I don't think any of the points in that comment have been addressed.

@IanButterworth
Copy link
Member Author

IanButterworth commented Aug 6, 2024

This currently allows

for e1 in readdir(DirEntry, "some/dir")
    isdir(e1) || continue
    for e2 in readdir(e1)
        ...

As of e4d5be0 it required extracting the path from the entry.

for e1 in readdir(DirEntry, "some/dir")
    isdir(e1) || continue
    for e2 in readdir(DirEntry, e1.path)
        ...

I see #55393 has been opened. (Note that this PR no longer has DirEntry(path::AbstractString).)

So this PR treats DirEntry as only machine generated, and introduces readdir(dir::DirEntry) as convenience.
To me that seems like a good design?

@LilithHafner
Copy link
Member

I opened #55393 to clarify the API I prefer because the "view reviewed chagnes" button on github seems to be reporting incorrect diffs in the presence of force pushes and plain git won't reveal that commit b.c. it's been force pushed over.

@LilithHafner
Copy link
Member

The methods

readdir(::AbstractString) -> Vector{String}
readdir(DirEntry, ::AbstractString) -> Vector{DirEntry}
readdir(::DirEntry) -> Vector{DirEntry}

are incomplete without

readdir(String, ::DirEntry) -> Vector{String}

@topolarity
Copy link
Member

If we support it, I think readdir(path) should return the same thing (Vector{String}) regardless of whether path is a DirEntry.

I agree with @LilithHafner that it feels uncomfortable to constrain the abstraction/type to describe the input type to match the output type. It invites some awkward design questions:

  • How do I represent /? What if I didn't want to consider that case for my CustomDirEntry?
  • What if I embed query-related data in my DirEntry type? e.g. a session key or a query time.

Although it's more conservative now that Base doesn't provide the DirEntry constructor publically so those points are a bit moot, I think the shorthand invites downstream libs to do this

@LilithHafner
Copy link
Member

LilithHafner commented Aug 6, 2024

From chat with @IanButterworth:

We should also think about some things before publicizing DirEntry

  • Do we cache stat calls to unknown type DirEntries?
  • We should communicate to users that DirEntries might be stale/wrong if the filesystem has changed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design of APIs or of the language itself filesystem Underlying file system and functions that use it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename and publicize Base.Filesystem._readdirx
6 participants