Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amend RFC 517: Add material on std::fs #739

Merged
merged 2 commits into from
Feb 5, 2015
Merged

Conversation

aturon
Copy link
Member

@aturon aturon commented Jan 26, 2015

The IO reform RFC is being split into several semi-independent pieces, posted as PRs like this one.

This RFC amendment adds the file system section.

Rendered

@aturon
Copy link
Member Author

aturon commented Jan 26, 2015

A general comment: some of the functionality currently in std::io::fs will move to platform-specific std::os locations in this RFC.

These are not permanent changes, nor or they intended to be stabilized immediately. In many cases, the movement just signifies that we do not have a cross-platform implementation yet.

Very little in std::os::platform is intended to be #[stable] at 1.0, so all of this functionality will be considered more carefully at a later date before moving to its final location.

In other words, this RFC stabilizes only the parts that land in std::fs, and those are parts where we have a clear cross-platform implementation ready for 1.0.

@aturon
Copy link
Member Author

aturon commented Jan 26, 2015

Note that with this revision, both fsync and datasync remain available (under the names flush and flush_data respectively).

The recursive make/remove directory APIs have not yet been renamed, though the proposed names are not ideal. I'd like to suggest make_path and remove_path (along side make_dir, remove_dir) but I worry that the distinction is too subtle.

@tshepang
Copy link
Member

By renamed, do you mean in your branch where you are making these changes?

@aturon
Copy link
Member Author

aturon commented Jan 26, 2015

By renamed, do you mean in your branch where you are making these changes?

Ah, no sorry. I meant, from the original version of the RFC.

@aturon
Copy link
Member Author

aturon commented Jan 26, 2015

There were some worries about the name file_attr and FIleAttr in the previous RFC thread. I wonder if file_meta and FileMeta would be a more neutral/clearer name for an API retrieving file metadata.

@tshepang
Copy link
Member

For reference, here's why attr isn't loved: #517 (comment). I agree... file_meta sounds better, though I prefer file_metadata.

#### Files
[Files]: #files

The `File` type will largely stay as it is today, except that it will
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when close fails, because you are using an async filesystem? (note this is different than caring about the server crashing). and please don't say fsync/flush each time because some people might want to run some rust programs from a script and then run 'sync' manually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RFC is largely concerned with the API of std::fs and how it's affected by the implementation details. I believe that we'll always have a Drop implementation which calls close on the file descriptor, so in some sense this is an implementation detail. Do you have an idea in mind where this would change the API of File, however?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API http://doc.rust-lang.org/std/io/fs/struct.File.html doesn't tell me how I know close succeeded, so I view it as incomplete. I also think what functions panic is part of the public API that shouldn't change.

As I posted http://discuss.rust-lang.org/t/fs-file-should-panic-if-implicit-close-fails-and-no-panic-is-active/1349 , I think there should be an explicit .close() method that people can call, that returns an error, and otherwise drop should panic if implicit close() fails and there is not already a panic in progress. I think that makes the most code correct.

edit: and having an explicit .close() could change what errors are returned by other functions, e.g. read() could then panic or return an error ("file is closed").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to continue the discussion about explicit close over here.

@Screwtapello
Copy link

Are you sure you want hard- and sym-links in the cross-platform portion of the filesystem API? They're ubiquitous on POSIX, but (as I understand it) not terribly well supported on Windows, and then there's platforms that don't support them at all, or support links that are neither hard nor soft (like Mac OS/HFS+ "shortcuts"). It seems to me that should definitely be squirreled away in std::platform, along with things like mknod() and mkfifo().

The `open_mode` function will take an `OpenOptions` struct, which will
encompass today's `FileMode` and `FileAccess` and support a
builder-style API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One method I think we forgot to consider is truncate on files today. Right now the function matches Unix semantics where it basically acts as a set_len function. It will shorten the file or extend it (filling in with 0s) if necessary.

This function, however doesn't exist on Windows to my knowledge (cc @retep998, I may be missing something). Currently we're using SetEndOfFile which operates slightly different than the Unix variant. On Windows you must first seek to a position and then "truncate" the file, which is to say that Windows will simply set the length of the file on disk to the current pointer in the file.

Unfortunately, the current semantics may lead to some loss on windows. Our "emulation" of ftruncate on Windows involves seeking the file to the desired length, calling SetEndOfFile, and then seeking back to where the file used to be. The "seek back" operation, however, may fail, which would mean the operation actually completed successfully but we're forced to still return an error.

I think I would personally recommend matching Windows semantics rather than unix semantics in this case. We would drop the u64 argument to the function, call SetEndOfFile on Windows, and unix would be a combination of lseek (to learn the position) followed by ftruncate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! @retep998 has pointed out SetFileInformationByHandle to me which worked like a charm!

In that case I think we're fine :)

@retep998
Copy link
Member

@Screwtapello Windows supports hard and symbolic links just fine. Really the issue is not whether the OS supports them, but whether the filesystem supports them. In which case all we have to do is error if the filesystem doesn't support hard or symbolic links.

@alexcrichton
Copy link
Member

It may also be worth talking about EINTR and its relationship to the functions that we're going to bind on unix for this RFC. Right now all relevant functions that may return EINTR will loop on it, but we're making a decision to return EINTR for calls to read/write (see #576 for more info).

Should we return EINTR for all bound functions, or only for read/write? I would personally be in favor of handling EINTR in as many places as possible outside of read and write.

@kjpgit
Copy link

kjpgit commented Jan 26, 2015

my 2cents on EINTR. If the user goes out of their way to set up a signal handler for SIGINT, they probably don't want you to blindly loop on it when it happens. And if they really want to mask that signal temporarily (e.g. during a certain close() or other operation), they can also do that with global sigprocmask() state. (threading issues and unwinding / restore issues around that not withstanding).

@kjpgit
Copy link

kjpgit commented Jan 26, 2015

I'd also note that on bsd and linux, SA_RESTART can be used with sigaction(2), to give more control over syscall restarting. edit: and my gist is, the user can do this if they dont want the rust functions to fail, rather than have rust re-implement the wheel here

**Links**:

* `hard_link` (renamed from `link`). Take `AsPath` bound.
* `sym_link` (renamed from `symlink`). Take `AsPath` bound.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never see 'symlink' broken up into 'sym link', it's a full-blown portmanteau. Seems like this should be either 'symlink' or 'soft_link'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of soft_link

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value in adding an underscore the sym_link here? Even the wikipedia article on symbolic links notes the term symlink as an alias.

@alexcrichton
Copy link
Member

@Screwtapello

Are you sure you want hard- and sym-links in the cross-platform portion of the filesystem API?

As @retep998 mentioned, Windows does have good support for these functions. I believe that the symlink-creating function did not exist in XP which is certainly one of the pain points of exposing the API. We will likely just return an error for XP, however (as @retep998 mentioned).

and then there's platforms that don't support them at all, or support links that are neither hard nor soft (like Mac OS/HFS+ "shortcuts").

Could you elaborate a bit on this? Do you mean that the some filesystems that OSX uses do not support this? Or are you thinking that some distributions simply don't have the symlink function in libc?


@kjpgit I find your comments pretty convincing, it sounds like we should punt EINTR handling "upstairs"

@kjpgit
Copy link

kjpgit commented Jan 27, 2015

I take back what I said about EINTR handling, it's pretty hard (forgot about SIGCHLD, SIGWNCH, etc., and SA_RESTART special cases are a mile long). Also, Python appears to be going down the 'retry everything` route: https://www.python.org/dev/peps/pep-0475/ ("PEP 475 - Retry system calls failing with EINTR")

That does seem saner and makes things much easier to test and reason about.

edit: to be clear, python can get away with that because they have integrated the 'catch SIGINT' into the interpreter and raise an exception on that, because that's one thing people actually want to let change the control flow of their program (although it's still pretty hard / impossible to test that, I can say from experience, when literally every single line of code can throw an exception). for rust that might be harder, as panics aren't recoverable and I'm not sure if you register your own signal handlers by default (would be obviously bad for FFI). However, I think it's generally good practice to force people to 'opt in' to potentially unsafe or buggy or really hard to test behavior, and having to 'opt in' to use system calls that randomly fail due to EINTR (based on some totally random SIGCHLD caused by another library) might be more prudent, even if it irritates someone who wants a blocking read/write to be 'cancelable' WITH A CUSTOM SIGNAL HANDLER THAT KEEPS THE PROGRAM STILL RUNNING (Default ctrl-c behavior terminates process which will still work).

`perm` accessors. The various `os::platform` modules will offer
extension methods on this structure.

* `set_perm` (renamed from `chmod`). Take `AsPath` bound, and a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider calling this set_permissions. perm is not a commonly used abbreviation of permissions, while dir is for directory for instance.

@kjpgit
Copy link

kjpgit commented Jan 27, 2015

I just looked at my web server code and I have a bare .write() call on a socket (as well as .push), and no retry loop (it was in a try!). I've been writing sockets code for 15 years and I still wouldn't expect that to fail spuriously with EINTR due to some random library potentially having a signal handler.

So I think it's time to put 1970s-era signals interrupting 'blocking' system calls in the grave, and for people who want interruption either 1) use non blocking io, which is is far saner (you control where and what thread(s) get notified) 2) use explicit _noretry versions of "blocking" system calls.

@Screwtapello
Copy link

and then there's platforms that don't support them at all, or support links that are neither hard nor soft (like Mac OS/HFS+ "shortcuts").

Could you elaborate a bit on this? Do you mean that the some filesystems that OSX uses do not support this?

Modern filesystems can store lots of different kinds of things. Probably nothing is universal, but files and directories are nearly so. Hard-links and symlinks are widespread, followed by a long tail of less-common things like HFS+ "shortcuts", FIFOs, alternate data streams, and so forth.

A cross-platform API has to draw the line somewhere, and I was expecting Rust's cross-platform API to draw the line at files and directories. It's true that both POSIX and Windows support creating symlinks, but first-class support would also require FileKind to have a guaranteed is_symlink method (noting that a path might be a symlink and a file, or a symlink and a directory), and maybe FileAttr should also have a guaranteed hard_link_count method.

If it's really possible to implement solid support for hard-links and symlinks on all supported platforms, then fair enough, let's do it.

@kjpgit
Copy link

kjpgit commented Jan 28, 2015

What is the thread safety (and expensiveness there of) of File / BufferedWriter? Is it going to be crappy like glibc, where every fwrite() takes a mutex lock, and you have to use fwrite_unlocked() to get decent performance, and pinky swear it's not being used in multiple threads?

It would be great if the ownership semantics of rust mean you don't have to internally lock for no reason. But what about stdout/stderr handles? It looks like stdout() creates a private buffer for the existing fd (1,2) and flushes but doesn't close on drop. So... no internal mutex lock should ever be needed, right? Just clarifying.

If a close() method gets added to Writer, the stdout()/stderr() handles might have to intercept that and actually not close the underlying fd. Only when main function ends would stdout/stderr get closed? (you could close stdout first and if that fails, panic to stderr).

@kjpgit
Copy link

kjpgit commented Jan 28, 2015

fyi LineBufferedWriter.into_inner seems like a bad method WRT error handling. it calls flush but no error response?

Also, should stdout() return a fully buffered writer if there is no terminal, like c does?

* `remove_dir` (renamed from `rmdir`). Take `AsPath` bound.
* `remove_dir_all` (renamed from `rmdir_recursive`). Take
`AsPath` bound.
* `walk_dir`. Take `AsPath` bound. Yield an iterator over `IoResult<DirEntry>`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between read_dir and walk_dir as they both return iterators over DirEntrys?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

walk is recursive, read is not, as they are today in std::old_io. This is standard terminology.

@alexcrichton
Copy link
Member

This RFC looks like it's fairly close to being accepted, so I've created a PR for its implementation: rust-lang/rust#21936.

As with the previous PRs, nothing is set in stone and I'll be sure to update with any final revisions to this RFC!

@alexcrichton
Copy link
Member

One part I'd like to bring up here as well is that the std::old_io::fs module currently provides very rich descriptions of errors which happen. Most errors have extra contextual information saying, for example, which path could not be removed if the operation failed.

It's unclear to me to what extent we should be providing this contextual information.

@retep998
Copy link
Member

retep998 commented Feb 4, 2015

When creating/opening a file, there are actually 3 combinations that need to be represented:

  • Create the file and error if it already exists
  • Open the file and error if it does not exist
  • Create the file if it doesn't exist, open it otherwise

The current implementation proposed merely has a create method that takes a bool. This should either be expanded to two bools or turned into a ternary flag/enum of sorts.

@alexcrichton
Copy link
Member

Another possible option along that route would be:

/// If the `create` option is specified then this flag indicates whether
/// creation will fail if the file already exists.
fn exclusive(&mut self, exclusive: bool) -> &mut OpenOptions;

@aturon
Copy link
Member Author

aturon commented Feb 5, 2015

I've updated the RFC to match the implementation; the changes are pretty minor renamings that have already been proposed on the thread above.

@alexcrichton
Copy link
Member

This discussion on this RFC seems to have slowed down quite a bit and it seems like the overall design is well agreed-upon. There's still a number of points to bikeshed on in a few aspects such as various naming here and there, but this RFC has reached a point where it's ready to merge.

There's also a number of extensions to this API which can happen over time which should all be backwards compatible within the framework created here. I'd encourage lingering comments to be placed on the current PR (rust-lang/rust#21936) as well.

Thanks again for the comments everyone!

@alexcrichton alexcrichton merged commit e21c372 into rust-lang:master Feb 5, 2015
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 10, 2015
This commit is an implementation of [RFC 739][rfc] which adds a new `std::fs`
module to the standard library. This module provides much of the same
functionality as `std::old_io::fs` but it has many tweaked APIs as well as uses
the new `std::path` module.

[rfc]: rust-lang/rfcs#739
bors added a commit to rust-lang/rust that referenced this pull request Feb 10, 2015
This commit is an implementation of [RFC 739][rfc] which adds a new `std::fs`
module to the standard library. This module provides much of the same
functionality as `std::old_io::fs` but it has many tweaked APIs as well as uses
the new `std::path` module.

[rfc]: rust-lang/rfcs#739
@Centril Centril added the A-file Proposals relating to file systems. label Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-file Proposals relating to file systems.
Projects
None yet
Development

Successfully merging this pull request may close these issues.