Proposal
Problem statement
It would be useful to have a version of io::copy that can use:
- splice or sendfile (some of the existing optimizations had to be rolled back in rust#108283)
- nonblocking IO
Motivation, use-cases
Solution sketches
An API specific to file descriptors
pub fn os::unix::os_copy(buffer: &mut BorrowedBuf, source: &impl AsFd, sink: &impl AsFd) -> io::Result<u64>
It is less generic than io::copy but makes it explicit that it only operates on file-like types and may need an intermediate buffer to hold data across multiple invocations when doing non-blocking IO.
Unclear: whether it should return an error when offload isn't possible or silently fall back to io::copy.
Downsides:
- covering Windows would be more complicated since it distinguishes between Handle and Socket
- requires cfg()s, so it can't be used by code generic over Read/Write types, e.g. tar-rs
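For illustration, driving the proposed call over a non-blocking sink could look roughly like the following. Note that os_copy does not exist, BorrowedBuf is still unstable, and the retry behaviour shown here is only an assumption:

use std::fs::File;
use std::io::{self, BorrowedBuf};
use std::net::TcpStream;

// hypothetical caller: copy a file into a non-blocking socket
fn send_file(src: &File, dst: &TcpStream) -> io::Result<()> {
    let mut storage = [0u8; 8192];
    // the buffer lives across calls so that bytes read but not yet written
    // survive a WouldBlock and can be flushed on the next attempt
    let mut buf = BorrowedBuf::from(&mut storage[..]);
    loop {
        match std::os::unix::os_copy(&mut buf, src, dst) {
            Ok(_) => return Ok(()),
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                // wait for readiness with poll/epoll/mio, then retry
            }
            Err(e) => return Err(e),
        }
    }
}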
Lean on specialization
pub fn io::zero_copy(source: &mut impl Read, sink: &mut impl Write) -> io::Result<u64>
This is essentially the same as what today's io::copy does but with altered guarantees:
- if the caller passes a BufWriter then any read-but-unwritten bytes will be held in the BufWriter when a WouldBlock occurs; otherwise the bytes will be dropped (see the sketch after the downsides below)
- changes made to source after zero_copy returns may become visible in sink, as is the case when using sendfile or splice
Downsides:
- API guarantees strongly rely on specialization
- the non-blocking case might be a footgun: if someone passes a BufWriter as &dyn Write the specialization won't be able to see it, and read-but-unwritten bytes end up being dropped
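To illustrate the BufWriter guarantee above, here is a hedged sketch of a caller driving the proposed zero_copy over a non-blocking socket. zero_copy does not exist, and the resumption and flushing details are assumptions:

use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::net::TcpStream;

fn copy_when_writable(src: &mut File, dst: TcpStream) -> io::Result<()> {
    dst.set_nonblocking(true)?;
    let mut sink = BufWriter::new(dst);
    loop {
        match io::zero_copy(src, &mut sink) {
            Ok(_) => break,
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                // wait for writability (poll/epoll/mio) on sink.get_ref(), then call
                // again; read-but-unwritten bytes stay parked inside the BufWriter
            }
            Err(e) => return Err(e),
        }
    }
    // the BufWriter may still hold bytes; in real code this flush could itself
    // return WouldBlock and would need the same readiness handling
    sink.flush()
}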
Hybrid of the above
Make the buffer an explicit argument for non-blocking IO but use best-effort specialization for the offloading aspects.
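One possible shape, with the name and exact signature being nothing more than an assumption for illustration:

pub fn io::copy_buffered(buffer: &mut BorrowedBuf, source: &mut impl Read, sink: &mut impl Write) -> io::Result<u64>

The explicit buffer covers the non-blocking case while specialization can still kick in when source and sink turn out to be file descriptors.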
Encapsulate the copy operation in a struct/builder
Rough sketch:
use std::io::{self, BorrowedBuf, Read, Write};
#[cfg(unix)]
use std::os::fd::AsFd;

struct Copier<'a, R, W> {
    // a bunch of enums (placeholder field below just keeps the sketch compiling)
    _state: std::marker::PhantomData<(&'a mut (), R, W)>,
}

impl<'a, R, W> Copier<'a, R, W> where R: Read, W: Write {
    /// On errors an internal buffer will be allocated if none is provided
    fn buffer(&mut self, buf: &'a mut BorrowedBuf) {}
    fn source(&mut self, src: R) {}
    fn sink(&mut self, sink: W) {}
    /// Runs until the first error, can be resumed with a later call.
    /// Does not ignore WouldBlock or Interrupted errors.
    fn copy_chunk(&mut self) -> io::Result<u64> { todo!() }
    fn total(&self) -> u64 { todo!() }
}

#[cfg(unix)] // inherent impl for brevity, should be an extension trait
impl<'a, R, W> Copier<'a, R, W> where R: Read + AsFd {
    fn fd_source(&mut self, src: &'a R) {}
}

#[cfg(unix)]
impl<'a, R, W> Copier<'a, R, W> where W: Write + AsFd {
    fn fd_sink(&mut self, sink: &'a W) {}
}
Under the hood it could still try to use specialization if the platform-specific APIs aren't used.
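A hedged usage sketch of the builder above; Copier::new and the meaning of the Ok return value are assumptions made purely for illustration:

fn forward(file: &std::fs::File, socket: &std::net::TcpStream) -> std::io::Result<u64> {
    let mut copier = Copier::new();
    copier.fd_source(file);   // unix-only fast path, enables splice/sendfile offload
    copier.fd_sink(socket);
    loop {
        match copier.copy_chunk() {
            // assuming Ok means the source has been exhausted
            Ok(_) => return Ok(copier.total()),
            Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => {
                // register interest with poll/epoll/mio and call copy_chunk()
                // again once the file descriptors become ready
            }
            Err(e) => return Err(e),
        }
    }
}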
Any of the above, but N pairs instead of 1
When copying many small files and the like it can be beneficial to run them in batches. This wouldn't be a full-fledged async runtime that can add work incrementally as other work items complete, but it would still be more efficient than copying one pair at a time.
Under the hood we could use polling or io_uring where appropriate.
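Purely as an illustration of the shape such an API could take (nothing here is part of the proposal, and the per-pair result type is an assumption):

fn copy_all<'a, R: Read, W: Write>(pairs: &mut [Copier<'a, R, W>]) -> Vec<io::Result<u64>> {
    // drive all copiers concurrently, using poll/epoll or io_uring underneath,
    // resuming each one as its file descriptors become ready
    todo!()
}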
Links and related work
- specialize io::copy to use copy_file_range, splice or sendfile rust#75272
- don't splice from files into pipes in io::copy rust#108283
- https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/io.3A.3Acopy.20race.20.23108283/near/346623546
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.