
Support for non-blocking and best-effort zero-copy io::copy #202

Open
@the8472

Description


Proposal

Problem statement

It would be useful to have a version of io::copy that can use:

  • splice or sendfile (some of the existing optimizations have to be rolled back in rust#108283)
  • non-blocking IO

Motivation, use-cases

Solution sketches

An API specific to file descriptors

pub fn os::unix::os_copy(buffer: &mut BorrowedBuf, source: &impl AsFd, sink: &impl AsFd) -> io::Result<u64>

It is less generic than io::copy, but it makes explicit that it only operates on file-like types and that it may need an intermediate buffer to hold data across multiple invocations when doing non-blocking IO.

Unclear: whether it should return an error when offloading isn't possible or silently fall back to io::copy.
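If os_copy silently fell back, the fallback path could simply delegate to io::copy, whose documentation notes that on Linux it may use copy_file_range, sendfile or splice between file descriptors where possible. A minimal sketch for blocking Unix descriptors using only stable std APIs; os_copy_fallback is a hypothetical name and the buffer parameter is deliberately left out:

```rust
use std::fs::File;
use std::io;
use std::os::fd::AsFd;

/// Hypothetical fallback for an os_copy-style API: copy everything from
/// `source` to `sink` by delegating to io::copy. Assumes blocking fds;
/// a WouldBlock error from a non-blocking fd is simply returned.
fn os_copy_fallback(source: &impl AsFd, sink: &impl AsFd) -> io::Result<u64> {
    // Duplicate the borrowed descriptors so they can be wrapped in owned Files.
    // The duplicates share the file offset and status flags with the originals.
    let mut src = File::from(source.as_fd().try_clone_to_owned()?);
    let mut dst = File::from(sink.as_fd().try_clone_to_owned()?);
    // On Linux, io::copy may move the data in-kernel for File-to-File copies.
    io::copy(&mut src, &mut dst)
}
```

Because the duplicated descriptors share the open file description, the caller's own handles see the advanced file offset afterwards, much as they would with a direct sendfile or splice call on the original descriptors.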

Downsides:

  • covering Windows would be more complicated, since it distinguishes between handles and sockets (AsHandle vs. AsSocket)
  • requires cfg()s
  • can't be used by code generic over Read/Write types, e.g. tar-rs

Lean on specialization

pub fn io::zero_copy(source: &mut impl Read, sink: &mut impl Write) -> io::Result<u64>

This is essentially what today's io::copy does, but with altered guarantees:

  • if the caller passes a BufWriter, any bytes that were read but not yet written are held in the BufWriter when a WouldBlock error occurs; otherwise those bytes are dropped
  • changes made to source after zero_copy returns may become visible in sink, as is the case when using sendfile or splice
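The first guarantee builds on how BufWriter already behaves: if the inner writer fails with WouldBlock while flushing, the unwritten bytes remain in the BufWriter's buffer and a later flush retries them. The sketch below shows this with FlakySink, a hypothetical test double that refuses its first write the way a full non-blocking socket would:

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that reports WouldBlock for its first `stalls` writes.
struct FlakySink {
    stalls: usize,
    data: Vec<u8>,
}

impl Write for FlakySink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.stalls > 0 {
            self.stalls -= 1;
            return Err(io::ErrorKind::WouldBlock.into());
        }
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Returns (bytes still buffered after the failed flush, bytes finally delivered).
fn buffered_retry(payload: &[u8]) -> (Vec<u8>, Vec<u8>) {
    let mut w = BufWriter::new(FlakySink { stalls: 1, data: Vec::new() });
    // A small payload fits in the buffer, so nothing reaches the sink yet.
    w.write_all(payload).unwrap();
    // The first flush hits WouldBlock; the bytes are kept, not dropped.
    let err = w.flush().unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::WouldBlock);
    let pending = w.buffer().to_vec();
    // Retrying once the sink is ready delivers everything.
    w.flush().unwrap();
    let delivered = w.get_ref().data.clone();
    (pending, delivered)
}
```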

Downsides:

  • API guarantees strongly rely on specialization
  • the non-blocking case might be a footgun if someone passes a BufWriter as &mut dyn Write, where specialization can't see the concrete type, so the bytes would end up being dropped
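The footgun comes from type erasure: once a BufWriter is coerced to &mut dyn Write, generic code (and therefore any specialization) is instantiated with dyn Write rather than BufWriter. A stable-Rust illustration using std::any::type_name, whose exact output is not guaranteed but in practice names the type the generic function was instantiated with; seen_type and compare are hypothetical helpers:

```rust
use std::any::type_name;
use std::io::{BufWriter, Write};

/// Reports the writer type a generic function is instantiated with.
/// Specialization would dispatch on this same static type.
fn seen_type<W: Write + ?Sized>(_sink: &mut W) -> &'static str {
    type_name::<W>()
}

/// Returns (type seen for a concrete BufWriter, type seen through &mut dyn Write).
fn compare() -> (&'static str, &'static str) {
    let mut concrete = BufWriter::new(Vec::<u8>::new());
    let direct = seen_type(&mut concrete);
    // Coercing to a trait object erases the concrete BufWriter type.
    let erased: &mut dyn Write = &mut concrete;
    let through_dyn = seen_type(erased);
    (direct, through_dyn)
}
```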

Hybrid of the above

Make the buffer an explicit argument for non-blocking IO but use best-effort specialization for the offloading aspects.

Encapsulate the copy operation in a struct/builder

Rough sketch:

use std::io::{self, BorrowedBuf, Read, Write}; // BorrowedBuf is still unstable (nightly-only)
use std::marker::PhantomData;
#[cfg(unix)]
use std::os::fd::AsFd;

struct Copier<'a, R, W> {
   // a bunch of enums (buffer, source and sink state)
   _marker: PhantomData<(&'a mut BorrowedBuf<'a>, R, W)>,
}

impl<'a, R, W> Copier<'a, R, W> where R: Read, W: Write {
   /// If no buffer is provided, an internal one will be allocated on errors
   fn buffer(&mut self, buf: &'a mut BorrowedBuf<'a>) {}

   fn source(&mut self, src: R) {}
   fn sink(&mut self, sink: W) {}

   /// Runs until the first error; can be resumed with a later call.
   /// Does not swallow WouldBlock or Interrupted errors.
   fn copy_chunk(&mut self) -> io::Result<u64> { todo!() }

   fn total(&self) -> u64 { todo!() }
}

#[cfg(unix)] // impl for brevity, should be an extension trait
impl<'a, R, W> Copier<'a, R, W> where R: Read + AsFd {
   fn fd_source(&mut self, src: &'a R) {}
}

#[cfg(unix)]
impl<'a, R, W> Copier<'a, R, W> where W: Write + AsFd {
   fn fd_sink(&mut self, sink: &'a W) {}
}
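For comparison, a stable-Rust sketch of the resumable part of the Copier idea. It assumes an owned Vec<u8> in place of BorrowedBuf, a constructor in place of the setter methods, no fd-specific offload, and that copy_chunk returns the bytes written during that call; these are illustrative choices, not part of the proposal:

```rust
use std::io::{self, Read, Write};

/// Stable sketch of a resumable copier (illustrative, not the proposed API).
struct Copier<R, W> {
    src: R,
    sink: W,
    buf: Vec<u8>, // carry-over buffer owned by the copier
    start: usize, // first byte in `buf` not yet written
    end: usize,   // one past the last byte read into `buf`
    total: u64,   // bytes written to `sink` so far
    eof: bool,    // set once `src` has returned Ok(0)
}

impl<R: Read, W: Write> Copier<R, W> {
    fn new(src: R, sink: W) -> Self {
        Copier { src, sink, buf: vec![0; 8 * 1024], start: 0, end: 0, total: 0, eof: false }
    }

    /// Runs until the source is exhausted or the first error, which is
    /// returned unchanged (including WouldBlock and Interrupted). Unwritten
    /// bytes stay in `buf`, so a later call resumes where this one stopped.
    fn copy_chunk(&mut self) -> io::Result<u64> {
        let before = self.total;
        loop {
            // Drain bytes left over from a previous read first.
            while self.start < self.end {
                let n = self.sink.write(&self.buf[self.start..self.end])?;
                if n == 0 {
                    return Err(io::ErrorKind::WriteZero.into());
                }
                self.start += n;
                self.total += n as u64;
            }
            if self.eof {
                return Ok(self.total - before);
            }
            // Buffer is fully drained: refill it from the source.
            let n = self.src.read(&mut self.buf)?;
            self.start = 0;
            self.end = n;
            self.eof = n == 0;
        }
    }

    fn total(&self) -> u64 {
        self.total
    }
}
```

Because unwritten bytes live in the struct rather than on the stack, a WouldBlock from the sink loses nothing: the next copy_chunk call drains them before reading more.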

Under the hood it could still try to use specialization if the platform-specific APIs aren't used.

Any of the above, but with N source/sink pairs instead of one

When copying many small files and the like, it can be beneficial to run the copies in batches. This is not a full-fledged async runtime that can add work incrementally as other work items complete, but it is still more efficient than doing one copy at a time.

Under the hood we could use polling or io_uring where appropriate.
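Batching in its simplest form is a round-robin loop that advances N non-blocking copies, parking each one when it reports WouldBlock and moving on to the next. The sketch below assumes a hypothetical Job type with its own carry-over buffer and a hypothetical run_batch driver; it busy-spins where a real implementation would wait for readiness via polling or submit to io_uring, and Choppy is a test double that simulates a congested non-blocking sink:

```rust
use std::io::{self, ErrorKind, Read, Write};

/// One source/sink pair with its own carry-over buffer (hypothetical type).
struct Job<R, W> {
    src: R,
    sink: W,
    pending: Vec<u8>, // read but not yet written
    done: bool,
}

impl<R: Read, W: Write> Job<R, W> {
    /// Makes as much progress as possible without blocking.
    /// Returns Ok(true) when finished, Ok(false) when it hit WouldBlock.
    fn step(&mut self) -> io::Result<bool> {
        let mut chunk = [0u8; 4096];
        loop {
            while !self.pending.is_empty() {
                match self.sink.write(&self.pending) {
                    Ok(0) => return Err(ErrorKind::WriteZero.into()),
                    Ok(n) => {
                        self.pending.drain(..n);
                    }
                    Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(false),
                    Err(e) if e.kind() == ErrorKind::Interrupted => {}
                    Err(e) => return Err(e),
                }
            }
            match self.src.read(&mut chunk) {
                Ok(0) => {
                    self.done = true;
                    return Ok(true);
                }
                Ok(n) => self.pending.extend_from_slice(&chunk[..n]),
                Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(false),
                Err(e) if e.kind() == ErrorKind::Interrupted => {}
                Err(e) => return Err(e),
            }
        }
    }
}

/// Drives all jobs round-robin until each has finished; returns the number
/// of rounds. Spins instead of waiting for readiness.
fn run_batch<R: Read, W: Write>(jobs: &mut [Job<R, W>]) -> io::Result<usize> {
    let mut rounds = 0;
    while jobs.iter().any(|j| !j.done) {
        rounds += 1;
        for job in jobs.iter_mut().filter(|j| !j.done) {
            job.step()?;
        }
    }
    Ok(rounds)
}

/// Test double: alternates between WouldBlock and accepting at most `max` bytes.
struct Choppy {
    max: usize,
    ready: bool,
    data: Vec<u8>,
}

impl Write for Choppy {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.ready = !self.ready;
        if !self.ready {
            return Err(ErrorKind::WouldBlock.into());
        }
        let n = buf.len().min(self.max);
        self.data.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```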

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed, the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

Metadata


Labels: T-libs-api, api-change-proposal (A proposal to add or alter unstable APIs in the standard libraries)
