Skip to content

non-Buf body chunks #3026

Open
Open
@scottlamb

Description

@scottlamb

Is your feature request related to a problem? Please describe.

I'd like hyper to be able to support more advanced/efficient IO operations that don't use an intermediary &[u8]:

  • ask the kernel to directly copy bytes from an input fd into an output fd (sendfile(2), splice(2), copy_file_range(2), IO_URING_OP_SPLICE, chains of io_uring read and write operations, etc.)
  • maybe a more safe and/or efficient use of mmap->write than the status quo (but see caveat below):
    • callers can implement the current API by providing a reference directly to the mmaped region, but this is unsound if the underlying bytes can change, and it has a latency problem. Particularly when the backing file is on spinning disk (and not in page cache), it's really bad to make an async IO thread wait for a major page fault (10+ ms). (I used to use this approach in a server where I know the underlying bytes don't change, but I stopped because there's no good way around the major page fault problem.)
    • callers can implement the current API by providing a reference to a copy done ahead of time on the thread of their choice (avoiding both those problems), but this is inefficient/unnecessary when the memory access is done from the kernel by write/writev/IO_URING_WRITEV. (My server does this now, but it'd be nice to avoid the copy.)
    • maybe better: passing a raw pointer and length to the syscall doesn't actually require copying or guaranteeing that nothing else is changing the backing file. caveat: I think the thread still stalls with a major page fault within the syscall, just as it would on userspace read. so maybe this approach isn't that great anyway, as compared to copying ahead of time or using the fd-to-fd APIs mentioned above.

(One might point out that a userspace TLS implementation needs the bytes in userspace anyway. But at least on Linux, kTLS is possible, and there's been recent movement on using it from Rust: rustls/rustls#198. And sometimes people don't use TLS or handle it in an external proxy server.)

My half-baked trait below might have another benefit of allow a caller-supplied IO to control memory allocation more by providing chunks from a pool or something rather than fresh allocations done within hyper.

In terms of hyper's vision, I think this aligns with Fast and Flexible.

This has been in the back of my mind for a while, but the hyper v1.0.0-rc.1 post yesterday made me realize the clock might be ticking. The vision doc says "If it is determined necessary to make breaking changes to the API, we'll save them for after the 3 years." I fear this needs a breaking API change, and I'd be sad to miss the boat if it'll be that long before the next one.

If you consider API changes to support this to be too complex to maintain or that they'd introduce too much work/delay to design/implement before reaching hyper 1.0, that's one thing, but I'd rather at least bring it up first!

Describe the solution you'd like

My thought is that hyper::body::Body::Data needs to provide:

  • the (remaining) length, so hyper's proto layers can frame things properly
  • a way for the IO to actually provide a chunk or consume the next bytes from one. IIUC, users are supposed to be able to bring their own transport, so this part is really a contract between two parts that can be provided by the user. Buf is one such contract, but it's not suitable when you want to defer the actual read because it promises that the bytes can be produced immediately (not async) and infallibly.

and then some users will use a chunk type that allows them to splice from one file descriptor straight to the output file descriptor, encrypting with kTLS. And an IO to match it.

The current IO bounds of I: AsyncRead + AsyncWrite + Unpin for the io passed to e.g. hyper::server::conn::http2::Builder::serve_connection can still be provided, but the advanced users will also want hyper to use some way of sending/receiving chunks without going through those traits. Half-baked sketch:

trait Data {
    /// Remaining length in bytes.
    fn len(&self) -> usize;
}

trait Io : AsyncRead + AsyncWrite + Unpin {
    type Data : Data;

    fn poll_write_data(...) -> Poll<Result<usize, Error>>;
    fn poll_read_data(...) -> Poll<Result<Self::Data, Error>>;
}

(For Chunk = Bytes, poll_write_chunk and poll_read_chunk should be implementable in terms of poll_write and poll_read, respectively.)

Lots I don't know:

There might be a lot to consider in those ... bytes above: hyper at least needs a way to tell it the max bytes it's expecting in the next chunk based on framing.

The proto layers need to plumb that through. Maybe this can be done without breaking compatibility with the existing API via default trait bounds and a "simple" methods that require Data: Buf and provide poll_write_chunk and poll_read_chunk in terms of poll_write and poll_read vs an "extended" methods that don't. Maybe it can't. I'm not sure!

I see the hyper::body::Incoming interface uses Data = Bytes. It'd be nice to support more flexibility on the incoming side as well as the outgoing side.

I haven't looked at how this would intersect with http3/UDP support.

Describe alternatives you've considered

  • Convince ourselves that it's possible to do this later without an API break.
  • Accept that there's performance wins out of reach with the current API.
  • Maybe somewhere between: current API can be extended for this for outgoing stuff but not for incoming stuff (hyper::body::Incoming)

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-bodyArea: body streaming.B-rfcBlocked: More comments would be useful in determine next steps.C-featureCategory: feature. This is adding a new feature.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions