Skip to content

Comments

Implement Bytes::unsplit and Bytes::try_unsplit#819

Open
thanhminhmr wants to merge 4 commits intotokio-rs:masterfrom
thanhminhmr:bytes-unsplit
Open

Implement Bytes::unsplit and Bytes::try_unsplit#819
thanhminhmr wants to merge 4 commits intotokio-rs:masterfrom
thanhminhmr:bytes-unsplit

Conversation

@thanhminhmr
Copy link
Contributor

@thanhminhmr thanhminhmr commented Jan 23, 2026

Closes #503, #287.

  • Implement Bytes::unsplit and Bytes::try_unsplit
  • Implement tests for Bytes::unsplit and Bytes::try_unsplit

@thanhminhmr
Copy link
Contributor Author

The try_unsplit is the same as BytesMut one. The unsplit is a bit different though. @Darksonn do you have any comment?

pub fn unsplit(mut self, other: Bytes) -> Self;

Absorbs a Bytes that was previously split off if they are contiguous; otherwise, appends its bytes to this Bytes and return a new one. If the two Bytes objects were previously contiguous (i.e., if other was created by calling split_off on this Bytes), then this is an O(1) operation that just decreases a reference count and sets a few indices.

Otherwise, this method tries to convert self into a BytesMut and extend it with other. If that also fails, this method allocates a new BytesMut and copies both Bytes objects into it.

use bytes::Bytes;

let mut buf = Bytes::from_static(b"aaabbbcccddd");

let split = buf.split_off(6);
assert_eq!(b"aaabbb", &buf[..]);
assert_eq!(b"cccddd", &split[..]);

let buf = buf.unsplit(split);
assert_eq!(b"aaabbbcccddd", &buf[..]);

@thanhminhmr thanhminhmr marked this pull request as ready for review January 27, 2026 02:56
@thanhminhmr thanhminhmr force-pushed the bytes-unsplit branch 2 times, most recently from 88311d7 to 9689bab Compare January 27, 2026 14:44
@thanhminhmr thanhminhmr requested a review from Darksonn January 28, 2026 04:16
@Darksonn
Copy link
Member

I'm sorry but this PR raises a lot of forwards compatibility concerns for me. I'm concerned that it may limit our ability to add other features in the future. See #287 for prior discussion.

@thanhminhmr
Copy link
Contributor Author

@Darksonn

I want to clarify the intended contract of Bytes::try_unsplit, because I think that addresses the forward-compatibility concerns.

The key point is that try_unsplit is explicitly a best-effort optimization. It makes no guarantee that adjacent Bytes can be merged, now or in the future. The only guarantees are:

  • Ok(()) ⇒ the merge was proven safe and zero-copy
  • Err(other) ⇒ no invariants were violated; the caller can fall back to copying

In particular:

  • The implementation is free to fail conservatively, even if unsplitting might be possible.
  • Some current or future Bytes backends (e.g. from_owner) may always return failure, and that would still fully comply with the API.
  • This does not require exposing or freezing any vtable details.

This means the API does not commit us to supporting unsplitting for all Bytes, nor does it constrain future representations. A future public vtable / unsafe trait Share could optionally improve success rates, but is not required for correctness.

Given that this issue has existed for years and the use-cases are real today, I think a deliberately pessimistic try_unsplit provides value without blocking future design choices.

@thanhminhmr
Copy link
Contributor Author

To clarify how the current PR aligns with that contract:

The implementation only attempts to merge when it can prove the two Bytes share the same backing data. Concretely, it compares the internal data pointer and only proceeds if they are equal; otherwise it conservatively fails.

For Bytes::from_static, I updated the (currently unused) data field to store the original address of the static slice. That address is preserved across splits, so Bytes originating from the same static allocation share the same data. This allows static Bytes to be safely merged when appropriate and fixes the issue you pointed out earlier. I verified this behavior with miri, and it reports no UB after the fix.

There is one edge case I have not handled yet: merging when self is empty and other is non-empty. The current logic will conservatively fail if their backing data differ. This could be optimized by dropping self and moving other into it without copying, but I’ve intentionally left it out for now to keep the initial implementation minimal and conservative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider an unsplit method for Bytes

3 participants