Skip to content

Uplift Utf8Bytes #783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

coolreader18
Copy link
Contributor

@coolreader18 coolreader18 commented Apr 12, 2025

This adds a Bytes wrapper with a UTF-8 validity invariant, uplifting the type found as:

  • bytes_utils::Str (2.4M downloads/month)
  • bytestring::ByteString (1.7M downloads/month)
  • tungstenite::Utf8Bytes
  • pub(crate) http::ByteStr
  • axum::extract::ws::Utf8Bytes (itself a wrapper around tungstenite::Utf8Bytes, so as to not depend on tungstenite in the public API)
  • pub(crate) h2::hpack::ByteStr
  • async_nats::Subject
  • and probably elsewhere.

My approach here was to just copy all the API surface from Bytes that could be adapted (as well as a function based on String::from_utf8()), but if this is too big to merge all at once, I could take out some of the methods and add them in a followup PR. I didn't directly copy anything from any of the aforementioned crates, in case that matters wrt licensing, though I did look a little bit at how bytestring phrased their docs.

The Hash impl delegates to str as Hash rather than Bytes as Hash, meaning it's not hash-compatible with Bytes. This differs from http::ByteStr and tungstenite::Utf8Bytes, but it's required, since Utf8Bytes implements Borrow<str> (and so does tungstenite::Utf8Bytes, which means that was probably an oversight).

wrto the name: Utf8Bytes seemed to me the most straightforward description of what it is, and the axum case is what spurred me to open this. ByteString seems like it could imply that std::string::String isn't composed of bytes (or that it's something like bstr; not necessarily utf8), and BytesString is a bit clunky with the geminated s. Str is simple like Bytes is, but doesn't match up with OsStr or CStr, which are slices and not owned buffers. However, I'm not dead set on anything, and would be totally fine with renaming it if that's desired.

@paolobarbolini
Copy link
Contributor

paolobarbolini commented Apr 16, 2025

FWIW I think this is a good idea. Other than tungstenite and http, which have already been mentioned above, I'll list a few more crates that have their own internal ByteStr type or equivalent to drive home the point:

  1. h2
  2. async-nats

Copy link
Contributor

@paolobarbolini paolobarbolini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a small nitpick.

src/utf8.rs Outdated

/// Creates `Utf8Bytes` instance from slice, by copying it.
pub fn copy_from_slice(data: &str) -> Self {
String::from(data).into()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this call Bytes::copy_from_slice instead and avoid having more and more code rely on the internal representation of Bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Does copy_from_slice seem like a reasonable name here? I figured that the Bytes version was probably moreso referring to the borrowed nature of data, rather than the fact it's specifically a [_] slice type, so I used the same name here instead of something like copy_from_str.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants