RUST-1992 Introduce the &CStr and CString types for keys and regular expressions #563

Open: abr-egn wants to merge 14 commits into main

abr-egn (Contributor) commented Jun 27, 2025

RUST-1992

This introduces the &CStr and CString types: zero-overhead equivalents of &str and String that witness that the text contains no zero bytes. These types enforce that zero-byte checking for regular expressions and value keys happens at construction time (i.e., at load time or on user input) rather than at encoding time. This means (a) errors surface closer to their root cause and (b) the encoding machinery can be simplified.
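
As a rough illustration of the "witness" idea, here is a minimal sketch assuming an unsized `#[repr(transparent)]` newtype over `str` (the same pattern std uses for `Path` over `OsStr`). The names `CStr` and `CString` mirror the PR, but `try_new`, `as_c_str`, and `ZeroByteError` are hypothetical, and this is not the crate's actual implementation. It is also unrelated to `std::ffi::CStr`, which is a NUL-terminated byte string for FFI.

```rust
use std::fmt;

/// Hypothetical error: the string contains a zero byte.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ZeroByteError {
    /// Byte offset of the first zero byte.
    pub position: usize,
}

impl fmt::Display for ZeroByteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "string contains a zero byte at offset {}", self.position)
    }
}

/// Borrowed text known to contain no zero bytes. `repr(transparent)` over
/// `str` makes `&CStr` the same size as `&str` (a pointer plus a length).
#[repr(transparent)]
pub struct CStr {
    data: str,
}

impl CStr {
    /// The only way to get a `&CStr` from arbitrary text: scan once, up front.
    pub fn try_new(value: &str) -> Result<&CStr, ZeroByteError> {
        match value.bytes().position(|b| b == 0) {
            Some(position) => Err(ZeroByteError { position }),
            // SAFETY: `CStr` is `repr(transparent)` over `str`, so the cast keeps
            // the same layout, and the scan above established the invariant.
            None => Ok(unsafe { &*(value as *const str as *const CStr) }),
        }
    }

    pub fn as_str(&self) -> &str {
        &self.data
    }
}

/// Owned counterpart of `CStr`: a `String` carrying the same invariant.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CString {
    data: String,
}

impl CString {
    pub fn try_new(value: String) -> Result<CString, ZeroByteError> {
        CStr::try_new(&value)?;
        Ok(CString { data: value })
    }

    pub fn as_c_str(&self) -> &CStr {
        // SAFETY: `data` was validated in `try_new` and is never mutated.
        unsafe { &*(self.data.as_str() as *const str as *const CStr) }
    }
}
```

Because the zero-byte scan happens once in `try_new`, any code holding a `&CStr` can write it out as a NUL-terminated string without checking again.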

The new types are made fairly easy to work with by implementing a broad set of standard library traits, and by a cstr! macro that checks at compile time whether a given string literal is valid and fails with a friendly error message if it is not.
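
One way a macro like cstr! can reject bad literals at compile time is to route the literal through a `const fn` inside a `const` item, so that a panic during const evaluation becomes a build error. This is a sketch under that assumption: `checked_literal!` and `assert_no_zero_bytes` are hypothetical names, the macro returns a plain `&'static str` rather than a `&CStr`, and it is not necessarily how the PR's cstr! is written.

```rust
/// Const-evaluable check. When evaluated in a `const` item, the panic
/// below is reported as a compile-time error instead of a run-time panic.
pub const fn assert_no_zero_bytes(s: &'static str) -> &'static str {
    let bytes = s.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == 0 {
            panic!("string literal must not contain a zero byte");
        }
        i += 1;
    }
    s
}

/// Hypothetical stand-in for `cstr!`: accepts only literals and forces the
/// check to run during compilation via a `const` item.
macro_rules! checked_literal {
    ($lit:literal) => {{
        const CHECKED: &str = assert_no_zero_bytes($lit);
        CHECKED
    }};
}
```

With this sketch, `checked_literal!("a\0b")` would fail to compile, and the compiler's error includes the panic message.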

@abr-egn abr-egn changed the title RUST-1992 Introduce the &CStr and CString types for regular expressions RUST-1992 Introduce the &CStr and CString types for keys and regular expressions Jun 27, 2025

```rust
doc_buf.append("number", 12).unwrap();
doc_buf.append("bool", false).unwrap();
doc_buf.append("nu", RawBson::Null).unwrap();
doc_buf.append(cstr!("a"), "key");
```

abr-egn (Contributor Author):

This one line really is the whole story: the failure point has shifted from writing bytes to the document buffer (in append and everything that used it) to constructing the string, and that can now happen either at run time or, if the string is just a literal, at compile time.
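
To make the shift concrete, here is a minimal sketch assuming a hypothetical `Key` type and a simplified `ElementBuf` that holds only a BSON element list (no length prefix or trailing terminator). Each element is a type byte, the key as a NUL-terminated cstring, and the value. A key with an interior zero byte would truncate that cstring, so validating the key up front is what lets `append_i32` return `()` instead of a `Result`.

```rust
/// Hypothetical validated key: construction is the only fallible step.
#[derive(Debug)]
pub struct Key(String);

impl Key {
    pub fn new(s: &str) -> Result<Key, String> {
        if s.contains('\0') {
            Err(format!("key {:?} contains a zero byte", s))
        } else {
            Ok(Key(s.to_owned()))
        }
    }
}

/// Simplified buffer: only the element list, to keep the sketch short.
#[derive(Default)]
pub struct ElementBuf {
    bytes: Vec<u8>,
}

impl ElementBuf {
    /// Appends an int32 element (type 0x10). Nothing here can fail: the key
    /// is known to be free of zero bytes, so the cstring is well-formed.
    pub fn append_i32(&mut self, key: &Key, value: i32) {
        self.bytes.push(0x10);
        self.bytes.extend_from_slice(key.0.as_bytes());
        self.bytes.push(0x00); // cstring terminator
        self.bytes.extend_from_slice(&value.to_le_bytes());
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```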

```rust
("a key", RawBson::String("a value".to_string())),
("an objectid", RawBson::ObjectId(oid)),
("a date", RawBson::DateTime(dt)),
(cstr!("a key"), RawBson::String("a value".to_string())),
```

abr-egn (Contributor Author):

Building from a key-value list is one place where the repeated cstr! is more of a hassle; I could see providing a convenience try_from_iter method that accepts &str keys and returns a Result (i.e., the way this used to work).
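
A hypothetical `try_from_iter` along those lines might look like the following sketch. `Doc`, `InvalidKey`, and the method itself are illustrative only and not part of the crate; collecting into `Result<Vec<_>, _>` gives the stop-at-the-first-bad-key behavior.

```rust
/// Hypothetical error naming the offending key.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidKey(pub String);

/// Toy document type whose keys have all been validated.
#[derive(Debug)]
pub struct Doc<V> {
    entries: Vec<(String, V)>,
}

impl<V> Doc<V> {
    /// Accepts plain `&str` keys, validates each one, and returns an error
    /// for the first key that contains a zero byte.
    pub fn try_from_iter<'a, I>(iter: I) -> Result<Self, InvalidKey>
    where
        I: IntoIterator<Item = (&'a str, V)>,
    {
        let entries = iter
            .into_iter()
            .map(|(key, value)| {
                if key.contains('\0') {
                    Err(InvalidKey(key.to_owned()))
                } else {
                    Ok((key.to_owned(), value))
                }
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Doc { entries })
    }

    pub fn keys(&self) -> Vec<&str> {
        self.entries.iter().map(|(k, _)| k.as_str()).collect()
    }
}
```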

```rust
// Insert the current entry followed by trailing comma.
(@object $object:ident [$($key:tt)+] ($value:expr) , $($rest:tt)*) => {
$object.append(($($key)+), $value).expect("invalid bson value");
// Insert the current entry with followed by trailing comma, with a key literal.
```

abr-egn (Contributor Author):

I tweaked the behavior of rawdoc! a bit here: if the key is a literal, it is now implicitly wrapped in cstr! so it gets validated at compile time; otherwise it is assumed to be an expression that evaluates to a valid key and is passed on to append (as before). The main difference is that this macro can no longer panic :)
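
The dispatch described here can be sketched with two internal macro arms: one matches a `literal` and validates it in a `const` item, and a fallback `expr` arm requires the expression to already have the validated key type. This is a toy assuming int32 values only; `mini_doc!`, `Key`, `check_literal`, and `from_checked_literal` are hypothetical and not the real rawdoc! expansion. Neither arm contains a run-time check that could panic: a bad literal fails to build, and anything else must already be a `Key`.

```rust
/// Hypothetical validated key type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Key(String);

impl Key {
    /// Run-time validation for keys built from non-literal data.
    pub fn new(s: &str) -> Option<Key> {
        if s.contains('\0') { None } else { Some(Key(s.to_owned())) }
    }

    /// Called only by the macro, after the const check has passed.
    #[doc(hidden)]
    pub fn from_checked_literal(s: &'static str) -> Key {
        Key(s.to_owned())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Compile-time check: inside a `const` item, this panic is a build error.
pub const fn check_literal(s: &'static str) -> &'static str {
    let bytes = s.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == 0 {
            panic!("document key literal contains a zero byte");
        }
        i += 1;
    }
    s
}

/// Toy rawdoc!-style macro (int32 values only).
macro_rules! mini_doc {
    // Literal key: validated during compilation.
    (@key $key:literal) => {{
        const CHECKED: &str = check_literal($key);
        Key::from_checked_literal(CHECKED)
    }};
    // Any other key: must already be a `Key`, so nothing is checked here.
    (@key $key:expr) => {{
        let key: Key = $key;
        key
    }};
    ($($key:tt => $value:expr),* $(,)?) => {{
        let mut entries: Vec<(Key, i32)> = Vec::new();
        $( entries.push((mini_doc!(@key $key), $value)); )*
        entries
    }};
}
```

Multi-token key expressions can be passed by wrapping them in parentheses so they form a single token tree.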

```rust
}
}

impl<B: BindRawBsonRef> FromIterator<B> for RawArrayBuf {
```

abr-egn (Contributor Author):

This gets to be a real impl again 🎉
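
A sketch of why the impl can become a plain one: once appending cannot fail, `from_iter` needs no hidden `expect`. This assumes a hypothetical `ArrayBuf` holding int32 elements only, in place of the PR's `RawArrayBuf` and its `BindRawBsonRef` bound. BSON arrays are documents keyed by decimal indices ("0", "1", ...), and those keys never contain a zero byte.

```rust
use std::iter::FromIterator;

/// Toy array buffer: an element list with int32 values only.
#[derive(Debug, Default)]
pub struct ArrayBuf {
    bytes: Vec<u8>,
    len: usize,
}

impl ArrayBuf {
    /// Infallible: the key is the decimal index, which has no zero bytes.
    pub fn push(&mut self, value: i32) {
        self.bytes.push(0x10); // int32 element type
        self.bytes.extend_from_slice(self.len.to_string().as_bytes());
        self.bytes.push(0x00); // key terminator
        self.bytes.extend_from_slice(&value.to_le_bytes());
        self.len += 1;
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}

/// With an infallible `push`, `FromIterator` is a direct loop with no
/// `expect` or `unwrap` inside.
impl FromIterator<i32> for ArrayBuf {
    fn from_iter<I: IntoIterator<Item = i32>>(iter: I) -> Self {
        let mut buf = ArrayBuf::default();
        for value in iter {
            buf.push(value);
        }
        buf
    }
}
```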

@@ -256,7 +256,13 @@ impl<'a> RawBsonRef<'a> {
```rust
RawBsonRef::Boolean(b) => RawBson::Boolean(b),
RawBsonRef::Null => RawBson::Null,
RawBsonRef::RegularExpression(re) => {
RawBson::RegularExpression(Regex::new(re.pattern, re.options))
let mut chars: Vec<_> = re.options.as_str().chars().collect();
```

abr-egn (Contributor Author):

This doesn't use Regex::from_strings because it's coming from already-validated data, so it can skip the extra validation step that would add.
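
The new code collects the option characters into a `Vec`, presumably so they can be sorted; the BSON specification requires regex options to be stored in alphabetical order. Here is a sketch of the two paths, with hypothetical function names (neither is the crate's API): the unchecked one trusts its input, and the checked one adds the zero-byte scan that already-validated data can skip.

```rust
/// Sorts regex option characters alphabetically. No validation: the caller
/// guarantees the input came from already-validated data.
pub fn sort_options_unchecked(options: &str) -> String {
    let mut chars: Vec<char> = options.chars().collect();
    chars.sort_unstable();
    chars.into_iter().collect()
}

/// Variant for untrusted input: the extra zero-byte scan is the step that
/// the already-validated path avoids.
pub fn sort_options_checked(options: &str) -> Result<String, String> {
    if options.contains('\0') {
        return Err(format!("regex options {:?} contain a zero byte", options));
    }
    Ok(sort_options_unchecked(options))
}
```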

@abr-egn abr-egn marked this pull request as ready for review June 28, 2025 00:27
@abr-egn abr-egn requested a review from a team as a code owner June 28, 2025 00:27
@abr-egn abr-egn requested a review from isabelatkinson June 28, 2025 00:27