Proposal
Problem statement
The documentation for many methods of CStr, such as to_bytes, mentions that they will eventually require a scan of the string, even though the string is currently represented internally as a slice and these methods are efficient today. However, no existing method offers a way to access the underlying string in a future-compatible way.
Motivation, use-cases
In my particular case, I was trying to create a method that was generic over strings, byte slices, and C strings, and used iterators to achieve this. I noticed that CString didn't have any of its own iterator types and wanted to rectify this.
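As a rough illustration of that use case, the sketch below shows one way such a generic method might look today. The helper name count_whitespace and the three wrappers are hypothetical and are not taken from the original code. Note that the C string case has to go through to_bytes, which is exactly the call that may need a length scan in the future.

```rust
use std::ffi::CStr;

/// Hypothetical generic helper: counts ASCII whitespace in any source of bytes.
fn count_whitespace(bytes: impl IntoIterator<Item = u8>) -> usize {
    bytes.into_iter().filter(u8::is_ascii_whitespace).count()
}

/// `str` already has a dedicated byte iterator.
fn count_in_str(s: &str) -> usize {
    count_whitespace(s.bytes())
}

/// Byte slices iterate by reference, so the items are copied out.
fn count_in_slice(s: &[u8]) -> usize {
    count_whitespace(s.iter().copied())
}

/// `CStr` has no iterator of its own, so this goes through `to_bytes`,
/// which may require scanning for the terminator in the future.
fn count_in_c_str(s: &CStr) -> usize {
    count_whitespace(s.to_bytes().iter().copied())
}
```

With a bytes method on CStr, the last wrapper could pass the iterator directly instead of materializing a slice first.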
Solution sketches
The proposal is to add the following API:
impl CStr {
    pub fn bytes(&self) -> CStrBytes<'_> { /* ... */ }
}

pub struct CStrBytes<'a> {
    // Pointer to the next byte; no length or end pointer is stored.
    ptr: NonNull<u8>,
    phantom: PhantomData<&'a u8>,
}

impl<'a> CStrBytes<'a> {
    // Sketch: rebuilds a CStr starting at the current position.
    pub fn as_c_str(&self) -> &'a CStr {
        unsafe { CStr::from_ptr(self.ptr.as_ptr().cast()) }
    }
}

impl AsRef<CStr> for CStrBytes<'_> {
    fn as_ref(&self) -> &CStr { self.as_c_str() }
}

impl Iterator for CStrBytes<'_> {
    type Item = u8;
    /* ... */
}

impl FusedIterator for CStrBytes<'_> {}
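To make the shape of this API more concrete, here is a minimal out-of-tree sketch of a single-pointer iterator. It is not the implementation from the linked PR: the type name CStrBytesSketch and its new constructor are hypothetical stand-ins for CStrBytes and CStr::bytes, and the body of next is only one plausible way to fill in the elided parts.

```rust
use std::ffi::{c_char, CStr};
use std::iter::FusedIterator;
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical stand-in for the proposed `CStrBytes`; not the std implementation.
pub struct CStrBytesSketch<'a> {
    // Points at the next byte to yield, or at the terminating nul once finished.
    ptr: NonNull<u8>,
    phantom: PhantomData<&'a u8>,
}

impl<'a> CStrBytesSketch<'a> {
    /// Hypothetical stand-in for the proposed `CStr::bytes`.
    pub fn new(s: &'a CStr) -> Self {
        // `CStr::as_ptr` points into a live string, so it is never null.
        let ptr = NonNull::new(s.as_ptr() as *mut u8).expect("CStr pointers are non-null");
        Self { ptr, phantom: PhantomData }
    }

    /// Returns the part of the string that has not been yielded yet.
    pub fn as_c_str(&self) -> &'a CStr {
        // SAFETY: `ptr` never moves past the terminator of the original
        // string, which is borrowed for `'a`.
        unsafe { CStr::from_ptr(self.ptr.as_ptr() as *const c_char) }
    }
}

impl Iterator for CStrBytesSketch<'_> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // SAFETY: `ptr` points at a readable byte of the string (possibly the nul).
        let byte = unsafe { self.ptr.as_ptr().read() };
        if byte == 0 {
            // Stay on the terminator so that later calls keep returning `None`.
            None
        } else {
            // SAFETY: the current byte is not the terminator, so the next
            // byte is still inside the string.
            self.ptr = unsafe { NonNull::new_unchecked(self.ptr.as_ptr().add(1)) };
            Some(byte)
        }
    }
}

impl FusedIterator for CStrBytesSketch<'_> {}
```

In this sketch the only check per step is the comparison of the byte against zero, and an exhausted iterator still points at the terminator, so as_c_str returns an empty string located inside the original allocation.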
Future extensions
In the future, an IntoIterator impl could be added to &CStr that returns this iterator, although that's left out for now.
Iterator trait impls
The proposal explicitly recommends not implementing DoubleEndedIterator or ExactSizeIterator, since the assumption is that neither will be efficient under the long-term representation of C strings. Additionally, it recommends adding an AsRef impl and an as_c_str method to mirror the API offered by slice::Iter.
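For reference, the existing slice::Iter API that the proposal mirrors can be exercised as follows. The two helper functions are hypothetical and exist only for illustration; as_slice and the AsRef impl on slice::Iter are the standard library calls being demonstrated.

```rust
/// Advances a slice iterator by up to `n` items and returns what is left.
/// `as_slice` hands back the original lifetime, like the proposed `as_c_str`.
fn rest_after<T>(items: &[T], n: usize) -> &[T] {
    let mut iter = items.iter();
    for _ in 0..n {
        iter.next();
    }
    iter.as_slice()
}

/// Same idea through `AsRef<[T]>`. The borrow is tied to the iterator itself,
/// so the remainder is copied out before the iterator is dropped.
fn rest_via_as_ref(items: &[u8], n: usize) -> Vec<u8> {
    let mut iter = items.iter();
    for _ in 0..n {
        iter.next();
    }
    let rest: &[u8] = iter.as_ref();
    rest.to_vec()
}
```

The difference in lifetimes between the two functions matches the sketch in the proposal, where as_c_str returns &'a CStr while as_ref borrows from the iterator.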
Item type
The iterator yields u8 to mirror the items yielded by str::bytes. Additionally, yielding &u8 might encourage callers to cast the reference to a pointer rather than calling as_c_str to get a reference to the string itself, which has methods dedicated to this purpose.
As an alternative, NonZeroU8 could be used, since we know the nul is not included (see below). However, this would be inconsistent with the return type of to_bytes, which may be undesirable. There's always room for a to_nonzero_bytes method, though.
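A rough idea of what a NonZeroU8-yielding API could look like, built on today's to_bytes. The free function nonzero_bytes is a hypothetical illustration, not part of the proposal or of the standard library.

```rust
use std::ffi::CStr;
use std::num::NonZeroU8;

/// Hypothetical helper showing what a `to_nonzero_bytes`-style API might yield.
fn nonzero_bytes(s: &CStr) -> impl Iterator<Item = NonZeroU8> + '_ {
    s.to_bytes()
        .iter()
        // `to_bytes` excludes the terminator and a `CStr` has no interior
        // nuls, so `NonZeroU8::new` succeeds for every byte and
        // `filter_map` never actually drops an item.
        .filter_map(|&b| NonZeroU8::new(b))
}
```

Callers that want plain bytes back can map each item through NonZeroU8::get.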
Nul byte
The iterator will not include the nul byte at the end of the string. From the perspective of including the bytes inside the string, that nul byte is not actually part of the string, merely past the end of it.
From an implementation perspective, including the nul byte could also be less efficient. Since the iterator will only store a single pointer, it cannot actually progress past the end of the string, since it would have no way of knowing without dereferencing that potentially invalid address. So, there would have to be either additional data inside the iterator indicating that it's "done," or the pointer could be nulled out when the iterator is done. In both cases, this would incur extra checks during iteration in addition to checking the byte itself to see if it's zero.
Additionally, if we were to null out the pointer after the iterator was finished, calling as_c_str might have to return a reference to an empty string with an address outside the bounds of the original string. This isn't a super big deal, but it's another way it would differ from the slice::Iter implementation.
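The extra bookkeeping described above can be sketched as follows. This is a hypothetical nul-inclusive variant written only for comparison: the type BytesWithNulSketch and its done flag are assumptions for illustration, and nothing like it is being proposed.

```rust
use std::ffi::CStr;
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical iterator that also yields the terminating nul.
struct BytesWithNulSketch<'a> {
    ptr: NonNull<u8>,
    // Extra state: without it, the iterator cannot tell whether the nul it
    // points at has already been yielded.
    done: bool,
    phantom: PhantomData<&'a u8>,
}

impl<'a> BytesWithNulSketch<'a> {
    fn new(s: &'a CStr) -> Self {
        // `CStr::as_ptr` points into a live string, so it is never null.
        let ptr = NonNull::new(s.as_ptr() as *mut u8).expect("CStr pointers are non-null");
        Self { ptr, done: false, phantom: PhantomData }
    }
}

impl Iterator for BytesWithNulSketch<'_> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // Check 1: the additional flag.
        if self.done {
            return None;
        }
        // SAFETY: not done yet, so `ptr` points at a readable byte of the string.
        let byte = unsafe { self.ptr.as_ptr().read() };
        // Check 2: the byte itself.
        if byte == 0 {
            // Never advance past the terminator; just remember it was yielded.
            self.done = true;
        } else {
            // SAFETY: the current byte is not the terminator, so the next
            // byte is still inside the string.
            self.ptr = unsafe { NonNull::new_unchecked(self.ptr.as_ptr().add(1)) };
        }
        Some(byte)
    }
}
```

Every call to next pays for both checks here, whereas an iterator that stops before the nul only needs the comparison against zero.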
Iterator type name
A separate ACP, #134, was created in an attempt to resolve the naming question.
If that ACP is not accepted, naming the type Bytes and exporting it from the core::ffi module would be unacceptable, since OsStr could theoretically gain a bytes iterator in the future. So, the type is named CStrBytes to make clear that it iterates over the bytes of a CStr, and not an OsStr.
Links and related work
ffi submodule ACP: ACP: std::ffi::c and std::ffi::os submodules #134
Implementation PR: Add CStr::bytes iterator rust#104353
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.