ACP: CStr::bytes iterator #135

Closed
@clarfonthey

Description

Proposal

Problem statement

The documentation for several CStr methods, such as to_bytes, notes that they will eventually require a scan of the string, even though the string is currently represented internally as a slice and these methods are cheap today. However, no existing method offers a way to access the underlying bytes in a future-compatible way.

Motivation, use-cases

In my particular case, I was trying to create a method that was generic over strings, byte slices, and C strings, and used iterators to achieve this. I noticed that CStr didn't have any iterator types of its own and wanted to rectify this.
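
As a rough illustration of that use case, the sketch below shows a helper that accepts any byte iterator and is fed from a str, a byte slice, and a CStr. The names count_spaces and count_in_all are hypothetical and not part of the proposal. Since CStr::bytes does not exist yet, the CStr case goes through to_bytes as a stand-in.

```rust
use std::ffi::CStr;

/// Counts ASCII whitespace bytes in any byte source.
/// `count_spaces` is a hypothetical helper, not part of the proposal.
fn count_spaces(bytes: impl Iterator<Item = u8>) -> usize {
    bytes.filter(|b| b.is_ascii_whitespace()).count()
}

/// Runs the helper over a `&str`, a `&[u8]`, and a `&CStr`.
fn count_in_all(s: &str, b: &[u8], c: &CStr) -> [usize; 3] {
    [
        count_spaces(s.bytes()),
        count_spaces(b.iter().copied()),
        // Stand-in for the proposed API: this relies on `to_bytes`, which is
        // documented as possibly requiring a scan in the future. With the
        // proposal accepted, this line would presumably become `c.bytes()`.
        count_spaces(c.to_bytes().iter().copied()),
    ]
}
```

Only the CStr case needs a detour through a slice. The other two types already expose a byte iterator directly.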

Solution sketches

The proposal is to add the following API:

impl CStr {
    pub fn bytes(&self) -> CStrBytes<'_> { /* ... */ }
}
pub struct CStrBytes<'a> {
    ptr: NonNull<u8>,
    phantom: PhantomData<&'a u8>,
}
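impl<'a> CStrBytes<'a> {
    // SAFETY (sketch): `ptr` always points into a nul-terminated string that lives for `'a`.
    pub fn as_c_str(&self) -> &'a CStr { unsafe { CStr::from_ptr(self.ptr.as_ptr().cast()) } }
}
impl AsRef<CStr> for CStrBytes<'_> {
    fn as_ref(&self) -> &CStr { self.as_c_str() }
}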
impl Iterator for CStrBytes<'_> {
    type Item = u8;

    /* ... */
}
impl FusedIterator for CStrBytes<'_> {}

Future extensions

In the future, an IntoIterator impl could be added to &CStr that returns this iterator, although that's left out for now.

Iterator trait impls

The proposal explicitly recommends not implementing DoubleEndedIterator or ExactSizeIterator since the assumption is that they will not be efficient for the long-term representation of C strings. Additionally, it recommends adding AsRef and an as_c_str method to mirror the API offered by slice::Iter.
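
For reference, this is the existing slice::Iter behaviour that as_c_str is meant to mirror: after the iterator advances, as_slice returns only the part not yet yielded. The function name rest_after_first is made up for this illustration.

```rust
/// Advances a slice iterator by one item and returns what it still views.
/// `rest_after_first` is a hypothetical name used only for this example.
fn rest_after_first(data: &[u8]) -> &[u8] {
    let mut iter = data.iter();
    iter.next();
    // `slice::Iter::as_slice` borrows from the original slice, not from
    // the iterator, so the result can outlive `iter`.
    iter.as_slice()
}
```

The proposed as_c_str would presumably behave the same way, returning the remaining tail of the C string with the original lifetime.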

Item type

The iterator returns u8 to mirror the items returned by str::bytes. Additionally, yielding &u8 might encourage callers to cast this to a pointer rather than using the as_c_str method to give a reference to the string itself, which has methods dedicated to this purpose.

As an alternative, NonZeroU8 could be used instead, since we know we're not including the nul (see below). However, this would be inconsistent with the return type of to_bytes, which may be undesirable. There's always room for a to_nonzero_bytes method, though.
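
A minimal sketch of what the NonZeroU8 alternative could look like from the caller's side, written as a free function because to_nonzero_bytes does not exist. The name nonzero_bytes and its signature are assumptions made for illustration only.

```rust
use std::ffi::CStr;
use std::num::NonZeroU8;

/// Hypothetical free-function stand-in for a `to_nonzero_bytes`-style API.
fn nonzero_bytes(c: &CStr) -> impl Iterator<Item = NonZeroU8> + '_ {
    // `to_bytes` excludes the nul terminator and a `CStr` has no interior
    // nuls, so `NonZeroU8::new` returns `Some` for every byte seen here.
    c.to_bytes().iter().filter_map(|&b| NonZeroU8::new(b))
}
```

Callers that want plain u8 values then have to convert each item with NonZeroU8::get, which is the friction with to_bytes mentioned above.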

Nul byte

The iterator will not include the nul byte at the end of the string. When iterating over the bytes inside the string, that nul byte is not actually part of the string; it merely sits past the end of it.

From an implementation perspective, including the nul byte could also be less efficient. Since the iterator will only store a single pointer, it cannot actually progress past the end of the string, since it would have no way of knowing without dereferencing that potentially invalid address. So, there would have to be either additional data inside the iterator indicating that it's "done," or the pointer could be nulled out when the iterator is done. In both cases, this would incur extra checks during iteration in addition to checking the byte itself to see if it's zero.
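
The single-pointer design described above can be sketched in user code roughly as follows. This is not the standard library implementation. The constructor new is a hypothetical stand-in for CStr::bytes, and the layout simply follows the struct given in the proposal.

```rust
use std::ffi::{c_char, CStr};
use std::iter::FusedIterator;
use std::marker::PhantomData;
use std::ptr::NonNull;

/// User-level sketch of the proposed iterator: one pointer, no length.
pub struct CStrBytes<'a> {
    ptr: NonNull<u8>,
    phantom: PhantomData<&'a u8>,
}

impl<'a> CStrBytes<'a> {
    /// Hypothetical constructor standing in for `CStr::bytes`.
    pub fn new(s: &'a CStr) -> Self {
        // `CStr::as_ptr` never returns null, so this cannot fail.
        let raw = s.as_ptr().cast::<u8>().cast_mut();
        let ptr = NonNull::new(raw).expect("CStr pointer is non-null");
        Self { ptr, phantom: PhantomData }
    }

    /// The part of the string that has not been yielded yet.
    pub fn as_c_str(&self) -> &'a CStr {
        // SAFETY: `ptr` never moves past the nul terminator, so it still
        // points at a valid nul-terminated string that lives for `'a`.
        unsafe { CStr::from_ptr(self.ptr.as_ptr().cast::<c_char>()) }
    }
}

impl Iterator for CStrBytes<'_> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // SAFETY: `ptr` is always inside the string or on its terminator.
        let byte = unsafe { self.ptr.as_ptr().read() };
        if byte == 0 {
            // Stay on the terminator: no "done" flag, no nulled pointer.
            return None;
        }
        // SAFETY: `byte` was not the terminator, so the next address is
        // still within the same nul-terminated string.
        self.ptr = unsafe { NonNull::new_unchecked(self.ptr.as_ptr().add(1)) };
        Some(byte)
    }
}

// Once `next` returns `None` the pointer stops moving, so it keeps
// returning `None`.
impl FusedIterator for CStrBytes<'_> {}
```

Because the pointer parks on the terminator, the only check per step is the comparison of the byte with zero, and as_c_str on an exhausted iterator yields an empty string located inside the original one.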

Additionally, if we were to null out the pointer after the iterator was finished, calling as_c_str might have to return a reference to an empty string with an address outside the bounds of the original string. This isn't a super big deal, but it's another way it would differ from the slice::Iter implementation.

Iterator type name

A separate ACP, #134, was created in an attempt to resolve this issue.

If that ACP is not accepted, naming the type Bytes and exporting it from the core::ffi module would be unacceptable, since OsStr could theoretically gain a bytes iterator in the future. So, the type is named CStrBytes to make clear that it yields the bytes of a CStr, and not of an OsStr.

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed, the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.


Labels

ACP-accepted: API Change Proposal is accepted (seconded with no objections)
T-libs-api
api-change-proposal: A proposal to add or alter unstable APIs in the standard libraries
