
ACP: add str::chunks, str::chunks_exact, and str::windows #590

@folkertdev

Description


Proposal

Problem statement

The slice::chunks, slice::chunks_exact, and slice::windows functions do not exist on str. This is inconsistent with the ability to index a &str by a byte range: if range indexing is allowed, these iterators should intuitively be available too.

Motivating examples or use cases

This recently came up in the exercises for the RustWeek Python FFI workshop:

use std::collections::HashMap;

fn count_ngrams(sequence: &str, k: usize) -> HashMap<&str, usize> {
    let mut ngrams = HashMap::default();

    for window in sequence.as_bytes().windows(k) {
        let key = std::str::from_utf8(window).unwrap();
        *ngrams.entry(key).or_insert(0) += 1;
    }

    ngrams
}

Solution sketch

The str type should provide windows, chunks, and chunks_exact methods (with the corresponding iterator types in the core::str module), so that we can instead write:

use std::collections::HashMap;

fn count_ngrams(sequence: &str, k: usize) -> HashMap<&str, usize> {
    let mut ngrams = HashMap::default();

    // Proposed API: `str::windows` yields `&str` windows of `k` bytes.
    for key in sequence.windows(k) {
        *ngrams.entry(key).or_insert(0) += 1;
    }

    ngrams
}
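Until such methods exist, the proposed call site can be approximated today with an extension trait. The sketch below is only an illustration: StrWindowsExt and StrWindows are hypothetical names, and the iterator follows the panicking behavior described in this proposal rather than any existing std implementation.

```rust
use std::collections::HashMap;

/// Hypothetical extension trait approximating the proposed `str::windows`.
trait StrWindowsExt {
    fn windows(&self, size: usize) -> StrWindows<'_>;
}

/// Hypothetical iterator: a thin wrapper around `std::slice::Windows<u8>`.
struct StrWindows<'a>(std::slice::Windows<'a, u8>);

impl StrWindowsExt for str {
    fn windows(&self, size: usize) -> StrWindows<'_> {
        // `<[u8]>::windows` panics if `size == 0`, matching slice behavior.
        StrWindows(self.as_bytes().windows(size))
    }
}

impl<'a> Iterator for StrWindows<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let bytes = self.0.next()?;
        // Panics if the window does not start and end on char boundaries.
        Some(std::str::from_utf8(bytes).expect("window is not valid UTF-8"))
    }
}

// Same body as the proposed example; `windows` resolves to the trait method.
fn count_ngrams(sequence: &str, k: usize) -> HashMap<&str, usize> {
    let mut ngrams = HashMap::default();
    for key in sequence.windows(k) {
        *ngrams.entry(key).or_insert(0) += 1;
    }
    ngrams
}

fn main() {
    let counts = count_ngrams("abab", 2);
    assert_eq!(counts["ab"], 2);
    assert_eq!(counts["ba"], 1);
}
```

If an inherent str::windows were later added, it would take precedence over the trait method at the same call site, so callers would not need to change.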

The iterator would panic if it tried to yield a subslice that is not valid UTF-8 (that is, one that does not start and end on char boundaries). Roughly:

struct Windows<'a>(core::slice::Windows<'a, u8>);

impl<'a> Iterator for Windows<'a> {
    type Item = &'a str;

    #[track_caller]
    fn next(&mut self) -> Option<Self::Item> {
        match core::str::from_utf8(self.0.next()?) {
            Ok(subslice) => Some(subslice),
            Err(e) => panic!("{:?}", e),
        }
    }
}
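The other two proposed methods could follow the same pattern. The sketch below assumes hypothetical Chunks and ChunksExact wrappers that mirror the Windows wrapper: each byte chunk is converted with from_utf8 and the iterator panics if a chunk splits a character. Exposing remainder() on ChunksExact is an assumption modeled on slice::ChunksExact, not something this proposal specifies.

```rust
/// Hypothetical `str` counterpart of `core::slice::Chunks`.
struct Chunks<'a>(core::slice::Chunks<'a, u8>);

/// Hypothetical `str` counterpart of `core::slice::ChunksExact`.
struct ChunksExact<'a>(core::slice::ChunksExact<'a, u8>);

/// Shared conversion: panics, like the `Windows` sketch, if the bytes do not
/// start and end on char boundaries.
#[track_caller]
fn to_str(bytes: &[u8]) -> &str {
    match core::str::from_utf8(bytes) {
        Ok(s) => s,
        Err(e) => panic!("{:?}", e),
    }
}

impl<'a> Iterator for Chunks<'a> {
    type Item = &'a str;

    #[track_caller]
    fn next(&mut self) -> Option<Self::Item> {
        Some(to_str(self.0.next()?))
    }
}

impl<'a> Iterator for ChunksExact<'a> {
    type Item = &'a str;

    #[track_caller]
    fn next(&mut self) -> Option<Self::Item> {
        Some(to_str(self.0.next()?))
    }
}

impl<'a> ChunksExact<'a> {
    /// Trailing bytes that did not fill a whole chunk, as `&str`.
    #[track_caller]
    fn remainder(&self) -> &'a str {
        to_str(self.0.remainder())
    }
}

fn main() {
    let chunks: Vec<&str> = Chunks("abcde".as_bytes().chunks(2)).collect();
    assert_eq!(chunks, ["ab", "cd", "e"]);

    let exact = ChunksExact("abcde".as_bytes().chunks_exact(2));
    assert_eq!(exact.remainder(), "e");
    let exact: Vec<&str> = exact.collect();
    assert_eq!(exact, ["ab", "cd"]);
}
```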

With this Windows wrapper, the test

#[test]
fn foo() {
    let mut it = Windows("x🦀".as_bytes().windows(1));

    dbg!(it.next());
    dbg!(it.next());
}

Prints something like

[src/main.rs:19:5] it.next() = Some(
    "x",
)

thread 'foo' panicked at src/main.rs:20:13:
Utf8Error { valid_up_to: 0, error_len: None }

However, the panic message could probably be more helpful, similar to what &"x🦀"[1..2] reports:

thread 'foo' panicked at src/main.rs:20:16:
byte index 2 is not a char boundary; it is inside '🦀' (bytes 1..5) of `x🦀`
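One way to get that message for free would be to slice the original &str by byte range instead of re-validating each byte window with from_utf8, since str range indexing already checks both ends for char boundaries. The sketch below assumes a hypothetical IndexWindows type that tracks a start offset; it illustrates the idea and is not a proposed implementation.

```rust
/// Hypothetical `str` windows iterator that slices the original string, so a
/// window splitting a character panics with the standard
/// "byte index N is not a char boundary" message.
struct IndexWindows<'a> {
    s: &'a str,
    size: usize,
    start: usize,
}

impl<'a> IndexWindows<'a> {
    fn new(s: &'a str, size: usize) -> Self {
        // Mirror `<[T]>::windows`, which panics on a zero window size.
        assert!(size != 0, "window size must be non-zero");
        IndexWindows { s, size, start: 0 }
    }
}

impl<'a> Iterator for IndexWindows<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let end = self.start.checked_add(self.size)?;
        if end > self.s.len() {
            return None;
        }
        // Range indexing panics if `start` or `end` is not a char boundary.
        let window = &self.s[self.start..end];
        self.start += 1;
        Some(window)
    }
}

fn main() {
    let windows: Vec<&str> = IndexWindows::new("abc", 2).collect();
    assert_eq!(windows, ["ab", "bc"]);

    // The second window of "x🦀" would be bytes 1..2, which splits '🦀'.
    let result = std::panic::catch_unwind(|| {
        let mut it = IndexWindows::new("x🦀", 1);
        it.next();
        it.next();
    });
    assert!(result.is_err());
}
```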

Alternatives

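Have users do this manually, e.g. by calling windows, chunks, or chunks_exact on as_bytes() and converting each subslice with str::from_utf8, as in the motivating example.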
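For comparison, a manual approach does not have to panic. The hypothetical helper valid_windows below is one possible user-side variant: it silently skips byte windows that would split a character, whereas the proposed methods would panic on them.

```rust
/// Hypothetical helper: the `k`-byte windows of `s` that are valid UTF-8,
/// skipping any window that starts or ends inside a character.
fn valid_windows(s: &str, k: usize) -> impl Iterator<Item = &str> + '_ {
    s.as_bytes()
        .windows(k)
        .filter_map(|w| std::str::from_utf8(w).ok())
}

fn main() {
    // '🦀' is 4 bytes, so only the windows around it stay valid.
    let w: Vec<&str> = valid_windows("x🦀y", 5).collect();
    assert_eq!(w, ["x🦀", "🦀y"]);
}
```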

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Metadata

Assignees: no one assigned
Labels: T-libs-api, api-change-proposal (A proposal to add or alter unstable APIs in the standard libraries)
Type: no type
Projects: no projects
Milestone: no milestone
Relationships: none yet
Development: no branches or pull requests