Implement further str split variants via a builder API #168

Closed
@pitaj

Proposal

Problem statement

There are multiple splitting "options" provided on the str type:

  • reverse direction (rsplit...)
  • inclusive of the separator (split_inclusive)
  • terminated instead of separated (split_terminator, rsplit_terminator)
  • limited number of splits allowed (splitn, rsplitn)
  • split exactly once into a tuple (split_once, rsplit_once)
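
For reference, here is how the existing options behave on a single input. Every method shown is a stable `str` method; only the sample string is made up for illustration.

```rust
fn main() {
    let s = "a+b+c+";

    // Plain split: the trailing separator yields a final empty substring.
    let plain: Vec<&str> = s.split('+').collect();
    assert_eq!(plain, ["a", "b", "c", ""]);

    // Reverse direction.
    let reversed: Vec<&str> = s.rsplit('+').collect();
    assert_eq!(reversed, ["", "c", "b", "a"]);

    // Inclusive: each substring keeps the separator that ends it.
    let inclusive: Vec<&str> = s.split_inclusive('+').collect();
    assert_eq!(inclusive, ["a+", "b+", "c+"]);

    // Terminated: a trailing empty substring is skipped.
    let terminated: Vec<&str> = s.split_terminator('+').collect();
    assert_eq!(terminated, ["a", "b", "c"]);

    // Limited: at most `n` substrings; the last one holds the remainder.
    let limited: Vec<&str> = s.splitn(2, '+').collect();
    assert_eq!(limited, ["a", "b+c+"]);

    // Once: split at the first (or last) separator into a tuple.
    assert_eq!(s.split_once('+'), Some(("a", "b+c+")));
    assert_eq!(s.rsplit_once('+'), Some(("a+b+c", "")));
}
```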

Currently, various useful combinations of these options are missing from the str type (non-exhaustive):

  • inclusive + reversed (rsplit_inclusive)
  • inclusive + limited (splitn_inclusive, rsplitn_inclusive)
  • inclusive + once (split_inclusive_once, rsplit_inclusive_once)

Additionally, there may be demand for before- and after-inclusive versions (split_inclusive is after-inclusive).

Adding all of the combinations would quickly balloon the API surface of str.
The combinatorial explosion would only get worse if more options are added.

Motivation, use-cases

A comment on a PR adding more combinations was the inspiration for this proposal:

We're hesitant to add this many similar types and functions.
...
An idea that was brought up is to not add more iterator types, but to add new methods to the existing ones, to cover most/all of the functionality this PR adds.

Example: rsplit_inclusive

"a+b+c".split_inclusive('+').collect::<Vec<_>>() // => vec!["a+", "b+", "c"]
// reversing the iterator is not enough
"a+b+c".split_inclusive('+').rev().collect::<Vec<_>>() // => vec!["c", "b+", "a+"]
// but some desire vec!["+c", "+b", "a"]
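
As a rough sketch of the desired behavior, the free function below builds the result by hand with the stable `str::rmatch_indices`. The name `rsplit_inclusive`, its signature, the restriction to `char` separators, and the choice to skip an empty leading substring are all assumptions made for illustration; none of this is a std API.

```rust
/// Hypothetical helper (not a std API): splits from the back, keeping each
/// separator at the start of the substring that follows it.
fn rsplit_inclusive(s: &str, sep: char) -> Vec<&str> {
    let mut out = Vec::new();
    let mut end = s.len();
    // `rmatch_indices` yields the byte index of each separator, last first.
    for (idx, _) in s.rmatch_indices(sep) {
        out.push(&s[idx..end]);
        end = idx;
    }
    // Assumption: mirror `split_inclusive` by skipping an empty edge substring.
    if end > 0 {
        out.push(&s[..end]);
    }
    out
}

fn main() {
    assert_eq!(rsplit_inclusive("a+b+c", '+'), ["+c", "+b", "a"]);
}
```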

Example: splitn_inclusive

let v: Vec<&str> = "Mary had a little lambda".splitn_inclusive(3, ' ').collect();
assert_eq!(v, ["Mary ", "had ", "a little lambda"]);
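
One way to emulate this today is a small loop over `str::find`. The helper below is a hypothetical free function, not a std API; it assumes a `char` separator, takes `n` before the separator to mirror `splitn`, and skips an empty final substring the way `split_inclusive` does.

```rust
/// Hypothetical helper (not a std API): like `splitn`, but every substring
/// except the last keeps its trailing separator.
fn splitn_inclusive(s: &str, n: usize, sep: char) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = s;
    // Split off at most `n - 1` leading substrings.
    while out.len() + 1 < n {
        match rest.find(sep) {
            Some(idx) => {
                let (head, tail) = rest.split_at(idx + sep.len_utf8());
                out.push(head);
                rest = tail;
            }
            None => break,
        }
    }
    // The final substring is whatever remains, unsplit.
    if n > 0 && !rest.is_empty() {
        out.push(rest);
    }
    out
}

fn main() {
    let v = splitn_inclusive("Mary had a little lambda", 3, ' ');
    assert_eq!(v, ["Mary ", "had ", "a little lambda"]);
}
```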

Example: split_inclusive_once

assert_eq!("cfg=foo=bar".split_inclusive_once('='), Some(("cfg=", "foo=bar")));
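
The same result can be approximated with `str::find` and `str::split_at`. The function below is a hypothetical stand-in limited to `char` separators, not a proposed signature.

```rust
/// Hypothetical helper (not a std API): like `split_once`, but the first
/// half keeps the separator.
fn split_inclusive_once(s: &str, sep: char) -> Option<(&str, &str)> {
    let idx = s.find(sep)?;
    // Cut just after the separator so it stays with the first half.
    Some(s.split_at(idx + sep.len_utf8()))
}

fn main() {
    assert_eq!(split_inclusive_once("cfg=foo=bar", '='), Some(("cfg=", "foo=bar")));
    assert_eq!(split_inclusive_once("cfg", '='), None);
}
```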

Solution sketches

My proposal is to add a builder API to the existing Split type. This consists of a handful of functions that modify the splitting behavior:

impl<'a, P: Pattern<'a>> Split<'a, P> {
    // `s.split(p).once()` acts like `s.split_once(p)`
    pub fn once(self) -> Option<(&'a str, &'a str)>
    // `s.split(p).to_terminated()` acts like `s.split_terminator(p)`
    pub fn to_terminated(self) -> Self
    // `s.split(p).to_inclusive()` acts like `s.split_inclusive(p)`
    pub fn to_inclusive(self) -> Inclusive<Self>
    // `s.split(p).to_reversed()` acts like `s.rsplit(p)`
    pub fn to_reversed(self) -> Reversed<Self>
    where
         P::Searcher: ReverseSearcher<'a>
    // `s.split(p).with_limit(n)` acts like `s.splitn(n, p)`
    pub fn with_limit(self, n: usize) -> Limited<Self>
}

And these modifiers can be combined to produce any of the existing splitting functions:

Existing fn              Builder chain
split(pat)               split(pat)
split_inclusive(pat)     split(pat).to_inclusive()
rsplit(pat)              split(pat).to_reversed()
split_terminator(pat)    split(pat).to_terminated()
rsplit_terminator(pat)   split(pat).to_terminated().to_reversed()
                         or split(pat).to_reversed().to_terminated()
splitn(n, pat)           split(pat).with_limit(n)
rsplitn(n, pat)          split(pat).to_reversed().with_limit(n)
split_once(pat)          split(pat).once()
rsplit_once(pat)         split(pat).to_reversed().once()

Plus more that aren't currently available:

Imaginary fn                 Builder chain
rsplit_inclusive(pat)        split(pat).to_inclusive().to_reversed()
split_inclusive_once(pat)    split(pat).to_inclusive().once()
rsplit_inclusive_once(pat)   split(pat).to_inclusive().to_reversed().once()
splitn_inclusive(n, pat)     split(pat).to_inclusive().with_limit(n)
rsplitn_inclusive(n, pat)    split(pat).to_inclusive().to_reversed().with_limit(n)

All of the above (with the exception of the once variants) return a type that implements Iterator, DoubleEndedIterator, etc., as the existing functions do. The difference is that the returned type is not a standalone struct, but a composition of generic combinator structs. For instance, split(pat).to_reversed() returns Reversed<Split<'a, P>>, split(pat).to_inclusive().to_reversed() returns Reversed<Inclusive<Split<'a, P>>>, etc.
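
To make the composition concrete, here is a heavily simplified sketch of how such combinators could fit together. It is not the str_splitter implementation: it assumes `char` separators instead of `Pattern`, forward iteration only, and a hypothetical `Splitter` trait (a name this proposal does not define) to carry the builder methods. `to_reversed` and `to_terminated` are left out to keep it short.

```rust
/// Hypothetical core trait for this sketch; the proposal does not name one.
trait Splitter: Sized {
    type Piece;
    /// Yields the next substring; `inclusive` keeps the trailing separator.
    fn next_piece(&mut self, inclusive: bool) -> Option<Self::Piece>;
    /// Yields everything not yet consumed, without splitting further.
    fn remainder(&mut self) -> Option<Self::Piece>;

    fn to_inclusive(self) -> Inclusive<Self> {
        Inclusive(self)
    }
    fn with_limit(self, n: usize) -> Limited<Self> {
        Limited { inner: self, remaining: n }
    }
    fn once(mut self) -> Option<(Self::Piece, Self::Piece)> {
        let head = self.next_piece(false)?;
        let tail = self.remainder()?; // `None` when no separator was found
        Some((head, tail))
    }
}

/// Simplified stand-in for `core::str::Split`, `char` separators only.
struct Split<'a> { rest: Option<&'a str>, sep: char }
struct Inclusive<S>(S);
struct Limited<S> { inner: S, remaining: usize }

fn split(s: &str, sep: char) -> Split<'_> {
    Split { rest: Some(s), sep }
}

impl<'a> Splitter for Split<'a> {
    type Piece = &'a str;
    fn next_piece(&mut self, inclusive: bool) -> Option<&'a str> {
        let s = self.rest?;
        match s.find(self.sep) {
            Some(i) => {
                let after = i + self.sep.len_utf8();
                self.rest = Some(&s[after..]);
                Some(if inclusive { &s[..after] } else { &s[..i] })
            }
            None => {
                self.rest = None;
                // Like `split_inclusive`, skip an empty final substring.
                if inclusive && s.is_empty() { None } else { Some(s) }
            }
        }
    }
    fn remainder(&mut self) -> Option<&'a str> {
        self.rest.take()
    }
}

impl<S: Splitter> Splitter for Inclusive<S> {
    type Piece = S::Piece;
    // The wrapper forces inclusive mode, whatever an outer layer asks for.
    fn next_piece(&mut self, _inclusive: bool) -> Option<S::Piece> {
        self.0.next_piece(true)
    }
    fn remainder(&mut self) -> Option<S::Piece> {
        self.0.remainder()
    }
}

impl<'a> Iterator for Split<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<&'a str> { self.next_piece(false) }
}
impl<S: Splitter> Iterator for Inclusive<S> {
    type Item = S::Piece;
    fn next(&mut self) -> Option<S::Piece> { self.next_piece(true) }
}
impl<S: Splitter> Iterator for Limited<S> {
    type Item = S::Piece;
    fn next(&mut self) -> Option<S::Piece> {
        match self.remaining {
            0 => None,
            1 => { self.remaining = 0; self.inner.remainder() }
            _ => { self.remaining -= 1; self.inner.next_piece(false) }
        }
    }
}

fn main() {
    let v: Vec<&str> = split("a+b+c", '+').to_inclusive().collect();
    assert_eq!(v, ["a+", "b+", "c"]);
    let v: Vec<&str> = split("Mary had a little lambda", ' ')
        .to_inclusive()
        .with_limit(3)
        .collect();
    assert_eq!(v, ["Mary ", "had ", "a little lambda"]);
    assert_eq!(split("cfg=foo=bar", '=').to_inclusive().once(), Some(("cfg=", "foo=bar")));
}
```

In this sketch each modifier is one generic wrapper struct, so `to_inclusive().with_limit(3)` has the type `Limited<Inclusive<Split<'_>>>` and no dedicated `SplitNInclusive` type is needed.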

A proof of concept is available in the str_splitter crate. When desired, I can quickly put forward an implementation PR, since the base code there is directly from the standard library source.

This approach has several benefits:

  • no combinatorial explosion of fns or types (just one combinator struct for each modifier)
  • easily extensible with any future modifiers (initiator vs terminator, including the sep before vs after)
  • produces fully useable iterators, just like the existing functions
  • builders are familiar to users of the Rust standard library and ecosystem at large

Alternate solutions

One alternate solution put forward in the aforementioned PR comment is adding extra next functions on the Split struct that will each return the next substring of the given modifier:

For example, std::slice::Split could have a method or two like .next_with_separator() or .next_separator() or .next_including_separator() or something in that direction. We didn't discuss this option in detail, but we'd like to see some exploration of alternatives that do not involve adding so many similar functions and types. An alternative with fewer functions and types might also be easier to document, and easier to learn and use for users.

The downside of this approach is that those would not be usable as iterators in their own right. Additionally, adding more orthogonal modifiers would result in the same combinatorial explosion (though to a lesser degree) of functions on the Split type.
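
A small sketch of that downside, using a simplified stand-in for a split iterator with `char` separators. The method name `next_including_separator` is taken from the quoted comment; its exact behavior here is an assumption.

```rust
/// Simplified stand-in for a split iterator, restricted to `char` separators.
struct Split<'a> {
    rest: &'a str,
    sep: char,
}

impl<'a> Split<'a> {
    /// Hypothetical extra method: the next substring, separator included.
    fn next_including_separator(&mut self) -> Option<&'a str> {
        if self.rest.is_empty() {
            return None;
        }
        let end = match self.rest.find(self.sep) {
            Some(i) => i + self.sep.len_utf8(),
            None => self.rest.len(),
        };
        let (head, tail) = self.rest.split_at(end);
        self.rest = tail;
        Some(head)
    }
}

fn main() {
    let mut split = Split { rest: "a+b+c", sep: '+' };
    // The method is not part of an `Iterator` impl, so `collect`, `map`, and
    // friends are unavailable without a manual loop or `std::iter::from_fn`.
    let mut pieces = Vec::new();
    while let Some(piece) = split.next_including_separator() {
        pieces.push(piece);
    }
    assert_eq!(pieces, ["a+", "b+", "c"]);
}
```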

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.
