Description
Up front, thanks to everyone who has worked on Rust for creating a fantastic language :-)
I tried this code:
fn run_case_char(s: &str, sep: char) {
    println!("{:?} --- split_inclusive --> {:?}", s, s.split_inclusive(sep).collect::<Vec<_>>());
    println!("{:?} --- split --> {:?}", s, s.split(sep).collect::<Vec<_>>());
}

fn main() {
    run_case_char("xsys", 's');
    run_case_char("xsy", 's');
    run_case_char("xs", 's');
    run_case_char("x", 's');
    run_case_char("", 's');
}
I expected to see this happen: I expected the output of std::str::split_inclusive to be identical to that of std::str::split, except with the separator included. In particular, I expected the same number of items in the iterator. The precise output was:
"xsys" --- split_inclusive --> ["xs", "ys"]
"xsys" --- split --> ["x", "y", ""]
"xsy" --- split_inclusive --> ["xs", "y"]
"xsy" --- split --> ["x", "y"]
"xs" --- split_inclusive --> ["xs"]
"xs" --- split --> ["x", ""]
"x" --- split_inclusive --> ["x"]
"x" --- split --> ["x"]
"" --- split_inclusive --> []
"" --- split --> [""]
Instead, this happened: In the calls to std::str::split_inclusive, if the last substring was the empty string, it was not included in the result. This was especially surprising when the input string was the empty string, in which case the resulting iterator has no elements.
I see an explanation of this behavior under the examples section of the documentation for std::str::split_inclusive: "If the last element of the string is matched, that element will be considered the terminator of the preceding substring. That substring will be the last item returned by the iterator." However, this seems to contradict the general description: "An iterator over substrings of this string slice, separated by characters matched by a pattern. Differs from the iterator produced by split in that split_inclusive leaves the matched part as the terminator of the substring."
Anyway, here is a concrete example of why I think the trailing empty string should not be dropped: producing a contiguous segmentation of a string into newline-terminated lines whose number of pieces agrees with the line count. The line count is 1 plus the number of newlines in the string, and the last line may well be the empty string, but it is no less valid as a line.
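To make the mismatch concrete, here is a small sketch comparing the "1 plus the number of newlines" line count with the number of pieces each method yields. The helper name `line_count` is only for illustration and is not part of the standard library.

```rust
/// Line count under the convention used above: 1 plus the number of newlines.
/// (Hypothetical helper, for illustration only.)
fn line_count(s: &str) -> usize {
    1 + s.matches('\n').count()
}

fn main() {
    for s in ["a\nb", "a\nb\n", "\n", ""] {
        let inclusive: Vec<&str> = s.split_inclusive('\n').collect();
        // The pieces are contiguous: concatenating them restores the input.
        assert_eq!(inclusive.concat(), s);
        println!(
            "{:?}: line_count = {}, split_inclusive = {}, split = {}",
            s,
            line_count(s),
            inclusive.len(),
            s.split('\n').count()
        );
    }
}
```

For every input here, `split` yields exactly `line_count` pieces, while `split_inclusive` yields one fewer whenever the input is empty or ends with a newline.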
Looking at the source, I see that the implementation of the method is
pub fn split_inclusive<'a, P: Pattern<'a>>(&'a self, pat: P) -> SplitInclusive<'a, P> {
    SplitInclusive(SplitInternal {
        start: 0,
        end: self.len(),
        matcher: pat.into_searcher(self),
        allow_trailing_empty: false,
        finished: false,
    })
}
and in particular, the presence of allow_trailing_empty implies that in principle either behavior could be specified easily, though obviously that's hidden behind the private type SplitInternal.
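For comparison, the non-inclusive API already exposes both behaviors through two public methods, `split` and `split_terminator`. The sketch below only shows their observable output; that the difference comes from the same internal flag is my assumption from reading the source, and the helper name `pieces` is made up for this example.

```rust
/// Collects the output of `split` and `split_terminator` for the same input.
/// (Hypothetical helper, for illustration only.)
fn pieces(s: &str, sep: char) -> (Vec<&str>, Vec<&str>) {
    (s.split(sep).collect(), s.split_terminator(sep).collect())
}

fn main() {
    for s in ["xsys", "xs", ""] {
        let (with_trailing, without_trailing) = pieces(s, 's');
        println!(
            "{:?} --- split --> {:?}, split_terminator --> {:?}",
            s, with_trailing, without_trailing
        );
    }
}
```

Here `split` keeps the trailing empty string and `split_terminator` drops it, which is the pair of behaviors that has no public counterpart for `split_inclusive`.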
Anyway, I realize that it's probably not feasible to change the behavior of the existing method. I would be in favor of adding the ability to specify allow_trailing_empty somehow.
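In the meantime, the behavior I expected can be approximated in user code. This is a minimal sketch for a `char` separator only, not a proposal for the actual API; the function name `split_inclusive_with_trailing_empty` is hypothetical.

```rust
/// Like `str::split_inclusive`, but also yields a trailing empty piece when the
/// input is empty or ends with the separator, so the number of items matches
/// `str::split`. (Hypothetical helper; handles `char` separators only.)
fn split_inclusive_with_trailing_empty<'a>(
    s: &'a str,
    sep: char,
) -> impl Iterator<Item = &'a str> + 'a {
    // An empty slice positioned at the very end of the input, if one is needed.
    let trailing = if s.is_empty() || s.ends_with(sep) {
        Some(&s[s.len()..])
    } else {
        None
    };
    s.split_inclusive(sep).chain(trailing)
}

fn main() {
    for s in ["xsys", "xsy", "xs", "x", ""] {
        let pieces: Vec<&str> = split_inclusive_with_trailing_empty(s, 's').collect();
        // Same number of items as `split`, and still a contiguous segmentation.
        assert_eq!(pieces.len(), s.split('s').count());
        assert_eq!(pieces.concat(), s);
        println!("{:?} --> {:?}", s, pieces);
    }
}
```

With this wrapper, "xs" yields ["xs", ""] and the empty string yields [""], matching the item counts of `split` for the cases listed above.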
Meta
rustc --version --verbose:
rustc 1.67.1 (d5a82bbd2 2023-02-07)
binary: rustc
commit-hash: d5a82bbd26e1ad8b7401f6a718a9c57c96905483
commit-date: 2023-02-07
host: x86_64-unknown-linux-gnu
release: 1.67.1
LLVM version: 15.0.6