-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Description
When splitting a string with the StringSplitOptions.RemoveEmptyEntries
option enabled, and with count
parameter greater than 1, on the last split the current implementation tries to recursively removie any empty entries at the start of the string (after trimming if the option is set) until a non-empty entry is found. It then returns the rest of the string as the last split.
The comment in the source also indicates this behaviour:
// The next iteration of the loop will provide the final entry into the
// results array. If needed, skip over all empty entries before that
// point.
For example:
var str = "abaac";
var splits = str.Split('a', 2, StringSplitOptions.RemoveEmptyEntries);
// "aac" is reduced to "c", since splitting at the first 'a' would result in ["", "ac"], and empty entries are removed.
// "ac" is similarly reduced to "c".
// final result (splits): ["b", "c"]
However, when count
parameter is passed as 1, it always short-circuits, and only checks if the whole string is empty (after trimming if the option is set), and returns the whole string. It doesn't attempt to remove empty splits at the start of the string recursively as in the above case.
For example:
var str = "abaac";
var splits = str.Split('a', 1, StringSplitOptions.RemoveEmptyEntries);
// str is not empty so the whole string is returned. No further action is taken.
// final result (splits): ["abaac"]
The comment in the source indicates that this is per the documentation:
// Per the method's documentation, we'll short-circuit the search for separators.
// But we still need to post-process the results based on the caller-provided flags.
Which according to learn.microsoft.com:
Remarks
If the string has already been split count - 1 times, but the end of the string has not been reached, then the last string in the returned array will contain this instance's remaining trailing substring, untouched.
If the last substring is the instance's remaining trailing substring, **untouched**
, then why does the first case (count > 1) try to reduce it recursively? And if that is intentional, why does the second case (count = 1) not do the same?
This seems inconsistent.
I'd expect the results to either be both reduced recursively:
["b", "c"]
["baac"]
Or the last substring kept untouched in both cases:
["b", "aac"]
["abaac"]
My ultimate question is if this behaviour is completely expected and intended?
Edit: Another example to better illustrate:
var str = "aaaaaaaaaaaaaaaaaaaaaaaaaaaabaabaab";
var splits1 = str.Split('a', 1, StringSplitOptions.RemoveEmptyEntries);
var splits2 = str.Split('a', 2, StringSplitOptions.RemoveEmptyEntries);
var splits3 = str.Split('a', 3, StringSplitOptions.RemoveEmptyEntries);
// split1: ["aaaaaaaaaaaaaaaaaaaaaaaaaaaabaabaab"]
// split2: ["b", "baab"]
// split3: ["b", "b", "b"]
As you can see, changing count
from 1
to 2
produces vastly different results.