Skip to content

Question on string.Split inconsistency between count = 1 and count > 1 #97920

@Guiorgy

Description

@Guiorgy

When splitting a string with the StringSplitOptions.RemoveEmptyEntries option enabled, and with count parameter greater than 1, on the last split the current implementation tries to recursively removie any empty entries at the start of the string (after trimming if the option is set) until a non-empty entry is found. It then returns the rest of the string as the last split.

The comment in the source also indicates this behaviour:

// The next iteration of the loop will provide the final entry into the
// results array. If needed, skip over all empty entries before that
// point.

For example:

var str = "abaac";
var splits = str.Split('a', 2, StringSplitOptions.RemoveEmptyEntries);
// "aac" is reduced to "c", since splitting at the first 'a' would result in ["", "ac"], and empty entries are removed.
//         "ac" is similarly reduced to "c".
// final result (splits): ["b", "c"]

However, when count parameter is passed as 1, it always short-circuits, and only checks if the whole string is empty (after trimming if the option is set), and returns the whole string. It doesn't attempt to remove empty splits at the start of the string recursively as in the above case.

For example:

var str = "abaac";
var splits = str.Split('a', 1, StringSplitOptions.RemoveEmptyEntries);
// str is not empty so the whole string is returned. No further action is taken.
// final result (splits): ["abaac"]

The comment in the source indicates that this is per the documentation:

// Per the method's documentation, we'll short-circuit the search for separators.
// But we still need to post-process the results based on the caller-provided flags.

Which according to learn.microsoft.com:

Remarks

If the string has already been split count - 1 times, but the end of the string has not been reached, then the last string in the returned array will contain this instance's remaining trailing substring, untouched.

If the last substring is the instance's remaining trailing substring, **untouched**, then why does the first case (count > 1) try to reduce it recursively? And if that is intentional, why does the second case (count = 1) not do the same?

This seems inconsistent.

I'd expect the results to either be both reduced recursively:

  • ["b", "c"]
  • ["baac"]

Or the last substring kept untouched in both cases:

  • ["b", "aac"]
  • ["abaac"]

My ultimate question is if this behaviour is completely expected and intended?

Edit: Another example to better illustrate:

var str = "aaaaaaaaaaaaaaaaaaaaaaaaaaaabaabaab";
var splits1 = str.Split('a', 1, StringSplitOptions.RemoveEmptyEntries);
var splits2 = str.Split('a', 2, StringSplitOptions.RemoveEmptyEntries);
var splits3 = str.Split('a', 3, StringSplitOptions.RemoveEmptyEntries);
// split1: ["aaaaaaaaaaaaaaaaaaaaaaaaaaaabaabaab"]
// split2: ["b", "baab"]
// split3: ["b", "b", "b"]

As you can see, changing count from 1 to 2 produces vastly different results.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-System.Runtimeneeds-further-triageIssue has been initially triaged, but needs deeper consideration or reconsideration

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions