Skip to content

Optimize string.Split(char, ...)#125379

Merged
tannergooding merged 37 commits into
dotnet:mainfrom
hamarb123:main33
Jul 5, 2026
Merged

Optimize string.Split(char, ...)#125379
tannergooding merged 37 commits into
dotnet:mainfrom
hamarb123:main33

Conversation

@hamarb123

@hamarb123 hamarb123 commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

In some cases string.Split(char, ...) was slower (significantly) than the single string seperator overload - this PR optimizes the char overload/s to have better performance - the single-char overloads are the most common case, so I have added the additional optimisation just for that for now - benchmarks below.

BENCHMARKS ARE OUT OF DATE

Main benchmark that was the focus - single char splitter with rare/no seperators (~20% improvement on arm, ~47% improvement on x64) - char outperforms string now:

X64 (AMD):
image

Arm:
image

Other cases have not regressed meaningfully (within margin of error).

There's still up to about 15% performance available for .NET 12: #125379 (comment).

Copilot AI review requested due to automatic review settings March 10, 2026 12:26
@hamarb123 hamarb123 marked this pull request as draft March 10, 2026 12:26
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Mar 10, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes string.Split(char, ...) by adding a dedicated fast path for the most common scenario: splitting on a single separator character, including a new vectorized implementation for scanning separator positions.

Changes:

  • Added a single-separator (separators.Length == 1) specialized path in MakeSeparatorListAny.
  • Introduced a new MakeSeparatorListVectorized(..., char c) overload to vectorize scanning for a single separator char.
  • Adjusted the existing 2–3 separator path to assume separators.Length >= 2 (since the 1-separator case is now handled earlier).

@hamarb123

This comment was marked as resolved.

@hamarb123

hamarb123 commented Mar 10, 2026

Copy link
Copy Markdown
Contributor Author

Seems to improve perf by ~25%, but there's still a substantial gap on X86 (would need another ~40% improvement to match string).

AMD:
image

Arm:
image

Intel:
image

…or non-existent, which is similar to what IndexOf does
@hamarb123

This comment was marked as outdated.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.

Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
@hamarb123

This comment was marked as outdated.

Copilot AI review requested due to automatic review settings March 10, 2026 14:12
@hamarb123

This comment was marked as outdated.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
@hamarb123 hamarb123 requested a review from Copilot March 10, 2026 14:26
Copilot AI review requested due to automatic review settings June 27, 2026 08:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@hamarb123

This comment was marked as resolved.

@hamarb123

This comment was marked as resolved.

@hamarb123

Copy link
Copy Markdown
Contributor Author

What else needs to happen for this to be merged? I can revert 2871b1e & leave it for future is wanted, but figured we want to keep it so we get the char overload faster than string for this .NET version.

@tannergooding

Copy link
Copy Markdown
Member

I can revert 2871b1e & leave it for future is wanted, but figured we want to keep it so we get the char overload faster than string for this .NET version.

It looked like explicitly handling the 1 character case wasn't a significant advantage over just optimizing the 3 character case based on the results?

for abc,def it was 18.7ns vs 17.2ns
for abcdef it was 7.9ns vs 7.7ns
for test(...)tttt it was 10.5ns vs 8.6ns
for ttt,(...)asds it was 4045.2ns vs 3852.3ns`

in all cases, the variance for main across the two runs and the error listed shows they are essentially within noise of eachother

I think it would be fine, and roughly halve the size of the PR, to only take the 3 character case improvement (since that includes the 1/2 character case already) and then we can take another improvement to explicitly optimize the 1 character case in .NET 12 when we can avoid the duplication.

The only test scenario that really meaningfully improved from main was the SplitByChar and SplitByChars case for test(...)tttt where it went from 30.11ns to 14.1ns and from 29.27ns to 13.33ns. The other cases all stayed roughly the same, within the general noise margins and run to run fluctuations (i.e. ttt,(...)asds is actually slower for SplitByChar in the run that includes the single-character case, but this is likely due to loop alignment or some other run specific noise introduction, it fluctuates quite a bit here).

Copilot AI review requested due to automatic review settings July 4, 2026 03:54
@hamarb123

Copy link
Copy Markdown
Contributor Author

I have reverted the handling @tannergooding - lmk if there's anything else I need to do :)

It looked like explicitly handling the 1 character case wasn't a significant advantage over just optimizing the 3 character case based on the results?

I don't think I'd agree with that, since without the 1 character handling it only beats the char overload on my computer & looses on other xarch platforms, and with it consistently beats it everywhere. The specific numbers clearly fluctuate quite a bit, but the pattern certainly seems to be consistent to me.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Packed.cs:1147

  • The comment in this Avx512BW PackSources helper says the inputs are Vector256<short>, but this overload takes Vector512<short>. This is misleading when reading/maintaining the packing logic used by other call sites.
        internal static Vector512<byte> PackSources(Vector512<short> source0, Vector512<short> source1)
        {
            Debug.Assert(Avx512BW.IsSupported);
            // Pack two vectors of characters into bytes. While the type is Vector256<short>, these are really UInt16 characters.
            // X86: Downcast every character using saturation.

Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs Outdated
Copilot AI review requested due to automatic review settings July 4, 2026 03:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Packed.cs:1147

  • The comment in the Vector512<byte> PackSources overload refers to Vector256<short>, which looks like a copy/paste error and is misleading when maintaining this code.
        internal static Vector512<byte> PackSources(Vector512<short> source0, Vector512<short> source1)
        {
            Debug.Assert(Avx512BW.IsSupported);
            // Pack two vectors of characters into bytes. While the type is Vector256<short>, these are really UInt16 characters.
            // X86: Downcast every character using saturation.

Copilot AI review requested due to automatic review settings July 4, 2026 04:05

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@tannergooding tannergooding merged commit e9914e4 into dotnet:main Jul 5, 2026
139 of 145 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Runtime community-contribution Indicates that the PR has been added by a community member tenet-performance Performance related issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants