Optimize string.Split(char, ...)#125379
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes string.Split(char, ...) by adding a dedicated fast path for the most common scenario: splitting on a single separator character, including a new vectorized implementation for scanning separator positions.
Changes:
- Added a single-separator (
separators.Length == 1) specialized path inMakeSeparatorListAny. - Introduced a new
MakeSeparatorListVectorized(..., char c)overload to vectorize scanning for a single separator char. - Adjusted the existing 2–3 separator path to assume
separators.Length >= 2(since the 1-separator case is now handled earlier).
This comment was marked as resolved.
This comment was marked as resolved.
…or non-existent, which is similar to what IndexOf does
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
|
What else needs to happen for this to be merged? I can revert 2871b1e & leave it for future is wanted, but figured we want to keep it so we get the char overload faster than string for this .NET version. |
It looked like explicitly handling the 1 character case wasn't a significant advantage over just optimizing the 3 character case based on the results? for in all cases, the variance for I think it would be fine, and roughly halve the size of the PR, to only take the 3 character case improvement (since that includes the 1/2 character case already) and then we can take another improvement to explicitly optimize the 1 character case in .NET 12 when we can avoid the duplication. The only test scenario that really meaningfully improved from |
This reverts commit 2871b1e.
|
I have reverted the handling @tannergooding - lmk if there's anything else I need to do :)
I don't think I'd agree with that, since without the 1 character handling it only beats the char overload on my computer & looses on other xarch platforms, and with it consistently beats it everywhere. The specific numbers clearly fluctuate quite a bit, but the pattern certainly seems to be consistent to me. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Packed.cs:1147
- The comment in this Avx512BW PackSources helper says the inputs are
Vector256<short>, but this overload takesVector512<short>. This is misleading when reading/maintaining the packing logic used by other call sites.
internal static Vector512<byte> PackSources(Vector512<short> source0, Vector512<short> source1)
{
Debug.Assert(Avx512BW.IsSupported);
// Pack two vectors of characters into bytes. While the type is Vector256<short>, these are really UInt16 characters.
// X86: Downcast every character using saturation.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Packed.cs:1147
- The comment in the
Vector512<byte> PackSourcesoverload refers toVector256<short>, which looks like a copy/paste error and is misleading when maintaining this code.
internal static Vector512<byte> PackSources(Vector512<short> source0, Vector512<short> source1)
{
Debug.Assert(Avx512BW.IsSupported);
// Pack two vectors of characters into bytes. While the type is Vector256<short>, these are really UInt16 characters.
// X86: Downcast every character using saturation.



In some cases
string.Split(char, ...)was slower (significantly) than the singlestringseperator overload - this PR optimizes thecharoverload/s to have better performance - the single-charoverloads are the most common case, so I have added the additional optimisation just for that for now - benchmarks below.BENCHMARKS ARE OUT OF DATE
Main benchmark that was the focus - single char splitter with rare/no seperators (~20% improvement on arm, ~47% improvement on x64) -
charoutperformsstringnow:X64 (AMD):

Arm:

Other cases have not regressed meaningfully (within margin of error).
There's still up to about 15% performance available for .NET 12: #125379 (comment).