Skip to content

Conversation

2A5F
Copy link

@2A5F 2A5F commented May 29, 2025

Modified ArraySortHelper and added ArraySortHelper<T, TComparer> to allow generic struct TComparer without memory allocation

Span.Sort<T, TComparer> will not box now

#39466
#39543

@jkotas
Copy link
Member

jkotas commented May 31, 2025

@EgorBot -intel

using BenchmarkDotNet.Attributes;

public class Bench
{
    int[] a = new int[10000];
    Comparison<int> c = (int x, int y) => x - y;

    [IterationSetup]
    public void IterationSetup() { for (int i = 0; i < a.Length; i++) a[i] = i; }

    [Benchmark]
    public void Sort() => Array.Sort(a, c);
}

@jkotas jkotas added area-System.Memory tenet-performance Performance related issue and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 31, 2025
@jkotas
Copy link
Member

jkotas commented May 31, 2025

Could you please share perf numbers that demonstrate the performance improvements?

The one micro-benchmark that I have kicked off above shows ~10% regression: EgorBot/runtime-utils#370 (comment) .

@2A5F
Copy link
Author

2A5F commented May 31, 2025

No, I haven't done any performance testing.
Maybe passing TComparer by ref causes performance overhead.
Can this test bot output jit diffs?

@2A5F 2A5F marked this pull request as draft May 31, 2025 09:29
@adamsitnik
Copy link
Member

@2A5F You can use the benchmarks from the performance repo (we use them for ensuring there are no regressions):

https://github.com/dotnet/performance/blob/3499787dbbde3807402516462453e67097066d4a/src/benchmarks/micro/libraries/System.Collections/Sort.cs#L53-L54

You can both trace files and disassembler output:

https://github.com/dotnet/performance/tree/main/src/benchmarks/micro#quick-start

@2A5F 2A5F force-pushed the avoid-sort-allocations branch from c46122b to 1b22a69 Compare June 2, 2025 15:07
@2A5F
Copy link
Author

2A5F commented Jun 2, 2025

@adamsitnik Thanks for the notice
@jkotas After testing, I think it is impossible to unify the generic paths without regression, so for now I will only use generic in the Span.Sort<TComparer>.


Test Code

base on https://github.com/dotnet/performance/blob/3499787dbbde3807402516462453e67097066d4a/src/benchmarks/micro/libraries/System.Collections/Sort.cs

[Benchmark]
public void Span_ComparerStruct() => _arrays[_iterationIndex++].AsSpan(0, Size).Sort(new ComparableComparerStruct());

private readonly struct ComparableComparerStruct : IComparer<T>
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int Compare(T x, T y) => x.CompareTo(y);
}

All test on

BenchmarkDotNet v0.14.1-nightly.20250107.205, Windows 10 (10.0.20348.1906) (VMware)
AMD Ryzen 9 7950X 4.49GHz, 16 CPU, 32 logical and 32 physical cores
.NET SDK 10.0.100-preview.6.25281.103
  [Host]     : .NET 10.0.0 (10.0.25.28203), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-VBHUQD : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BXQXFN : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

PowerPlanMode=00000000-0000-0000-0000-000000000000  InvocationCount=5000  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  MinWarmupIterationCount=6
UnrollFactor=1  WarmupCount=-1

Int32

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 3.562 μs 0.1579 μs 0.1819 μs 3.547 μs 3.312 μs 3.931 μs 0.30 0.02 - 1,126 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 11.943 μs 0.2768 μs 0.3188 μs 11.825 μs 11.523 μs 12.474 μs 1.00 0.04 - 1,910 B 88 B 1.00

BigStruct

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 4.677 μs 0.1230 μs 0.1417 μs 4.645 μs 4.362 μs 4.949 μs 0.34 0.01 - 1,352 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 13.659 μs 0.2965 μs 0.3173 μs 13.594 μs 13.215 μs 14.288 μs 1.00 0.03 - 2,426 B 88 B 1.00

IntClass

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 23.89 μs 0.541 μs 0.601 μs 23.92 μs 22.59 μs 24.86 μs 0.92 0.03 - 2,958 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 26.00 μs 0.519 μs 0.556 μs 26.01 μs 24.97 μs 26.85 μs 1.00 0.03 - 968 B 88 B 1.00

IntStruct

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 3.054 μs 0.1060 μs 0.1221 μs 3.006 μs 2.880 μs 3.293 μs 0.26 0.01 - 1,102 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 11.784 μs 0.2308 μs 0.2159 μs 11.787 μs 11.312 μs 12.174 μs 1.00 0.03 - 1,916 B 88 B 1.00

String

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 153.6 μs 1.99 μs 1.76 μs 153.6 μs 150.0 μs 156.8 μs 1.00 0.02 - 2,951 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 154.3 μs 2.19 μs 2.05 μs 153.6 μs 151.8 μs 158.0 μs 1.00 0.02 - 968 B 88 B 1.00

@2A5F
Copy link
Author

2A5F commented Jun 2, 2025

Ideas and todo:

  1. Consider get a pointer to interface method to simulate a value type delegate
  2. Delegate struct wrapper has 1% regression (this test result may be invalid), maybe this is acceptable, or try take out the pointer and rewrap it
  3. Null ICompear<T> path can replace to Compear<T>.Default to avoid interface virtual call

  1. Not feasible, C# does not have syntax to get instance method pointers. Even if use InlineIL.Fody to get it, the benchmark is not as good as the delegate.
  2. Since 1 is not feasible, then 2 is also not feasible.

@andrewjsaid
Copy link
Contributor

To avoid duplicate code, could you do something like this:

struct DelegateWrappedComparer<T> : IComparer<T>
{
    private Comparer<T> _comparer;

    public int Compare(T? x, T? y) => _comparer.Compare(x, y);
}

And point the "old" version of the method to the new one, with this wrapper?

Hope this is clear.

@2A5F
Copy link
Author

2A5F commented Jun 3, 2025

@andrewjsaid For Comparer<T>, I'm not sure if this makes sense. Perhaps a struct wrapper can convert an interface virtual call into an abstract class virtual call; however public apis uses only IComparer<T> not Comparer<T>. And Comparer<T>.Default have some jit magic (I don't know what it is, but the test results are like this) so dose not need struct wrapper.


Delegate is Comparison<T> and the test results have regression EgorBot/runtime-utils#370 (comment)


This test has expired/is invalid. I will retest the new code later when I have time.

(benchmarks are really slow)

@andrewjsaid
Copy link
Contributor

andrewjsaid commented Jun 3, 2025

My reasoning was that by using the struct wrapper, methods like the below would be specialized for that struct and the JIT would be able to inline the call to DelegateWrappedComparer<T>.CompareTo thus making the SwapIfGreater<TComparer>(...) and SwapIfGreater(...)` equivalent.

private static void SwapIfGreater<TComparer>(Span<T> keys, TComparer comparer, int i, int j)
  where TComparer : IComparer<T>, allows ref struct

However on further reflection I note that currently since SwapIfGreater is not generic, by specializing it per DelegateWrappedComparer<T> this would mean that instead of having a single re-usable method if T is a class (via __CANON) the JIT would have to create a new copy for each T which can create significant bloat. Not sure if worth the maintenance cost of duplicate methods.

@2A5F
Copy link
Author

2A5F commented Jun 3, 2025

However on further reflection I note that currently since SwapIfGreater is not generic, by specializing it per DelegateWrappedComparer this would mean that instead of having a single re-usable method if T is a class (via __CANON) the JIT would have to create a new copy for each T which can create significant bloat. Not sure if worth the maintenance cost of duplicate methods.

#39466 (comment)

@andrewjsaid
Copy link
Contributor

Oops already suggested 🤦
Too many issues / PRs with similar names I missed that one

Thanks for the link

@tarekgh
Copy link
Member

tarekgh commented Sep 23, 2025

@2A5F I assume the following case is mainly the case we are improving in IntStruct

Span_ComparerStruct	Job-DLLHDC	pr	    512	2.805  μs	0.0819 μs   0.23	1,102 B	-     0.00
Span_ComparerStruct	Job-KCPLYN	main	512	12.106 μs	0.2035 μs	1.00    1,916 B  88 B 1.00

@jkotas you were asking before about the benchmark numbers which is here now #116109 (comment). Any feedback? I looked at the changes and it looks good to me as mostly we are mimicking the non-value type case.

if (span.Length > 1)
{
ArraySortHelper<T>.Default.Sort(span, comparer); // value-type comparer will be boxed
if (typeof(TComparer).IsValueType)
Copy link
Member

@jkotas jkotas Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works well only when:

  • TComparer is a small struct. If TComparer is a larger struct, this change is going to be a performance regression.
  • TComparer implementation does not carry state like cached information. If it is not the case, this change is going to be an observable behavior change.

For example, the following program is going to print "Too many cache misses" after this change:

using System;
using System.Collections.Generic;

int[] a = new int[10000];
a.AsSpan().Sort(new CachingComparer<string>());

struct CachingComparer<T> : IComparer<int>
{
    static int cacheMisses;

    bool cached;

    public int Compare(int x, int y)
    {
        if (!cached)
        {
            ... code to cache something useful ...
            cacheMisses++;
            if (cacheMisses > 1000) Console.WriteLine("Too many cache misses");
            cached = true;
        }

        return x-y;
    }
}

I think it is ok to assume that TComparer implementation is well-behaving and that it does not have these issues. I am just pointing out the potential risks.

@jkotas
Copy link
Member

jkotas commented Sep 24, 2025

@EgorBot -intel

using BenchmarkDotNet.Attributes;

public class Bench
{
    int[] a = new int[10000];
    int[] b = new int[10000];
    MyComparer c = new MyComparer();

    [IterationSetup]
    public void IterationSetup() { for (int i = 0; i < a.Length; i++) a[i] = i; }

    [Benchmark]
    public void Sort() => Array.Sort(a, b, c);

    class MyComparer : IComparer<int>
    {
        public int Compare(int x, int y) => x-y;
    }   
}

@jkotas
Copy link
Member

jkotas commented Sep 24, 2025

The microbenchmark that I have posted shows 1.27x regression.

@tarekgh
Copy link
Member

tarekgh commented Sep 25, 2025

The microbenchmark that I have posted shows 1.27x regression.

Considering that, should we close this PR? or is there any suggestion to improve the regressed case?

@2A5F 2A5F force-pushed the avoid-sort-allocations branch from 646f4d0 to 6c02cb3 Compare September 25, 2025 04:26
@dotnet-policy-service dotnet-policy-service bot added the linkable-framework Issues associated with delivering a linker friendly framework label Sep 25, 2025
@2A5F 2A5F force-pushed the avoid-sort-allocations branch from 6c02cb3 to 25b1914 Compare September 25, 2025 04:32
@2A5F
Copy link
Author

2A5F commented Sep 25, 2025

I accidentally pushed to the wrong branch; the tags and review may need to be adjusted.

@jkotas
Copy link
Member

jkotas commented Sep 25, 2025

Considering that, should we close this PR? or is there any suggestion to improve the regressed case?

The throughput should be fixable by introducing even more code duplication. (A separate question is whether the code duplication is worth it.)

}
else
{
ArraySortHelper<T>.Default.Sort(span, comparer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this path allocation-free, or does it still have allocations from the IComparer.Compare delegate? (I am not able to tell by looking at the code - want to make sure that we understand what we are getting.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this path still use delegate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Memory community-contribution Indicates that the PR has been added by a community member linkable-framework Issues associated with delivering a linker friendly framework tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants