System.Linq Orderby performance optimization #46414

johnthcall · 2020-12-27T04:55:54Z

When performing OrderBy an int[] map is used to track the order of the items in the buffer. In the cases when there isn't a secondary sort however the items in the buffer can be sorted in place using ArraySortHelper which takes a Span of keys and a separate Span of items. In GetEnumerator and ToList this removes the need for the int[] map allocation and in ToArray it reuses the buffer array instead of allocating a new one.
The two open issues are:

Is there a better way to decide when to apply this optimization.
Whether classes should be allowed, for lower length it is slightly slower but has significantly lower memory allocation for ToArray
https://github.com/dotnet/performance/compare/master...johnthcall:OrderBySingleSort?expand=1

Method	Toolchain	N	Mean	Error	Ratio	Allocated
StringToList	pr	10	2,558.9 ns	135.43 ns	1.00	488 B
StringToList	upstream	10	2,524.6 ns	144.37 ns	0.99	616 B

StringToArray	pr	10	2,503.1 ns	102.37 ns	1.00	352 B
StringToArray	upstream	10	2,516.3 ns	100.05 ns	1.01	584 B

LongToList	pr	10	244.6 ns	11.17 ns	1.00	456 B
LongToList	upstream	10	381.1 ns	22.69 ns	1.56	584 B

LongToArray	pr	10	185.5 ns	8.24 ns	1.00	320 B
LongToArray	upstream	10	368.7 ns	16.02 ns	1.99	552 B

ByteToList	pr	10	208.5 ns	7.97 ns	1.00	264 B
ByteToList	upstream	10	281.9 ns	14.82 ns	1.35	392 B

ByteToArray	pr	10	162.1 ns	10.05 ns	1.00	192 B
ByteToArray	upstream	10	265.9 ns	12.39 ns	1.65	360 B

StringToList	pr	200	93,643.9 ns	3,403.92 ns	1.00	5048 B
StringToList	upstream	200	97,151.2 ns	4,114.54 ns	1.04	5936 B

StringToArray	pr	200	94,181.4 ns	4,961.81 ns	1.00	3392 B
StringToArray	upstream	200	94,923.1 ns	3,305.63 ns	1.02	5904 B

LongToList	pr	200	2,787.4 ns	121.50 ns	1.00	5016 B
LongToList	upstream	200	13,183.1 ns	494.75 ns	4.76	5904 B

LongToArray	pr	200	2,635.8 ns	90.17 ns	1.00	3360 B
LongToArray	upstream	200	13,197.0 ns	598.67 ns	5.01	5872 B

ByteToList	pr	200	2,565.9 ns	116.72 ns	1.00	816 B
ByteToList	upstream	200	12,553.4 ns	538.90 ns	4.90	1704 B

ByteToArray	pr	200	2,501.8 ns	158.85 ns	1.00	560 B
ByteToArray	upstream	200	12,389.3 ns	617.73 ns	4.97	1672 B

StringToList	pr	3000	2,445,032.7 ns	95,625.27 ns	1.00	72249 B
StringToList	upstream	3000	2,445,737.2 ns	118,153.78 ns	1.00	84337 B

StringToArray	pr	3000	2,435,115.9 ns	128,939.81 ns	1.00	48193 B
StringToArray	upstream	3000	2,482,575.2 ns	101,295.32 ns	1.03	84305 B

LongToList	pr	3000	147,456.0 ns	6,842.61 ns	1.00	72216 B
LongToList	upstream	3000	347,817.5 ns	18,424.76 ns	2.36	84304 B

LongToArray	pr	3000	142,437.5 ns	5,539.96 ns	1.00	48160 B
LongToArray	upstream	3000	358,586.4 ns	17,107.91 ns	2.53	84272 B

ByteToList	pr	3000	125,136.6 ns	6,217.83 ns	1.00	9216 B
ByteToList	upstream	3000	366,918.6 ns	17,157.38 ns	2.93	21304 B

ByteToArray	pr	3000	121,837.0 ns	3,701.15 ns	1.00	6160 B
ByteToArray	upstream	3000	357,398.5 ns	13,957.32 ns	2.94	21272 B

ghost · 2020-12-27T04:56:00Z

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

When performing OrderBy an int[] map is used to track the order of the items in the buffer. In the cases when there isn't a secondary sort however the items in the buffer can be sorted in place using ArraySortHelper which takes a Span of keys and a separate Span of items. In GetEnumerator and ToList this removes the need for the int[] map allocation and in ToArray it reuses the buffer array instead of allocating a new one.
The two open issues are:

Is there a better way to decide when to apply this optimization.
Whether classes should be allowed, for lower length it is slightly slower but has significantly lower memory allocation for ToArray
https://github.com/dotnet/performance/compare/master...johnthcall:OrderBySingleSort?expand=1

Method	Toolchain	N	Mean	Error	Ratio	Allocated
StringToList	pr	10	2,558.9 ns	135.43 ns	1.00	488 B
StringToList	upstream	10	2,524.6 ns	144.37 ns	0.99	616 B

StringToArray	pr	10	2,503.1 ns	102.37 ns	1.00	352 B
StringToArray	upstream	10	2,516.3 ns	100.05 ns	1.01	584 B

LongToList	pr	10	244.6 ns	11.17 ns	1.00	456 B
LongToList	upstream	10	381.1 ns	22.69 ns	1.56	584 B

LongToArray	pr	10	185.5 ns	8.24 ns	1.00	320 B
LongToArray	upstream	10	368.7 ns	16.02 ns	1.99	552 B

ByteToList	pr	10	208.5 ns	7.97 ns	1.00	264 B
ByteToList	upstream	10	281.9 ns	14.82 ns	1.35	392 B

ByteToArray	pr	10	162.1 ns	10.05 ns	1.00	192 B
ByteToArray	upstream	10	265.9 ns	12.39 ns	1.65	360 B

StringToList	pr	200	93,643.9 ns	3,403.92 ns	1.00	5048 B
StringToList	upstream	200	97,151.2 ns	4,114.54 ns	1.04	5936 B

StringToArray	pr	200	94,181.4 ns	4,961.81 ns	1.00	3392 B
StringToArray	upstream	200	94,923.1 ns	3,305.63 ns	1.02	5904 B

LongToList	pr	200	2,787.4 ns	121.50 ns	1.00	5016 B
LongToList	upstream	200	13,183.1 ns	494.75 ns	4.76	5904 B

LongToArray	pr	200	2,635.8 ns	90.17 ns	1.00	3360 B
LongToArray	upstream	200	13,197.0 ns	598.67 ns	5.01	5872 B

ByteToList	pr	200	2,565.9 ns	116.72 ns	1.00	816 B
ByteToList	upstream	200	12,553.4 ns	538.90 ns	4.90	1704 B

ByteToArray	pr	200	2,501.8 ns	158.85 ns	1.00	560 B
ByteToArray	upstream	200	12,389.3 ns	617.73 ns	4.97	1672 B

StringToList	pr	3000	2,445,032.7 ns	95,625.27 ns	1.00	72249 B
StringToList	upstream	3000	2,445,737.2 ns	118,153.78 ns	1.00	84337 B

StringToArray	pr	3000	2,435,115.9 ns	128,939.81 ns	1.00	48193 B
StringToArray	upstream	3000	2,482,575.2 ns	101,295.32 ns	1.03	84305 B

LongToList	pr	3000	147,456.0 ns	6,842.61 ns	1.00	72216 B
LongToList	upstream	3000	347,817.5 ns	18,424.76 ns	2.36	84304 B

LongToArray	pr	3000	142,437.5 ns	5,539.96 ns	1.00	48160 B
LongToArray	upstream	3000	358,586.4 ns	17,107.91 ns	2.53	84272 B

ByteToList	pr	3000	125,136.6 ns	6,217.83 ns	1.00	9216 B
ByteToList	upstream	3000	366,918.6 ns	17,157.38 ns	2.93	21304 B

ByteToArray	pr	3000	121,837.0 ns	3,701.15 ns	1.00	6160 B
ByteToArray	upstream	3000	357,398.5 ns	13,957.32 ns	2.94	21272 B

Author:	johnthcall
Assignees:	-
Labels:	`area-System.Linq`
Milestone:	-

src/libraries/System.Linq/src/System/Linq/OrderedEnumerable.cs

stephentoub · 2021-01-04T16:33:58Z

Do you have numbers for the case where there is a secondary sort? e.g. a small array with a trivial primary and secondary sort? In that case we're paying for the virtual dispatch and reflection checks but not getting anything for them.

johnthcall · 2021-01-04T17:49:22Z

@stephentoub
For 10 strings containing random int's
StringToList is Strings.OrderBy(s => s).ThenBy(s => s).ToList();
StringToArray is Strings.OrderBy(s => s).ThenBy(s => s).ToArray();

Method	Toolchain	N	Mean	Error	Ratio	Allocated
StringToList	new	10	2.880 us	0.2544 us	1.00	864 B
StringToList	upstream	10	2.801 us	0.2216 us	0.99	864 B

StringToArray	new	10	2.913 us	0.2185 us	1.00	832 B
StringToArray	upstream	10	2.934 us	0.2236 us	1.01	832 B

johnthcall · 2021-03-05T19:37:17Z

@eiriktsarpalis Is anything else needed here?

eiriktsarpalis

Some additional feedback:

I'd be interested to see benchmark results when the elements are large-ish structs.
There were build failures but I couldn't see them due to the age of the build. Would it be possible to rebase your changes over the latest main?

eiriktsarpalis · 2021-03-08T20:33:50Z

src/libraries/System.Linq/src/System/Linq/OrderedEnumerable.cs

+                return false;
+            }
+            Type elementType = typeof(TElement);
+            return elementType.IsValueType || (elementType.IsClass && !elementType.IsArray);


This expression could be optimized away by the JIT if you avoided assigning typeof(TElement) to an intermediate variable:

https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcAMACAygCzACcAHAGTACMA6AJQFcA7YASygFMBuddAbQCkWwAOLtG7IiwDGACmABPEuwD2AMxktmASi0BddHADM2RNgDCAHgAqAUQA27DswB82AN7psX7J+8lJAG5gwOzYypQAVuxSwNgA+iTEYrEAvNiM9HZ23GjePrnevl5G2JTKynbmYIwAkowACnZgUuy4ykTAMlpF7j15LKrYMglJzNgAhGkZWd0FeV4ec/N5cADs2KpgdhBcfd4AvjxLeVaKoewOTsCnSthpCkpqMraXyVo5y8XrF47JN+zUGoQABqW3o7H+2BgMCGPyu/0BEDMTQgEGwADJ0RM4X8zoiAIJEIhgeTvHqHJY9EplCpVWoNJotNodBBdHqLT7YAZDEZEZITKaZOyzTkczneNYbLY7D6fCmcvZfbAPFTqF6/bSI0F2cGQ6FDFVPdVXLSI5GQNGY5VnI32DXAU1AwnE0myrwU9D7IA=

eiriktsarpalis · 2021-03-08T20:35:21Z

src/libraries/System.Linq/src/System/Linq/OrderedEnumerable.cs

@@ -158,6 +173,16 @@ internal override CachingComparer<TElement> GetComparer(CachingComparer<TElement
                : new CachingComparerWithChild<TElement, TKey>(_keySelector, _comparer, _descending, childComparer);
            return _parent != null ? _parent.GetComparer(cmp) : cmp;
        }
+
+        protected override bool CanInPlaceSort()


This would be a good place to write a comment detailing the motivations for this optimization.

eiriktsarpalis · 2021-03-08T20:39:58Z

src/libraries/System.Linq/src/System/Linq/OrderedEnumerable.cs

@@ -16,6 +16,8 @@ internal abstract partial class OrderedEnumerable<TElement> : IOrderedEnumerable

        private int[] SortedMap(Buffer<TElement> buffer) => GetEnumerableSorter().Sort(buffer._items, buffer._count);

+        private void SortElements(Buffer<TElement> buffer) => GetEnumerableSorter().SortElements(buffer._items, buffer._count);


I would use a name that emphasizes that this is an in-place sort, i.e. buffer is going to get mutated.

eiriktsarpalis · 2021-03-08T20:40:52Z

src/libraries/System.Linq/src/System/Linq/OrderedEnumerable.cs

@@ -343,6 +376,9 @@ internal override int CompareAnyKeys(int index1, int index2)
        protected override void QuickSort(int[] keys, int lo, int hi) =>
            new Span<int>(keys, lo, hi - lo + 1).Sort(CompareAnyKeys);

+        protected override void QuickSortElements(TElement[] element, int lo, int hi) =>


Similarly, I would call this something like QuickSortInPlace or something to that effect.

eiriktsarpalis · 2021-03-08T20:53:42Z

src/libraries/System.Linq/src/System/Linq/OrderedEnumerable.cs

@@ -343,6 +376,9 @@ internal override int CompareAnyKeys(int index1, int index2)
        protected override void QuickSort(int[] keys, int lo, int hi) =>
            new Span<int>(keys, lo, hi - lo + 1).Sort(CompareAnyKeys);

+        protected override void QuickSortElements(TElement[] element, int lo, int hi) =>
+            new Span<TKey>(_keys, lo, hi - lo + 1).Sort(new Span<TElement>(element, lo, hi-lo +1), _comparer);


This does not seem to be taking descending sorts into account.

Thanks for the catch, I tried using the overload taking Comparison<TKey> and made a function like CompareAnyKeys. This is causing a huge perf degredation, it appears that it is due to the additional method call to the Comparable<TKey> created in Sort and and the call to new new Comparison<TKey>. Since Sort doesn't support descending the only way to maintain this Perf is to restrict this to only when doing ascending which I think is a bad idea. If you don't have any further suggestion I'll abandon this PR.

Method Job Toolchain N Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated

LongToList Job-KMOQSK IComparison<TKey> 200 15,966.5 ns 736.64 ns 788.20 ns 15,986.1 ns 14,693.0 ns 17,493.0 ns 1.1421 - - 5104 B

LongToList Job-BJFYTT upstream 200 15,546.1 ns 773.05 ns 859.24 ns 15,410.0 ns 14,292.5 ns 17,462.7 ns 1.3152 - - 5904 B

LongToList Job-BCHDBG Comparable<TKey> 200 3,087.8 ns 177.38 ns 189.79 ns 3,065.8 ns 2,846.0 ns 3,512.2 ns 1.1559 - - 5016 B

stephentoub · 2021-03-16T03:16:16Z

If you don't have any further suggestion I'll abandon this PR.

Thanks for all your efforts to try to improve things here, @johnthcall. Given the data, let's go ahead and close this.

Sort values inline

2b75d8e

Dotnet-GitSync-Bot added the area-System.Linq label Dec 27, 2020

Wraith2 reviewed Dec 27, 2020

View reviewed changes

src/libraries/System.Linq/src/System/Linq/OrderedEnumerable.cs Show resolved Hide resolved

eiriktsarpalis self-assigned this Feb 8, 2021

Base automatically changed from master to main March 1, 2021 09:07

eiriktsarpalis reviewed Mar 8, 2021

View reviewed changes

Wraith2 mentioned this pull request Mar 8, 2021

Rework MARSConnection state machine dotnet/SqlClient#933

Merged

stephentoub closed this Mar 16, 2021

ghost locked as resolved and limited conversation to collaborators Apr 15, 2021

karelz added this to the 6.0.0 milestone May 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

System.Linq Orderby performance optimization #46414

System.Linq Orderby performance optimization #46414

johnthcall commented Dec 27, 2020

ghost commented Dec 27, 2020

stephentoub commented Jan 4, 2021

johnthcall commented Jan 4, 2021

johnthcall commented Mar 5, 2021

eiriktsarpalis left a comment

eiriktsarpalis Mar 8, 2021 •

edited

Loading

eiriktsarpalis Mar 8, 2021

eiriktsarpalis Mar 8, 2021

eiriktsarpalis Mar 8, 2021

eiriktsarpalis Mar 8, 2021

johnthcall Mar 15, 2021

stephentoub commented Mar 16, 2021

		@@ -16,6 +16,8 @@ internal abstract partial class OrderedEnumerable<TElement> : IOrderedEnumerable

		private int[] SortedMap(Buffer<TElement> buffer) => GetEnumerableSorter().Sort(buffer._items, buffer._count);

		private void SortElements(Buffer<TElement> buffer) => GetEnumerableSorter().SortElements(buffer._items, buffer._count);

Method	Job	Toolchain	N	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated
LongToList	Job-KMOQSK	IComparison<TKey>	200	15,966.5 ns	736.64 ns	788.20 ns	15,986.1 ns	14,693.0 ns	17,493.0 ns	1.1421	-	-	5104 B
LongToList	Job-BJFYTT	upstream	200	15,546.1 ns	773.05 ns	859.24 ns	15,410.0 ns	14,292.5 ns	17,462.7 ns	1.3152	-	-	5904 B
LongToList	Job-BCHDBG	Comparable<TKey>	200	3,087.8 ns	177.38 ns	189.79 ns	3,065.8 ns	2,846.0 ns	3,512.2 ns	1.1559	-	-	5016 B

System.Linq Orderby performance optimization #46414

System.Linq Orderby performance optimization #46414

Conversation

johnthcall commented Dec 27, 2020

ghost commented Dec 27, 2020

stephentoub commented Jan 4, 2021

johnthcall commented Jan 4, 2021

johnthcall commented Mar 5, 2021

eiriktsarpalis left a comment

Choose a reason for hiding this comment

eiriktsarpalis Mar 8, 2021 • edited Loading

Choose a reason for hiding this comment

eiriktsarpalis Mar 8, 2021

Choose a reason for hiding this comment

eiriktsarpalis Mar 8, 2021

Choose a reason for hiding this comment

eiriktsarpalis Mar 8, 2021

Choose a reason for hiding this comment

eiriktsarpalis Mar 8, 2021

Choose a reason for hiding this comment

johnthcall Mar 15, 2021

Choose a reason for hiding this comment

stephentoub commented Mar 16, 2021

eiriktsarpalis Mar 8, 2021 •

edited

Loading