-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Enumerable.ToArray performance improvement using InlineArray #90459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
use InlineArray byte[] sequence instead of LargeArrayBuilder
Tagging subscribers to this area: @dotnet/area-system-collections Issue DetailsI've optimized the Benchmark
public class ToArrayBenchmark
{
[Params(10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)]
public int Length { get; set; }
[Benchmark]
public List<int> ToList()
{
return GenerateNumber(Length).ToList();
}
[Benchmark]
public int[] ToArray()
{
return GenerateNumber(Length).ToArray();
}
[Benchmark]
public int[] PR_ToArray()
{
return EnumerableHelpers.ToArray2(GenerateNumber(Length));
}
public static IEnumerable<int> GenerateNumber(int count)
{
for (int i = 0; i < count; i++)
{
yield return i;
}
}
} Implementation detailIn this scenario, there's a fixed maximum length for the list of arrays that should be allocated. OptionIt's also possible to use
|
@@ -80,9 +80,112 @@ internal static T[] ToArray<T>(IEnumerable<T> source) | |||
return result; | |||
} | |||
|
|||
LargeArrayBuilder<T> builder = new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. What would it take to augment your PR to the point where we could delete LargeArrayBuilder? That'd make me much more interested in this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LargeArrayBuilder is used in many places, and given that it includes numerous methods specific to certain scenarios, the amount of rewrites, including from the user's side, would likely be extensive. Ideally, we'd like to replace it entirely, but we want to exclude that from the initial implementation.
Using the ArrayPool as suggested in the option might not be advisable if we replace the LargeArrayBuilder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that these code paths are used in a bunch of places on common paths, and adding more generic types will increase code size. It'd be nice to get rid of the LargeArrayBuilder so that it's a net win for both throughput and size, rather than improving throughput at the expense of size.
There's no rush here as it's not going to make .NET 8, anyway, so there's plenty of time to make and evaluate the larger change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
If this implementation is acceptable as a basic policy,
I'll take on the challenge of replacing the LargeArrayBuilder.
(Indeed, I was hoping it might make it into .NET 8! That would be great!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this implementation is acceptable as a basic policy,
We'd support it in principle. Would you like to build on top of this PR or would you prefer to open a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am willing to work on this PR.
NET 8 release seems to be coming soon, so I will do it from the branch there when it is released. ......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main is the correct branch. Any such change will be for .NET 9.
@dotnet-policy-service agree |
Your benchmarks show this being faster than ToList (almost) across the board, could it use the same (or similar) optimization? |
|
||
public T[] ToArray(int lastBlockCount) | ||
{ | ||
T[] array = GC.AllocateUninitializedArray<T>(_count + lastBlockCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllocateUninitializedArray
will introduce additional overhead for small arrays here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code for AllocateUninitializedArray
already has a special case for small-ish arrays:
runtime/src/coreclr/System.Private.CoreLib/src/System/GC.CoreCLR.cs
Lines 756 to 762 in 616f758
// small arrays are allocated using `new[]` as that is generally faster. | |
#pragma warning disable 8500 // sizeof of managed types | |
if (length < 2048 / sizeof(T)) | |
#pragma warning restore 8500 | |
{ | |
return new T[length]; | |
} |
Given that
AllocateUninitializedArray
is marked as AggressiveInlining
, there's likely no discernable perf impact here (modulo tuning on the threshold).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when length(_count + lastBlockCount) == 0, return Array.Empty<byte>()
.
for (int i = 0; i < _index; i++) | ||
{ | ||
_blocks[i].CopyTo(dest); | ||
dest = dest.Slice(_blocks[i].Length); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (int i = 0; i < _index; i++) | |
{ | |
_blocks[i].CopyTo(dest); | |
dest = dest.Slice(_blocks[i].Length); | |
} | |
ReadOnlySpan<T[]> blocks = _blocks; | |
foreach (T[] block in blocks.Slice(0, _index)) | |
{ | |
block.CopyTo(dest); | |
dest = dest.Slice(block.Length); | |
} |
This should remove the bounds checks here too. Not sure whether this is the best syntax possible here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ref var item = ref _blocks[i]
.
Something that'd be neat to add here (though the feasibility is questionable) is a special case for small arrays (say, fewer than 32 elements) using an struct ArrayBuilder<T>
{
[InlineArray(32)] struct ElementsBlock { T _field; }
[InlineArray(32)] struct ArraysBlock { T[] _field; }
ElementsBlock _elements;
ArraysBlock _arrays;
// ...
} Inside the implementation of The problem with this idea is that it'll chew through the stack for large structs. This could cause stack overflows when user-defined types come into the picture, even if the entire The solutions are not pretty. You'd have to defer the stack allocation of I'm curious to hear if there's a way to salvage this idea. |
This pull request has been automatically marked |
This pull request has been automatically marked |
This pull request will now be closed since it had been marked |
I've optimized the
Enumerable.ToArray
method for the case when the source is a pureIEnumerable<T>
.Typically, when the buffer overflows, an array with double the previous capacity is created, and the copying process repeats.
In this PR, although we still create an array of double the size, instead of copying immediately, we add the array to a list and perform all the copying at the end.
By reducing the number of copy operations, we see a significant performance improvement.
Benchmark
Implementation detail
In this scenario, there's a fixed maximum length for the list of arrays that should be allocated.
Starting with 4 and doubling the array size each time, we reach the maximum length after 29 arrays.
Therefore, using
[InlineArray(29)]
, I've reserved a fixed-length space for theT[]
.Option
It's also possible to use
ArrayPool<T>.Shared.Rent
instead ofnew T[]
.In that case, by returning the array when calling ToArray, both copying and returning can be efficiently handled.
I wasn't sure if using ArrayPool was appropriate for this scenario, so in this PR, I've opted for creating new arrays.