Array.FindAll improvements by Henr1k80 · Pull Request #120336 · dotnet/runtime

Henr1k80 · 2025-10-02T20:57:48Z

I found that there was room for improvement in Array.FindAll.

Now a small number of matches are stack allocated and the List is only optionally allocated.

In the benchmarks below it is always faster, with fewer allocations.

The stack-allocated buffer can be made bigger, but it has a fixed size, so a bigger buffer has a price in no-match scenarios.

InlineArray16<T> is only in .NET 10, but if this should be backported to .NET 8, we can create our own local InlineMatchArrayX.
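
For reference, a local inline-array type of the kind mentioned above could look roughly like this. This is a minimal sketch, assuming C# 12 and .NET 8 or later, where `System.Runtime.CompilerServices.InlineArrayAttribute` is available. The names `InlineMatchArray4<T>`, `InlineBufferDemo` and `FirstMatches` are hypothetical and are not taken from the PR.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a built-in inline array type: a fixed-size buffer
// of 4 elements stored inline in the struct (on the stack when used as a local).
[InlineArray(4)]
internal struct InlineMatchArray4<T>
{
    private T _element0; // the runtime lays this field out 4 times
}

internal static class InlineBufferDemo
{
    // Collects at most 4 matches; the scratch buffer itself needs no heap allocation.
    public static T[] FirstMatches<T>(T[] source, Predicate<T> match)
    {
        InlineMatchArray4<T> buffer = default;
        Span<T> span = buffer; // C# 12 inline-array-to-span conversion
        int count = 0;

        foreach (T item in source)
        {
            if (count == span.Length)
            {
                break; // buffer is full; a real implementation would grow here
            }

            if (match(item))
            {
                span[count++] = item;
            }
        }

        // Only the final, exactly sized result array is allocated.
        return span.Slice(0, count).ToArray();
    }
}
```

The sketch deliberately stops once the buffer is full; it only illustrates the fixed-size buffer, not the growth strategy discussed later in the thread.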

BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.6899/25H2/2025Update/HudsonValley2)
Intel Core i7-10750H CPU 2.60GHz (Max: 2.59GHz), 1 CPU, 12 logical and 6 physical cores
.NET SDK 10.0.102
  [Host]             : .NET 10.0.2 (10.0.2, 10.0.225.61305), X64 RyuJIT x86-64-v3
  ShortRun-.NET 10.0 : .NET 10.0.2 (10.0.2, 10.0.225.61305), X64 RyuJIT x86-64-v3

| Method | Size | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| NewFindAllMatchAlways | 1 | 8.612 ns | 0.1054 ns | 0.0986 ns | 8.649 ns | 0.44 | 0.01 | 0.0051 | 32 B | 0.31 |
| OldFindAllMatchAlways | 1 | 19.612 ns | 0.1859 ns | 0.1553 ns | 19.612 ns | 1.00 | 0.01 | 0.0166 | 104 B | 1.00 |
| NewFindAllMatchAlways | 4 | 11.605 ns | 0.1418 ns | 0.1257 ns | 11.583 ns | 0.47 | 0.01 | 0.0064 | 40 B | 0.36 |
| OldFindAllMatchAlways | 4 | 24.465 ns | 0.1829 ns | 0.1711 ns | 24.481 ns | 1.00 | 0.01 | 0.0179 | 112 B | 1.00 |
| NewFindAllMatchAlways | 5 | 12.565 ns | 0.1595 ns | 0.1332 ns | 12.556 ns | 0.32 | 0.00 | 0.0076 | 48 B | 0.27 |
| OldFindAllMatchAlways | 5 | 38.987 ns | 0.2646 ns | 0.2346 ns | 39.031 ns | 1.00 | 0.01 | 0.0280 | 176 B | 1.00 |
| NewFindAllMatchAlways | 8 | 16.465 ns | 0.5138 ns | 1.5068 ns | 15.716 ns | 0.36 | 0.03 | 0.0089 | 56 B | 0.30 |
| OldFindAllMatchAlways | 8 | 46.208 ns | 0.6164 ns | 0.5147 ns | 46.041 ns | 1.00 | 0.02 | 0.0293 | 184 B | 1.00 |
| NewFindAllMatchAlways | 9 | 23.627 ns | 0.3274 ns | 0.2902 ns | 23.615 ns | 0.38 | 0.01 | 0.0191 | 120 B | 0.43 |
| OldFindAllMatchAlways | 9 | 62.204 ns | 1.2656 ns | 1.1838 ns | 61.899 ns | 1.00 | 0.03 | 0.0446 | 280 B | 1.00 |
| NewFindAllMatchHalf | 1 | 9.687 ns | 0.2352 ns | 0.2310 ns | 9.703 ns | 0.46 | 0.01 | 0.0051 | 32 B | 0.31 |
| OldFindAllMatchHalf | 1 | 20.935 ns | 0.4699 ns | 0.4615 ns | 20.813 ns | 1.00 | 0.03 | 0.0166 | 104 B | 1.00 |
| NewFindAllMatchHalf | 4 | 13.175 ns | 0.3239 ns | 0.3977 ns | 13.039 ns | 0.55 | 0.02 | 0.0051 | 32 B | 0.31 |
| OldFindAllMatchHalf | 4 | 23.811 ns | 0.5150 ns | 0.5058 ns | 23.746 ns | 1.00 | 0.03 | 0.0166 | 104 B | 1.00 |
| NewFindAllMatchHalf | 5 | 12.973 ns | 0.3205 ns | 0.3291 ns | 12.894 ns | 0.50 | 0.02 | 0.0063 | 40 B | 0.36 |
| OldFindAllMatchHalf | 5 | 26.193 ns | 0.5801 ns | 0.5426 ns | 26.252 ns | 1.00 | 0.03 | 0.0179 | 112 B | 1.00 |
| NewFindAllMatchHalf | 8 | 14.521 ns | 0.3448 ns | 0.3832 ns | 14.484 ns | 0.51 | 0.02 | 0.0063 | 40 B | 0.36 |
| OldFindAllMatchHalf | 8 | 28.599 ns | 0.6119 ns | 0.7047 ns | 28.575 ns | 1.00 | 0.03 | 0.0178 | 112 B | 1.00 |
| NewFindAllMatchHalf | 9 | 23.859 ns | 0.2977 ns | 0.3655 ns | 23.859 ns | 0.54 | 0.01 | 0.0166 | 104 B | 0.59 |
| OldFindAllMatchHalf | 9 | 44.463 ns | 0.8785 ns | 0.7788 ns | 44.551 ns | 1.00 | 0.02 | 0.0280 | 176 B | 1.00 |
| NewFindAllMatchNever | 1 | 1.506 ns | 0.0304 ns | 0.0269 ns | 1.502 ns | 0.40 | 0.01 | - | - | 0.00 |
| OldFindAllMatchNever | 1 | 3.785 ns | 0.0430 ns | 0.0381 ns | 3.786 ns | 1.00 | 0.01 | 0.0051 | 32 B | 1.00 |
| NewFindAllMatchNever | 4 | 2.249 ns | 0.0378 ns | 0.0354 ns | 2.251 ns | 0.48 | 0.01 | - | - | 0.00 |
| OldFindAllMatchNever | 4 | 4.736 ns | 0.0892 ns | 0.0834 ns | 4.730 ns | 1.00 | 0.02 | 0.0051 | 32 B | 1.00 |
| NewFindAllMatchNever | 5 | 2.738 ns | 0.0298 ns | 0.0264 ns | 2.733 ns | 0.49 | 0.01 | - | - | 0.00 |
| OldFindAllMatchNever | 5 | 5.622 ns | 0.0644 ns | 0.0603 ns | 5.612 ns | 1.00 | 0.01 | 0.0051 | 32 B | 1.00 |
| NewFindAllMatchNever | 8 | 3.424 ns | 0.0244 ns | 0.0216 ns | 3.426 ns | 0.66 | 0.01 | - | - | 0.00 |
| OldFindAllMatchNever | 8 | 5.212 ns | 0.0459 ns | 0.0429 ns | 5.205 ns | 1.00 | 0.01 | 0.0051 | 32 B | 1.00 |
| NewFindAllMatchNever | 9 | 3.798 ns | 0.0202 ns | 0.0169 ns | 3.796 ns | 0.67 | 0.02 | - | - | 0.00 |
| OldFindAllMatchNever | 9 | 5.714 ns | 0.1782 ns | 0.1667 ns | 5.688 ns | 1.00 | 0.04 | 0.0051 | 32 B | 1.00 |

huoyaoyuan · 2025-10-03T07:07:06Z

Thank you for your contribution. The code does not follow best practices. Instead, simply replacing the List with ValueListBuilder and adding a Dispose call on the builder will give you all of these optimizations.

huoyaoyuan · 2025-10-03T08:56:50Z

After rethinking, there may be more unintended effects. ArrayPool<T> is quite heavy for each T, especially when the pool is not widely shared. That's also why the BCL only uses ArrayPool over a limited set of types (byte, char and other integers). Even ArrayPool<object> is never used in the BCL.

Initializing the array pool for each T has unacceptable overhead for the whole program. A microbenchmark improvement doesn't translate into a benefit for the whole program. This PR cannot be accepted because of this.

Henr1k80 · 2025-10-03T09:23:13Z

> After rethinking, there may be more unintended effects. ArrayPool<T> is quite heavy for each T, especially when the pool is not widely shared. That's also why the BCL only uses ArrayPool over a limited set of types (byte, char and other integers). Even ArrayPool<object> is never used in the BCL.
>
> Initializing the array pool for each T has unacceptable overhead for the whole program. A microbenchmark improvement doesn't translate into a benefit for the whole program. This PR cannot be accepted because of this.

SegmentedArrayBuilder uses ArrayPool<T>, and SegmentedArrayBuilder is used within LINQ.
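
To make the pattern under discussion concrete, here is a rough sketch of a FindAll-style helper that uses a rented scratch buffer. It is not the code from any revision of this PR; the names `PooledFindAllSketch` and `FindAllPooled` are hypothetical. The point of the objection above is that `ArrayPool<T>.Shared` is a separate pool instance per `T`, so a generic helper like this one creates a pool for every element type it is called with.

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;

internal static class PooledFindAllSketch
{
    // Illustrative only: rents a scratch buffer sized for the worst case
    // (every element matches), then copies the matches to an exact-size array.
    public static T[] FindAllPooled<T>(T[] array, Predicate<T> match)
    {
        ArgumentNullException.ThrowIfNull(array);
        ArgumentNullException.ThrowIfNull(match);

        // ArrayPool<T>.Shared is one pool per T, which is the cost discussed above.
        T[] rented = ArrayPool<T>.Shared.Rent(array.Length);
        int count = 0;
        try
        {
            foreach (T item in array)
            {
                if (match(item))
                {
                    rented[count++] = item;
                }
            }

            return rented.AsSpan(0, count).ToArray();
        }
        finally
        {
            // Clear reference-containing buffers so the pool does not keep objects alive.
            ArrayPool<T>.Shared.Return(
                rented,
                clearArray: RuntimeHelpers.IsReferenceOrContainsReferences<T>());
        }
    }
}
```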

jkotas · 2026-01-21T22:42:45Z

> I wanted to use this method, dived into the implementation and found that I was better off not using this method, unless improved

What would be the typical distribution of the array lengths passed into this API and returned by this API in your case?

> could you please guide how can do that to ensure no regression?

This type of change tends to shift the cost. It improves the cases that we expect to matter more, and degrades the cases that we expect to be unimportant.

My gut feel is that:

- 16 is probably too big for a fixed buffer. Something like 4 may be more appropriate.
- Depending on List pulls in a bunch of extra code. It may be leaner to open-code the array resizing as part of the method (without any helpers). That would save both the List<T> allocation and the List code.

Henr1k80 · 2026-01-22T11:12:21Z

@jkotas

> 16 is probably too big of a fixed buffer. Something like 4 may be more appropriate.

Does it make sense to check the length of the incoming array first?
If it is less than or equal to 16, we could call an implementation that only stack allocates, without using any List code at all.
There could also be implementations for less than or equal to 4 and/or 8, at the cost of more code & branches.

I presume that the .NET runtime will move the unused code paths out of the way when it is optimizing.
I do not know enough about NativeAOT to tell whether it would do the same.

Henr1k80 · 2026-01-22T11:19:44Z

@jkotas

> What would be the typical distribution of the array lengths passed into this API and returned by this API in your case?

I can no longer remember the context, but the input likely had fewer than 4 elements and almost certainly fewer than 16.
As an input of that size is reasonable to stack allocate, the output size doesn't matter that much.

tarekgh · 2026-01-22T15:40:37Z

@jkotas would the following implementation match what you are suggesting?

        public static T[] FindAll<T>(T[] array, Predicate<T> match)
        {
            if (array == null)
            {
                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
            }

            if (match == null)
            {
                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
            }

            InlineArray4<T> stackAllocatedMatches = default;
            Span<T> span = stackAllocatedMatches;
            int foundCount = 0;

            for (int i = 0; i < array.Length; i++)
            {
                T value = array[i];
                if (match(value))
                {
                    if (foundCount >= span.Length)
                    {
                        T[] values = new T[span.Length * 2];
                        span.CopyTo(values);
                        span = values;
                    }

                    span[foundCount++] = value;
                }
            }

            return span.Slice(0, foundCount).ToArray();
        }

jkotas · 2026-01-22T16:51:04Z

Yes, I think something like this would have a better balance of perf characteristics. You do not need to special-case Array.Empty in the implementation. Span.ToArray has that special case already.
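
The behaviour discussed here can be observed from calling code. The following is a small sketch, assuming a current .NET runtime; the class and method names are hypothetical. It shows `Array.FindAll` returning matches in source order, and `Span<T>.ToArray` producing a zero-length array for an empty span, so no separate empty-result branch is needed.

```csharp
using System;

internal static class FindAllUsageDemo
{
    // Array.FindAll returns the matching elements in their original order.
    public static int[] EvenNumbers(int[] source) =>
        Array.FindAll(source, static x => x % 2 == 0);

    // Converting an empty span with ToArray yields a zero-length array,
    // which is why the implementation needs no explicit empty case.
    public static int[] EmptyFromSpan() => Span<int>.Empty.ToArray();
}
```

For example, `FindAllUsageDemo.EvenNumbers(new[] { 1, 2, 3, 4, 5, 6 })` returns an array containing 2, 4 and 6.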

tarekgh · 2026-01-22T17:08:33Z

@Henr1k80 could you please try the implementation in #120336 (comment) and get the benchmark numbers for it? Thanks!

jkotas · 2026-01-22T17:35:49Z

> If it is less than or equal to 16, we could call an implementation that only stack allocates, without using any List code at all.
> There could also be implementations for less than or equal to 4 and/or 8, at the cost of more code & branches.

This is a convenience API that is always going to be leaving perf on the table. I do not think it makes sense to overengineer the implementation like this.

Henr1k80 · 2026-01-23T11:30:48Z

@tarekgh & @jkotas, I have updated the code & updated the benchmarks

tarekgh

@jkotas the latest update LGTM. Please let us know if you have any more feedback.

jkotas

LGTM otherwise. Thanks!

Henr1k80 · 2026-01-24T23:53:28Z

I just updated the benchmark results in the original comment to reflect the newest suggestions.
I wish there were an easy way to compare with Toub's last change; it could avoid some overallocations, especially compared to the original implementation.

github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 2, 2025

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 2, 2025

lechu445 reviewed Oct 2, 2025

lechu445 reviewed Oct 2, 2025

Array.FindAll improvements

eb7ecd7

Henr1k80 force-pushed the main branch from 1d407a8 to acb68c0 Compare October 2, 2025 22:35

If all allocated are matched AND the match-array is not rented, we can just return the already allocated array

5cb2e20

Henr1k80 force-pushed the main branch from acb68c0 to 5cb2e20 Compare October 2, 2025 22:42

This was referenced Oct 3, 2025

STATUS_UNSUCCESSFUL in RsaCryptRoundtrip_OaepSHA1 #29683

Open

[browser] HalfTests.ExplicitConversion_FromSingle failing due to NaN != NaN #103347

Open

Test failure: baseservices/exceptions/stackoverflow/stackoverflowtester/stackoverflowtester.cmd #110173

Open

vcsjones mentioned this pull request Oct 3, 2025

RSA 384 failing in Windows #120353

Closed

huoyaoyuan reviewed Oct 3, 2025

huoyaoyuan added area-System.Runtime and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 3, 2025

Style changes, makes the method not fit within screen height.

c7e3678

Henr1k80 marked this pull request as draft October 3, 2025 10:23

This was referenced Oct 3, 2025

ProcessThreadTests.TestStartTimeProperty failure in CI #105526

Open

AppHost tests fail with "Failure extracting contents of the application bundle." #119249

Closed

Use a small amount of stack allocated matches before optionally allocating a list

a0694b2

Henr1k80 marked this pull request as ready for review October 3, 2025 14:14

Style changes

f6269d6

lechu445 reviewed Oct 4, 2025

Henr1k80 mentioned this pull request Oct 6, 2025

Make ArrayBuilder stack allocate the first 16 entries for less heap allocations #120439

Closed

jeffhandley assigned tarekgh Jan 14, 2026

tarekgh added this to the 11.0.0 milestone Jan 14, 2026

tarekgh reviewed Jan 14, 2026

Henr1k80 added 2 commits January 22, 2026 09:07

Merge remote-tracking branch 'dotnet/main'

8076db4

Use Span.Copy to copy the inline array to the resulting array

15e10bd

dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 22, 2026

This was referenced Jan 22, 2026

[wasm] OpenQA.Selenium.WebDriverTimeoutException: timeout: Timed out receiving message from renderer #117486

Open

[Test][UserEvents] Trace file does not contain expected events #123442

Closed

Avoid using List as buffer & reduce stack allocated buffer size

cdf7f3a

This was referenced Jan 23, 2026

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

Unable to pull image from mcr.microsoft.com #117164

Open

tarekgh approved these changes Jan 23, 2026

jkotas reviewed Jan 23, 2026

jkotas approved these changes Jan 23, 2026

stephentoub reviewed Jan 23, 2026

tarekgh and others added 2 commits January 23, 2026 13:50

Update src/libraries/System.Private.CoreLib/src/System/Array.cs

d9498d0

Co-authored-by: Jan Kotas <jkotas@microsoft.com>

Apply feedback suggestion

e78bcc6

stephentoub approved these changes Jan 23, 2026

jkotas mentioned this pull request Jan 24, 2026

[Wasm] The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing #123572

Open

jkotas merged commit 687c8d9 into dotnet:main Jan 24, 2026
148 of 150 checks passed

jkotas added the tenet-performance Performance related issue label Jan 24, 2026

Henr1k80 mentioned this pull request Jan 24, 2026

Use Array.FindAll in ComEventMethods #123582

Merged

dotnet-maestro bot mentioned this pull request Jan 25, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#4407

Merged

github-actions bot locked and limited conversation to collaborators Feb 24, 2026
