Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

String.Split optimization #637

Merged
merged 1 commit into from
Jul 7, 2015
Merged

Conversation

justinvp
Copy link

@justinvp justinvp commented Apr 3, 2015

  • Use EmptyArray<String>.Value instead of new String[0] on .NET Core
  • Return early before allocating internal array(s) when count == 1

@weshaggard
Copy link
Member

@justinvp can you provide any data about how much this improves things? According to our contribution bar https://github.com/dotnet/coreclr/wiki/Contributing#contribution-bar we are only taking changes that have a demonstrably broad impact of a mainline scenario and are low risk.

@justinvp
Copy link
Author

justinvp commented Apr 4, 2015

can you provide any data about how much this improves things?

I don't have any data for this. I was investigating the possibility of optimizing String.Split to minimize/avoid some of the internal allocations it does, when I saw some simple cleanup opportunities, namely, returning Array.Empty<string>() instead of returning new String[0] and checking for the count == 1 edge case and returning before unnecessarily allocating int[] sepList = new int[Length]; and int[] lengthList = new int[Length];, instead of doing that check and returning after those allocations.

@ghost
Copy link

ghost commented May 22, 2015

Although minor improvement, it still counts IMO.
@justinvp, could you come up with a benchmark and show the before 'n after stats?

@justinvp justinvp force-pushed the stringsplit branch 2 times, most recently from 605b8eb to 508fbad Compare May 24, 2015 11:50
@justinvp
Copy link
Author

I updated the PR, removing the formatting & trailing whitespace cleanup, to make the changes more targeted. The diff is simpler and easier to review now.

I also ran a microbenchmark for a contrived case where the count is 1 (10,000,000 iterations). The result was ~12x speed improvement, in addition to the allocation reduction.

A is before and B is after (elapsed milliseconds):

A: 00:00:01.6835160
B: 00:00:00.1366061
A/B     : 12.3238713351746
B/A     : 0.0811433333570931
(A-B)/A : 0.918856666642907
(B-A)/B : 11.3238713351746


A: 00:00:01.7130745
B: 00:00:00.1377855
A/B     : 12.4329083974729
B/A     : 0.0804317033497376
(A-B)/A : 0.919568296650263
(B-A)/B : 11.4329083974729


A: 00:00:01.6833079
B: 00:00:00.1387093
A/B     : 12.1355085780117
B/A     : 0.0824028093731397
(A-B)/A : 0.91759719062686
(B-A)/B : 11.1355085780117


A: 00:00:01.6823955
B: 00:00:00.1397315
A/B     : 12.0402021018883
B/A     : 0.0830550842533756
(A-B)/A : 0.916944915746624
(B-A)/B : 11.0402021018883


A: 00:00:01.6638048
B: 00:00:00.1403704
A/B     : 11.852960453201
B/A     : 0.0843671084492604
(A-B)/A : 0.91563289155074
(B-A)/B : 10.852960453201


A: 00:00:01.6723702
B: 00:00:00.1425899
A/B     : 11.7285319647465
B/A     : 0.0852621626479592
(A-B)/A : 0.914737837352041
(B-A)/B : 10.7285319647465


A: 00:00:01.6890087
B: 00:00:00.1366335
A/B     : 12.3616001932176
B/A     : 0.0808956756705871
(A-B)/A : 0.919104324329413
(B-A)/B : 11.3616001932176

@ghost
Copy link

ghost commented Jun 6, 2015

@stephentoub, @ericstj, @janvorli, thoughts on this?

@whoisj
Copy link

whoisj commented Jun 6, 2015

Most impressive. 👍

@hadibrais
Copy link

@justinvp The benchmark you provided is not sufficient. The value of count is rarely 0 or 1. The more interesting benchmark would show the difference in perf when numReplaces > 0 and count > 1. We have to show that the perf of this path of execution is not degraded and we have to show that on different microarchitectures from Intel and AMD. Another benchmark with random input is also required.

@ghost
Copy link

ghost commented Jun 10, 2015

We have to show that the perf of this path of execution is not degraded
...
Another benchmark with random input is also required.

Agree.

show that on different microarchitectures from Intel and AMD

Not possible, see https://github.com/dotnet/coreclr/issues/1078#issuecomment-107718404 and https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/developer-guide.md#building-the-repository. Interesting thoughts though. 😄

@hadibrais
Copy link

@jasonwilliams200OK Thanks for the references, but they are not related. x86 and x64 are ISAs, not microarchitectures which are totally different.

@ghost
Copy link

ghost commented Jun 11, 2015

@hadibrais, thanks for the correction. Now I understand that it was subjected to organization.
Still my concern is, if someone contributes a PR with benchmark test, wouldn't it be nice if CI build presents those options to test against varied architecture on different operating systems? Lets say, if the label optimization or performance is assigned to PR, @dotnet-bot reruns the tests with different parameters (permute the set of hardware in the build matrix). These days, many devs use light PC devices (BYOD trend) and leverage CI-builds / Virtual machines to test, analyze and verify their code. It would be a hassle to arrange different hardware for benchmark purpose (unless one have access to many devices at home and/or work).

@hadibrais
Copy link

@jasonwilliams200OK Yes that would be nice.

@@ -973,17 +973,20 @@ internal String[] SplitInternal(char[] separator, int count, StringSplitOptions

if ((count == 0) || (omitEmptyEntries && this.Length == 0))
{
return new String[0];
return EmptyArray<String>.Value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory I think this change to use a cached empty array is good. In practice I do worry about compat issues that could arise. If it were just for .NET Core, I'd be less concerned, and we've made many similar changes in corefx, but as this mscorlib is used elsewhere, I wonder if it's worth it, especially considering what I'd expect to be the relative rarity of hitting these cases (I could be wrong). @jkotas, any thoughts?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how's about using a static readonly string[] which is conditionally compiled to either new string[0];, and EmptyArray<string>.Value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the change to return EmptyArray.Value under #if FEATURE_CORECLR as @mburbea suggested. The change has likely marginal benefit - it is not worth discussing the compat risk on whether to take it everywhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. Updated to use #if FEATURE_CORECLR.

- Use EmptyArray<String>.Value instead of new String[0] on .NET Core
- Return early before allocating internal array(s) when count == 1
@justinvp
Copy link
Author

justinvp commented Jul 1, 2015

Thanks for the feedback. PR updated. I also verified the change doesn't regress perf when numReplaces > 0 and count > 1.

@stephentoub
Copy link
Member

LGTM. Have you tried running the corefx System.Runtime tests using this? Doesn't like there should be any issues, but it'd be good just to verify nothing pops.

@justinvp
Copy link
Author

justinvp commented Jul 2, 2015

Yep, all corefx System.Runtime tests pass with these changes running locally on my machine.

stephentoub added a commit that referenced this pull request Jul 7, 2015
@stephentoub stephentoub merged commit 25b77bd into dotnet:master Jul 7, 2015
@justinvp justinvp deleted the stringsplit branch September 16, 2015 22:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants