Conversation


@nietras nietras commented Oct 17, 2021

Benchmark for dotnet/runtime#60463

cc: @danmoseley @adamsitnik

Example run comparing main (toolchain `runtime-m`) to this PR (toolchain `runtime-pr`).

BenchmarkDotNet=v0.13.1.1611-nightly, OS=Windows 10.0.19043.1266 (21H1/May2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.100-rc.2.21505.57
  [Host]     : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
  Job-BIEWJM : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
  Job-IFXICU : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
| Method   | Job        | Toolchain  | LineLengthRange |       Mean |     Error |    StdDev |     Median |        Min |        Max | Ratio | RatioSD |  Gen 0 | Allocated |
|--------- |----------- |----------- |---------------- |-----------:|----------:|----------:|-----------:|-----------:|-----------:|------:|--------:|-------:|----------:|
| ReadLine | Job-BIEWJM | runtime-m  | [0, 0]          |   7.208 ns | 0.0556 ns | 0.0434 ns |   7.194 ns |   7.165 ns |   7.288 ns |  1.00 |    0.00 |      - |         - |
| ReadLine | Job-IFXICU | runtime-pr | [0, 0]          |  10.112 ns | 0.1781 ns | 0.1666 ns |  10.105 ns |   9.875 ns |  10.438 ns |  1.41 |    0.02 |      - |         - |
| ReadLine | Job-BIEWJM | runtime-m  | [1, 8]          |  17.751 ns | 0.2287 ns | 0.2028 ns |  17.686 ns |  17.552 ns |  18.192 ns |  1.00 |    0.00 | 0.0020 |      33 B |
| ReadLine | Job-IFXICU | runtime-pr | [1, 8]          |  16.998 ns | 0.1011 ns | 0.0946 ns |  16.992 ns |  16.872 ns |  17.212 ns |  0.96 |    0.01 | 0.0020 |      33 B |
| ReadLine | Job-BIEWJM | runtime-m  | [9, 32]         |  26.336 ns | 0.1205 ns | 0.1127 ns |  26.273 ns |  26.200 ns |  26.510 ns |  1.00 |    0.00 | 0.0039 |      65 B |
| ReadLine | Job-IFXICU | runtime-pr | [9, 32]         |  19.896 ns | 0.0740 ns | 0.0618 ns |  19.879 ns |  19.801 ns |  20.046 ns |  0.76 |    0.00 | 0.0038 |      65 B |
| ReadLine | Job-BIEWJM | runtime-m  | [33, 128]       |  62.594 ns | 0.2100 ns | 0.1861 ns |  62.584 ns |  62.222 ns |  62.928 ns |  1.00 |    0.00 | 0.0108 |     185 B |
| ReadLine | Job-IFXICU | runtime-pr | [33, 128]       |  31.021 ns | 0.0902 ns | 0.0800 ns |  31.015 ns |  30.912 ns |  31.200 ns |  0.50 |    0.00 | 0.0110 |     185 B |
| ReadLine | Job-BIEWJM | runtime-m  | [129, 1024]     | 319.244 ns | 0.6595 ns | 0.6169 ns | 319.302 ns | 318.088 ns | 320.391 ns |  1.00 |    0.00 | 0.0697 |   1,181 B |
| ReadLine | Job-IFXICU | runtime-pr | [129, 1024]     |  98.174 ns | 0.2733 ns | 0.2282 ns |  98.123 ns |  97.705 ns |  98.567 ns |  0.31 |    0.00 | 0.0705 |   1,181 B |
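
For context on the `LineLengthRange` parameter above, a rough sketch of the kind of input generation it implies is shown below. This is illustrative only, not the PR's actual code; the `LineLengthRange`/`TextGenerator` names and the uniform-random length draw are assumptions based on the ranges in the table.

```csharp
using System;
using System.Text;

// Illustrative parameter type; the real benchmark's type may differ.
public sealed class LineLengthRange
{
    public int Min { get; init; }
    public int Max { get; init; }
    public override string ToString() => $"[{Min}, {Max}]";
}

public static class TextGenerator
{
    // Builds text whose line lengths are drawn uniformly from [Min, Max].
    public static string Create(LineLengthRange range, int lineCount, string newLine, int seed = 42)
    {
        var random = new Random(seed); // fixed seed keeps runs comparable across toolchains
        var sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            sb.Append('a', random.Next(range.Min, range.Max + 1));
            sb.Append(newLine);
        }
        return sb.ToString();
    }
}
```

The two lines that follow are an excerpt of the PR's range definitions, which the first review comment below is anchored to.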

        new (){ Min = 33,  Max = 128 },
        new (){ Min = 129, Max = 1024 },
    };
}
Member

I'd expect it'd be fairly common to find real world data that alternates between longer line lengths and empty lines, with paragraphs separated by blank lines.

Contributor Author

I can easily add a test case for [0, 1024], but that is not exactly what you are asking. Would that be better, or would you rather have this refactored to support the kind of data you suggest?

Contributor Author

I should note that the test cases exactly match real-world data and a use case of mine, namely CSV content, where lines are never empty and often fall within the same range of lengths.

Member

@danmoseley danmoseley Oct 17, 2021

do you mean something like

           sb.Append(newLine); // existing
           if (random.Next() % 5 == 0)
               sb.Append(newLine);

Contributor Author

One way to add this would be to formalize an EmptyLineRatio (e.g. 0.20) as part of Range (which should then be renamed). Not sure there is much value in it overall, though. It's pretty straightforward: very short lines see regressions; others see large gains. If you mix the two, the large gains will prevail since they dominate completely. E.g. a 1024-char line takes much longer to "find" than a 1-char line; if that 1-char line only occurs 20% of the time, it's pretty irrelevant. :)
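
A minimal sketch of what formalizing that could look like; `LineSpec`, `EmptyLineRatio`, and the generator below are illustrative assumptions, not code from this PR:

```csharp
using System;
using System.Text;

// A renamed Range/LineLengthRange carrying the proposed empty-line ratio.
public sealed class LineSpec
{
    public int Min { get; init; }
    public int Max { get; init; }
    public double EmptyLineRatio { get; init; } // e.g. 0.20 => roughly every fifth line is empty
}

public static class TextGeneratorWithEmptyLines
{
    public static string Create(LineSpec spec, int lineCount, string newLine, int seed = 42)
    {
        var random = new Random(seed);
        var sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            // Emit an empty line with probability EmptyLineRatio,
            // otherwise a line of random length in [Min, Max].
            if (random.NextDouble() >= spec.EmptyLineRatio)
                sb.Append('a', random.Next(spec.Min, spec.Max + 1));
            sb.Append(newLine);
        }
        return sb.ToString();
    }
}
```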

Contributor Author

Or in other words, -200 ns compared to +3 ns every 5 lines is not really interesting :)
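
Taking those rough numbers at face value and assuming one short line in five (as in the snippet above), the net effect per five lines would be roughly 4 × (-200 ns) + 1 × (+3 ns) ≈ -797 ns, i.e. the occasional regression is well under 1% of the overall gain.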

Member

Makes sense to me. @stephentoub ?

Member

I was primarily pointing out that it's not clear to me how these ranges were arrived at, e.g. why do we think a min of 33 and a max of 128 is interesting to encode in a perf test? What is the randomness within these ranges buying us? Is this trying to simulate typical inputs in some file format (csv, xml, json, whatever) that we believe is used with StringReader frequently, and that typically has such line lengths?

Contributor Author

The line lengths are pretty arbitrary; the main reason is to expose the expected regressions, which they have. We can change this to whatever ranges you would like, including fixed ones, e.g. [20, 20]. My use case was primarily long lines (~1000 chars) in CSV files. It's pretty rare for files to have fixed line lengths in my use cases, but it does happen. Almost always, lines are > 80 chars.

So yes, the randomness is to better simulate real-world inputs. I don't pretend to be able to imagine every use case here; they are bound to be very different. E.g. XML and JSON exhibit ">"-like patterns; CSV looks more like the line-length ranges this PR adds; Markdown or plain text with prose mixes empty lines with longer paragraphs. But all are somewhat covered by the random ranges.

Member

@adamsitnik adamsitnik left a comment

@nietras thank you for your contribution! I think we can simplify the benchmark design; PTAL at my comment.

…ests.cs

Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
@nietras
Contributor Author

nietras commented Dec 5, 2021

@adamsitnik your comments on the pitfalls are all true; another reason I would like a MaxInvocationCount feature in BDN 😉. I very much preferred being able to compare per-line benchmarks instead of an accumulated time for parsing the entire text. We could still get that by simply checking if ReadLine returns null and then creating a new StringReader. That would also amortize the cost while preserving per-line results. However, I have accepted your suggestions directly. Thanks! :)

So like below, but then initialize reader in GlobalSetup too.

        [Benchmark]
        public void ReadLine()
        {
            if (reader.ReadLine() == null)
               reader = new (_text);
        }
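
For completeness, a minimal self-contained sketch of that design with the reader initialized in GlobalSetup as mentioned; the class name and the placeholder input are assumptions, not this PR's actual code:

```csharp
using System;
using System.IO;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class StringReaderReadLineBenchmark
{
    private string _text;
    private StringReader reader;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder input; the real benchmark generates lines with lengths drawn from LineLengthRange.
        _text = string.Join(Environment.NewLine, Enumerable.Repeat(new string('a', 100), 10_000));
        reader = new StringReader(_text);
    }

    [Benchmark]
    public void ReadLine()
    {
        // Recreate the reader once the text is exhausted,
        // so each invocation still measures a single ReadLine call.
        if (reader.ReadLine() == null)
            reader = new StringReader(_text);
    }
}
```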

@nietras
Contributor Author

nietras commented Dec 5, 2021

@adamsitnik should we use this opportunity to also add a test for StreamReader, using this as a base? Thinking of using a MemoryStream then.
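
A rough sketch of what such a StreamReader-over-MemoryStream variant could look like; this is hypothetical and not part of this PR, reusing the same reset-on-exhaustion idea, with names and input generation as assumptions:

```csharp
using System.IO;
using System.Linq;
using System.Text;
using BenchmarkDotNet.Attributes;

public class StreamReaderReadLineBenchmark
{
    private MemoryStream _stream;
    private StreamReader _reader;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder input; a real benchmark would reuse the StringReader test's line-length generation.
        string text = string.Join("\n", Enumerable.Repeat(new string('a', 100), 10_000));
        _stream = new MemoryStream(Encoding.UTF8.GetBytes(text));
        _reader = new StreamReader(_stream);
    }

    [Benchmark]
    public void ReadLine()
    {
        // Rewind the stream and recreate the reader once it is exhausted,
        // mirroring the StringReader benchmark.
        if (_reader.ReadLine() == null)
        {
            _stream.Position = 0;
            _reader = new StreamReader(_stream);
        }
    }
}
```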

@adamsitnik
Member

> should we use this opportunity to also add a test for StreamReader, using this as a base? Thinking of using a MemoryStream then.

I like this idea! @nietras would you mind sending a separate PR?

The sooner I merge this PR, the sooner we get data in our Reporting System, so I am going to merge it now.

Member

@adamsitnik adamsitnik left a comment

LGTM, again thank you @nietras!
