Add StringReaderReadLineTests #2083

nietras · 2021-10-17T11:13:54Z

Example run comparing main m to PR pr.

BenchmarkDotNet=v0.13.1.1611-nightly, OS=Windows 10.0.19043.1266 (21H1/May2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.100-rc.2.21505.57
  [Host]     : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
  Job-BIEWJM : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
  Job-IFXICU : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Method	Job	Toolchain	LineLengthRange	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Gen 0	Allocated
ReadLine	Job-BIEWJM	runtime-m	[ 0, 0]	7.208 ns	0.0556 ns	0.0434 ns	7.194 ns	7.165 ns	7.288 ns	1.00	0.00	-	-
ReadLine	Job-IFXICU	runtime-pr	[ 0, 0]	10.112 ns	0.1781 ns	0.1666 ns	10.105 ns	9.875 ns	10.438 ns	1.41	0.02	-	-

ReadLine	Job-BIEWJM	runtime-m	[ 1, 8]	17.751 ns	0.2287 ns	0.2028 ns	17.686 ns	17.552 ns	18.192 ns	1.00	0.00	0.0020	33 B
ReadLine	Job-IFXICU	runtime-pr	[ 1, 8]	16.998 ns	0.1011 ns	0.0946 ns	16.992 ns	16.872 ns	17.212 ns	0.96	0.01	0.0020	33 B

ReadLine	Job-BIEWJM	runtime-m	[ 9, 32]	26.336 ns	0.1205 ns	0.1127 ns	26.273 ns	26.200 ns	26.510 ns	1.00	0.00	0.0039	65 B
ReadLine	Job-IFXICU	runtime-pr	[ 9, 32]	19.896 ns	0.0740 ns	0.0618 ns	19.879 ns	19.801 ns	20.046 ns	0.76	0.00	0.0038	65 B

ReadLine	Job-BIEWJM	runtime-m	[ 33, 128]	62.594 ns	0.2100 ns	0.1861 ns	62.584 ns	62.222 ns	62.928 ns	1.00	0.00	0.0108	185 B
ReadLine	Job-IFXICU	runtime-pr	[ 33, 128]	31.021 ns	0.0902 ns	0.0800 ns	31.015 ns	30.912 ns	31.200 ns	0.50	0.00	0.0110	185 B

ReadLine	Job-BIEWJM	runtime-m	[ 129,1024]	319.244 ns	0.6595 ns	0.6169 ns	319.302 ns	318.088 ns	320.391 ns	1.00	0.00	0.0697	1,181 B
ReadLine	Job-IFXICU	runtime-pr	[ 129,1024]	98.174 ns	0.2733 ns	0.2282 ns	98.123 ns	97.705 ns	98.567 ns	0.31	0.00	0.0705	1,181 B

src/benchmarks/micro/libraries/System.IO/StringReaderReadLineTests.cs

stephentoub · 2021-10-17T11:29:01Z

src/benchmarks/micro/libraries/System.IO/StringReaderReadLineTests.cs

+                new (){ Min =  33, Max =  128 },
+                new (){ Min = 129, Max = 1024 },
+            };
+        }


I'd expect it'd be fairly common to find real world data that alternates between longer line lengths and empty lines, with paragraphs separated by blank lines.

I can easily add a test case for [0, 1024] but that is not exactly what you are asking, would that be better or would you rather have this refactored to support the kind of data you suggest?

I should note that the test cases exactly match real world data and use case for me, namely csv content where lines are never empty and often have same range of lengths.

do you mean something like

sb.Append(newLine); // existing if (random.Next() % 5 == 0) sb.Append(newLine);

one way to add this would be to formalize a EmptyLineRatio e.g. 0.20 as part of Range (that should then be renamed). Not sure there is much value in it overall, though. It's pretty straightforward, very short lines see regressions. Others, large gains. If you mix the two, the large gains will prevail since they dominate completely. E.g. A 1024 char line takes much longer to "find" than 1 char line. If that 1 char line only occurs 20% of the time. It's pretty irrelevant. :)

Or it other words -200ns compared to +3ns every 5 lines is not really interesting :)

Makes sense to me. @stephentoub ?

I was primarily pointing out that it's not clear to me how these ranges were arrived at, e.g. why do we think a min of 33 and a max of 128 is interesting to encode in a perf test? What is the randomness within these ranges buying us? Is this trying to simulate typical inputs in some file format (csv, xml, json, whatever) that we believe is used with StringReader frequently and such file formats typically have such line lengths?

The line lengths are pretty random, the main reason is to expose the expected regressions, which it has. We can change this to whatever ranges you would like incl fixed e.g. [20, 20]. My use case was primarily long lines ~1000 chars in csv files. It's pretty rare files have fixed line lengths in my use cases but it does happen. Almost always lines are > 80 chars.

So yes the randomness is to better simulate real world inputs. I don't pretend to be able to imagine every use case here. They are bound to be very different. E.g. xml, json exhibits ">" like patterns. csv more line length ranges like this PR adds. Markdown or txt with prose the empty line vs longer paragraphs. But all are somewhat covered by the random ranges.

src/benchmarks/micro/libraries/System.IO/StringReaderReadLineTests.cs

adamsitnik

@nietras thank you for you contribution! I think that we can simplify the benchmark design, PTAL at my comment.

src/benchmarks/micro/libraries/System.IO/StringReaderReadLineTests.cs

…ests.cs Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>

nietras · 2021-12-05T10:35:56Z

@adamsitnik comments on pitfalls all true, another reason I would like a MaxInvocationCount feature in BDN 😉, I very much preferred being able to compare per line benchmarks, instead of an accumulated time for parsing entire text. We could still get that by simply checking if ReadLine returns null and then create new StringReader. This would also amortize the cost, while preserving getting per line results. However, I have accepted you suggestions directly. Thanks! :)

So like below, but then initialize reader in GlobalSetup too.

        [Benchmark]
        public void ReadLine()
        {
            if (reader.ReadLine() == null)
               reader = new (_text);
        }

nietras · 2021-12-05T14:28:44Z

@adamsitnik should we use this opportunity to also add test for StreamReader using this as base? Thinking use MemoryStream then.

adamsitnik · 2021-12-06T09:47:40Z

should we use this opportunity to also add test for StreamReader using this as base? Thinking use MemoryStream then.

I like this idea! @nietras would you mind sending a separate PR?

The sooner I merge this PR the sooner we get data in our Reporting System, I am going to merge this PR now.

adamsitnik

LGTM, again thank you @nietras !

nietras added 6 commits October 16, 2021 19:22

Initial brainstorm StringReaderReadLineTests

124f711

try to cleanup a bit

e761518

refactor

dba2827

cleanup

7cb7df0

switch to target length approach

fe1dcd3

more details

113b8e8

nietras mentioned this pull request Oct 17, 2021

S.IO.StringReader: Use ReadOnlySpan.IndexOfAny in ReadLine() for performance dotnet/runtime#60463

Merged

fix tostring

05e005c

stephentoub reviewed Oct 17, 2021

View reviewed changes

src/benchmarks/micro/libraries/System.IO/StringReaderReadLineTests.cs Outdated Show resolved Hide resolved

stephentoub reviewed Oct 17, 2021

View reviewed changes

nietras added 3 commits October 17, 2021 13:50

use Environment.NewLine and add new test cases

bf4c995

remove new line comment

bdb8551

nit capacity

b7bff33