-
Notifications
You must be signed in to change notification settings - Fork 285
Add StringReaderReadLineTests #2083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/benchmarks/micro/libraries/System.IO/StringReaderReadLineTests.cs
Outdated
Show resolved
Hide resolved
new (){ Min = 33, Max = 128 }, | ||
new (){ Min = 129, Max = 1024 }, | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect it'd be fairly common to find real world data that alternates between longer line lengths and empty lines, with paragraphs separated by blank lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can easily add a test case for [0, 1024] but that is not exactly what you are asking, would that be better or would you rather have this refactored to support the kind of data you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should note that the test cases exactly match real world data and use case for me, namely csv
content where lines are never empty and often have same range of lengths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean something like
sb.Append(newLine); // existing
if (random.Next() % 5 == 0)
sb.Append(newLine);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one way to add this would be to formalize a EmptyLineRatio
e.g. 0.20 as part of Range
(that should then be renamed). Not sure there is much value in it overall, though. It's pretty straightforward, very short lines see regressions. Others, large gains. If you mix the two, the large gains will prevail since they dominate completely. E.g. A 1024 char line takes much longer to "find" than 1 char line. If that 1 char line only occurs 20% of the time. It's pretty irrelevant. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or it other words -200ns compared to +3ns every 5 lines is not really interesting :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. @stephentoub ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was primarily pointing out that it's not clear to me how these ranges were arrived at, e.g. why do we think a min of 33 and a max of 128 is interesting to encode in a perf test? What is the randomness within these ranges buying us? Is this trying to simulate typical inputs in some file format (csv, xml, json, whatever) that we believe is used with StringReader frequently and such file formats typically have such line lengths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line lengths are pretty random, the main reason is to expose the expected regressions, which it has. We can change this to whatever ranges you would like incl fixed e.g. [20, 20]. My use case was primarily long lines ~1000 chars in csv files. It's pretty rare files have fixed line lengths in my use cases but it does happen. Almost always lines are > 80 chars.
So yes the randomness is to better simulate real world inputs. I don't pretend to be able to imagine every use case here. They are bound to be very different. E.g. xml, json exhibits ">" like patterns. csv more line length ranges like this PR adds. Markdown or txt with prose the empty line vs longer paragraphs. But all are somewhat covered by the random ranges.
src/benchmarks/micro/libraries/System.IO/StringReaderReadLineTests.cs
Outdated
Show resolved
Hide resolved
src/benchmarks/micro/libraries/System.IO/StringReaderReadLineTests.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nietras thank you for you contribution! I think that we can simplify the benchmark design, PTAL at my comment.
src/benchmarks/micro/libraries/System.IO/StringReaderReadLineTests.cs
Outdated
Show resolved
Hide resolved
…ests.cs Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
@adamsitnik comments on pitfalls all true, another reason I would like a So like below, but then initialize [Benchmark]
public void ReadLine()
{
if (reader.ReadLine() == null)
reader = new (_text);
} |
@adamsitnik should we use this opportunity to also add test for |
I like this idea! @nietras would you mind sending a separate PR? The sooner I merge this PR the sooner we get data in our Reporting System, I am going to merge this PR now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, again thank you @nietras !
Benchmark for dotnet/runtime#60463
cc: @danmoseley @adamsitnik
Example run comparing main
m
to PRpr
.