Another method + improvements #1
Merged
Well, now you got me curious too 😄
I refactored everything to benchmark only the actual differences, so IO is now out of the benchmark. This lets us run more iterations with more accurate measurements, since the IO turned out to be expensive and noisy.
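To illustrate what keeping IO out of the measurement can look like, here is a minimal sketch, not the code from this PR. The class `InMemoryBench`, its methods, and the sample inputs are all hypothetical. It assumes the inputs can be built in memory once and that the method under test can be passed as a `Predicate<String>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical harness: names and inputs are illustrative, not from the PR.
public final class InMemoryBench {
    // Build the inputs once, up front, so no file or network IO is timed.
    static List<String> buildInputs(int count) {
        List<String> inputs = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // Every third value contains a comma; the rest are plain.
            inputs.add(i % 3 == 0 ? "value," + i : "value" + i);
        }
        return inputs;
    }

    // Runs the check over all inputs and returns how many matched.
    static int countMatches(List<String> inputs, Predicate<String> check) {
        int matches = 0;
        for (String s : inputs) {
            if (check.test(s)) {
                matches++;
            }
        }
        return matches;
    }

    // Times only the in-memory work and returns the elapsed nanoseconds.
    static long timeNanos(List<String> inputs, Predicate<String> check, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += countMatches(inputs, check);
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated result so the JIT cannot discard the loop.
        if (sink < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> inputs = buildInputs(10_000);
        Predicate<String> check = s -> s.indexOf(',') >= 0;
        timeNanos(inputs, check, 50); // warm-up pass, result ignored
        System.out.println("elapsed ns: " + timeNanos(inputs, check, 200));
    }
}
```

A hand-rolled timer like this is only a rough guide. A dedicated harness such as JMH handles warm-up and dead-code elimination more carefully.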
I simplified a few of the existing methods (mostly just using for-each loops so the JVM can do more magic). I also figured out how to optimize the Regex (now correct and ~3x as fast, beating the old manual loop), and sped up the manual loop as well.
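For readers without the diff open, the two approaches might look roughly like the sketch below. This is an assumption-laden illustration, not the PR's code. The class `QuoteChecks`, both method names, and the set of special characters (comma, double quote, CR, LF) are hypothetical. The regex is compiled once into a static `Pattern`, and the manual version uses a for-each loop over the characters.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the character set and names are assumptions.
public final class QuoteChecks {
    // Compiled once and reused; recompiling per call would dominate the cost.
    private static final Pattern SPECIAL = Pattern.compile("[,\"\\r\\n]");

    // Regex variant: true if any assumed special character occurs.
    static boolean needsQuotingRegex(String field) {
        return SPECIAL.matcher(field).find();
    }

    // Manual variant: for-each over the characters, exiting on the first hit.
    static boolean needsQuotingLoop(String field) {
        for (char c : field.toCharArray()) {
            if (c == ',' || c == '"' || c == '\r' || c == '\n') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsQuotingRegex("a,b"));
        System.out.println(needsQuotingLoop("plain"));
    }
}
```

Both methods should agree on every input, which is a cheap sanity check to run before comparing timings.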
I also added another lookup-table-based solution, which beats out all the others by a few ms but is probably too fancy to implement in the original context :)
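As a rough idea of what a lookup-table check can look like, here is a minimal sketch under stated assumptions. It is not the implementation in this PR. The class `QuoteLookup`, the method `needsQuoting`, and the choice of special characters (comma, double quote, CR, LF) are hypothetical. The table is indexed by character code, so each character costs one bounds check and one array read instead of several comparisons.

```java
// Hypothetical sketch of a lookup-table check; names and character set are assumptions.
public final class QuoteLookup {
    // One flag per ASCII code point; true means the character forces quoting.
    private static final boolean[] NEEDS_QUOTE = new boolean[128];

    static {
        for (char c : new char[] {',', '"', '\r', '\n'}) {
            NEEDS_QUOTE[c] = true;
        }
    }

    // Returns true if any character of the field is flagged in the table.
    static boolean needsQuoting(String field) {
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            // Characters outside the table (non-ASCII) are treated as ordinary.
            if (c < NEEDS_QUOTE.length && NEEDS_QUOTE[c]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsQuoting("plain"));
        System.out.println(needsQuoting("a,b"));
    }
}
```

The trade-off is readability: the table hides which characters matter, which may be why it feels too fancy for the original context.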
Results:
Answering my original question (ref): it does look like the additional `contains` checks add a ~45% cost. Only checking the non-whitespace we get:
Of course, this cost should be negligible in context.
It actually surprises me that the manual loop still isn't winning. I found a production-grade C++ CSV serializer and saw that it takes that approach.
JVM things, I guess. 🤷