Complete rewrite of tokenization/parsing #72
This is a massively breaking change internally and for anything that depended on the internal tokenization/validation system, which has changed in many fundamental ways. The general public interface is about the same as before, with some minor binary-breaking changes to methods and constructors.
This makes some of the helper methods in NoRobotsRfcHelper redundant but also simplifies handling of tokens in any code that uses RobotsFileTokenReader.
We can do this because there is no official spec here. We're not typically expecting there to be new lines in the data we receive, though.
Performance improvements for
Before
After
Results: all with zero allocations!
Ideally we probably should somehow combine duplicate rules but we weren't before so...
Current progress shows a slight regression; guessing it is still about that hand-off to
After a bunch of back and forth, I realised that rather than processing the characters in a span, I could process the bytes directly and avoid the overhead of converting between bytes and characters.
This avoids a bunch of UTF-8 conversion interop
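As a rough illustration of the byte-level idea, here is a minimal sketch that splits a single robots.txt line into a field and a value by scanning the UTF-8 bytes in a `ReadOnlySpan<byte>`. The `ByteLineSplitter` class and its `TrySplit` method are hypothetical names invented for this example, not the code in this PR, and the comment/whitespace handling is an assumption about typical robots.txt lines.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not the library's actual tokenizer.
static class ByteLineSplitter
{
    // Splits "field: value # comment" by scanning UTF-8 bytes directly.
    // No strings or chars are created; the outputs are slices of the input.
    public static bool TrySplit(ReadOnlySpan<byte> line,
        out ReadOnlySpan<byte> field, out ReadOnlySpan<byte> value)
    {
        field = default;
        value = default;

        // Assumption: everything after '#' is a comment and can be dropped.
        int hash = line.IndexOf((byte)'#');
        if (hash >= 0) line = line.Slice(0, hash);

        // The first ':' separates the field name from its value.
        int colon = line.IndexOf((byte)':');
        if (colon < 0) return false;

        field = Trim(line.Slice(0, colon));
        value = Trim(line.Slice(colon + 1));
        return !field.IsEmpty;
    }

    // Trims ASCII spaces, tabs and carriage returns from both ends.
    static ReadOnlySpan<byte> Trim(ReadOnlySpan<byte> s)
    {
        int start = 0, end = s.Length;
        while (start < end && IsSpace(s[start])) start++;
        while (end > start && IsSpace(s[end - 1])) end--;
        return s.Slice(start, end - start);
    }

    static bool IsSpace(byte b) => b == (byte)' ' || b == (byte)'\t' || b == (byte)'\r';
}

static class Demo
{
    static void Main()
    {
        byte[] raw = Encoding.UTF8.GetBytes("Disallow: /private # comment");
        if (ByteLineSplitter.TrySplit(raw, out var field, out var value))
        {
            // Decode to strings only at the edge, here purely for display.
            Console.WriteLine(Encoding.UTF8.GetString(field) + " => " + Encoding.UTF8.GetString(value));
        }
    }
}
```

Because the delimiters of interest (`:`, `#`, whitespace) are single-byte ASCII, they can be matched safely inside UTF-8 data without decoding it first.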
Items left to do:
Hopefully this resolves the preview C# version issue
Initial benchmark for
More efficient processing of directives by using a mutable structure and dropping all of the LINQ processing.
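To show the general shape of that trade-off, the sketch below groups directives in a single pass with a mutable accumulator instead of chaining LINQ operators. All type names here (`Field`, `Directive`, `Group`, `DirectiveGrouper`) are hypothetical and invented for the example; the grouping rule (consecutive user-agent lines share one group) is an assumption, not necessarily what this PR implements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; not the library's own.
enum Field { UserAgent, Allow, Disallow }

readonly struct Directive
{
    public Directive(Field field, string value) { Field = field; Value = value; }
    public Field Field { get; }
    public string Value { get; }
}

// Mutable accumulator that is filled in as directives are read.
sealed class Group
{
    public List<string> UserAgents { get; } = new List<string>();
    public List<(bool Allowed, string Path)> Rules { get; } = new List<(bool, string)>();
}

static class DirectiveGrouper
{
    // Single pass, no LINQ: each directive mutates the current group.
    public static List<Group> Build(IEnumerable<Directive> directives)
    {
        var groups = new List<Group>();
        Group current = null;
        bool lastWasUserAgent = false;

        foreach (var d in directives)
        {
            if (d.Field == Field.UserAgent)
            {
                // Assumption: consecutive user-agent lines belong to one group.
                if (current == null || !lastWasUserAgent)
                {
                    current = new Group();
                    groups.Add(current);
                }
                current.UserAgents.Add(d.Value);
                lastWasUserAgent = true;
            }
            else
            {
                lastWasUserAgent = false;
                if (current == null) continue; // rule before any user-agent line is skipped
                current.Rules.Add((d.Field == Field.Allow, d.Value));
            }
        }
        return groups;
    }
}

static class Demo
{
    static void Main()
    {
        var groups = DirectiveGrouper.Build(new[]
        {
            new Directive(Field.UserAgent, "*"),
            new Directive(Field.Disallow, "/private"),
            new Directive(Field.UserAgent, "bot-a"),
            new Directive(Field.UserAgent, "bot-b"),
            new Directive(Field.Allow, "/"),
        });
        Console.WriteLine(groups.Count);
    }
}
```

A loop like this avoids the intermediate enumerators, closures and collections that LINQ pipelines tend to allocate, at the cost of slightly more explicit state handling.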
Closes #42, closes #71 and allows access to the data in #13
Before any changes
After the initial tokenization replacement
After FromStream actually processing it via a stream
After FromString sharing the same code path as FromStream
Final Benchmark Results for RobotsFileParser
FromStream: 95% faster, 94% fewer allocations
FromString: 92% faster, 94% fewer allocations
Before any changes
Initial LINQ-heavy rewrite
Post-LINQ optimizations and performance improvements
FromRules: 92% faster, 95% fewer allocations