-
-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize header parsing #513
Conversation
bc1c4f8
to
5fa05e5
Compare
Fixes gettalong#505 Benchmarks: "setext": ```bash ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}' ``` "atx": ```bash ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}' ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this fixes a performance issue with no breaking changes. Good work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pull request and your great work!
Before I merge, please also remove the changes to .gitignore
and remove the zero-byte test/testcases/block/04_header/atx_header.hcd
file.
@src.check(ATX_HEADER_MATCH) | ||
level, text, id = @src[1], @src[2].to_s.strip, @src[3] | ||
text, id = parse_header_contents | ||
text.sub!(/[\t ]#+\z/, '') && text.rstrip! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why these two statements are on one line? I would rather prefer them on separate lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to call rstrip!
if sub!
return nil.
The other options here are:
A:
text.rstrip! if text.sub!(...)
B:
if text.sub!(...)
text.rstrip!
end
I find the current one (&&
) to be the most readable but can go with A
or B
if that's what you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would just put them on separate lines and take this minimal performance hit because it would be easier to see what's going on without the additional conditional. But I'm okay with leaving this as it is.
@@ -13,47 +13,60 @@ module Kramdown | |||
module Parser | |||
class Kramdown | |||
|
|||
HEADER_ID=/(?:[ \t]+\{#([A-Za-z][\w:-]*)\})?/ | |||
SETEXT_HEADER_START = /^(#{OPT_SPACE}[^ \t].*?)#{HEADER_ID}[ \t]*?\n(-|=)+\s*?\n/ | |||
SETEXT_HEADER_START = /^#{OPT_SPACE}(?<contents>.*)\n(?<level>[-=])[-=]*[ \t\r\f\v]*\n/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you leave out [^ \t]
in the new version? It is not strictly necessary for the main kramdown parser due to the fixed order of block parsers but behaviour of derived parser may change. As far as I can tell, it doesn't make a performance difference.
Also: What is the reason behind changing \s
to [ \t\r\f\v]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you leave out [^ \t] in the new version?
Because it's not necessary as we have to validate header length later on anyway. I have now added it back in. No performance difference.
Also: What is the reason behind changing \s to [ \t\r\f\v]?
I find constructs like \s*?\n
a bit cryptic. Performance-wise both are the same.
By the way, do we actually want to accept \r\f\v
here? CommonMark spec only mentions trailing spaces (https://spec.commonmark.org/0.28/#example-54), and also allows OPT_SPACE
for the underline.
|
||
HEADER_ID = /[\t ]{#(?<id>[A-Za-z][\w:-]*)}\z/ | ||
|
||
# @return [[String, String]] header text and optional ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use standard RDoc comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with RDoc, and although I've read the RDoc Markup Reference, it doesn't seem to mention the correct markup for type hints, so I'm not sure what to do here.
The format I've used is YARD, which is also widely supported by IDEs for type hints. Can you convert to the desired format when merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RDoc doesn't have type hints.
Just leave out this API documentation (most of the other parsing methods don't have API documentation as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
# @param [Number] level | ||
# @param [String] text | ||
# @param [String, nil] id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use standard RDoc comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, just remove the API documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I just noticed that the PR doesn't include the test cases themselves. Could you add them or should I add them after I merge the changes? And please squash the commits into one before I merge them - thank you! |
Please add them yourself, it will be faster this way.
Nowadays you can do it yourself at merge time by clicking "Squash and merge"! |
Merged - thanks! |
@gettalong Will you cut a release please? Thank you |
Yes, if I have time this weekend. |
Fixes #505
Benchmarks:
"setext":
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}'
"atx":
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}'