Optimize header parsing #513

glebm · 2018-05-18T02:23:15Z

Fixes #505

Benchmarks:

"setext":

ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}'

"atx":

ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}'
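
The effect can also be seen without loading kramdown by timing the old and new setext patterns from the diff directly. This is only a sketch: `OPT_SPACE` is assumed to mirror kramdown's definition (up to three leading spaces), `setext_input` and `match_time` are hypothetical helpers, and the input is kept smaller than in the commands above so the script finishes quickly on any Ruby version. Newer Ruby versions may optimize the old pattern, so the gap will vary.

```ruby
require 'benchmark'

# Assumption: mirrors kramdown's OPT_SPACE (up to three leading spaces).
OPT_SPACE = / {0,3}/

# Patterns removed by the diff.
OLD_HEADER_ID = /(?:[ \t]+\{#([A-Za-z][\w:-]*)\})?/
OLD_SETEXT_HEADER_START = /^(#{OPT_SPACE}[^ \t].*?)#{OLD_HEADER_ID}[ \t]*?\n(-|=)+\s*?\n/

# Pattern added by the diff.
NEW_SETEXT_HEADER_START = /^#{OPT_SPACE}(?<contents>.*)\n(?<level>[-=])[-=]*[ \t\r\f\v]*\n/

# Hypothetical helper: a setext header with `spaces` spaces inside its text.
def setext_input(spaces)
  "1#{' ' * spaces}2\n==\n"
end

# Hypothetical helper: seconds needed for one match attempt.
def match_time(pattern, input)
  Benchmark.realtime { pattern.match(input) }
end

input = setext_input(2_000)
puts format('old: %.5fs', match_time(OLD_SETEXT_HEADER_START, input))
puts format('new: %.5fs', match_time(NEW_SETEXT_HEADER_START, input))
```

In the old pattern the lazy `.*?` retries the optional ID group at every position, and each retry scans the remaining run of spaces before failing, so the work grows roughly quadratically with the number of spaces. The new pattern captures the whole line first and leaves the ID to a separate step.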

krasnoukhov

Seems like this fixes a performance issue with no breaking changes. Good work!

gettalong

Thanks for the pull request and your great work!

Before I merge, please also remove the changes to .gitignore and remove the zero-byte test/testcases/block/04_header/atx_header.hcd file.

gettalong · 2018-05-23T13:29:29Z

lib/kramdown/parser/gfm.rb

-        @src.check(ATX_HEADER_MATCH)
-        level, text, id = @src[1], @src[2].to_s.strip, @src[3]
+        text, id = parse_header_contents
+        text.sub!(/[\t ]#+\z/, '') && text.rstrip!


Is there a reason why these two statements are on one line? I would prefer them on separate lines.

There is no need to call rstrip! if sub! returns nil.

The other options here are:

A:

text.rstrip! if text.sub!(...)

B:

if text.sub!(...)
  text.rstrip!
end

I find the current one (&&) to be the most readable but can go with A or B if that's what you prefer.

Personally I would just put them on separate lines and take this minimal performance hit because it would be easier to see what's going on without the additional conditional. But I'm okay with leaving this as it is.
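
For reference, the three spellings discussed in this thread behave the same on the inputs below, because `String#sub!` returns `nil` when nothing was replaced. The sketch uses hypothetical method names and is not the code from the PR; strings are duplicated so they can be mutated in place.

```ruby
CLOSING_HASHES = /[\t ]#+\z/

# Current form: rstrip! only runs if sub! changed the string.
def strip_with_and(text)
  text.sub!(CLOSING_HASHES, '') && text.rstrip!
  text
end

# Option A: modifier-if form of the same short circuit.
def strip_with_modifier_if(text)
  text.rstrip! if text.sub!(CLOSING_HASHES, '')
  text
end

# Separate statements: rstrip! always runs.
def strip_unconditionally(text)
  text.sub!(CLOSING_HASHES, '')
  text.rstrip!
  text
end

['Title  ##', 'Title', 'Title #'].each do |source|
  results = [
    strip_with_and(source.dup),
    strip_with_modifier_if(source.dup),
    strip_unconditionally(source.dup)
  ]
  puts "#{source.inspect} -> #{results.inspect}"
end
```

The unconditional variant would differ only if the text could carry trailing whitespace without closing hashes, which is the case the conditional forms deliberately skip.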

gettalong · 2018-05-23T13:38:15Z

lib/kramdown/parser/kramdown/header.rb

@@ -13,47 +13,60 @@ module Kramdown
  module Parser
    class Kramdown

-      HEADER_ID=/(?:[ \t]+\{#([A-Za-z][\w:-]*)\})?/
-      SETEXT_HEADER_START = /^(#{OPT_SPACE}[^ \t].*?)#{HEADER_ID}[ \t]*?\n(-|=)+\s*?\n/
+      SETEXT_HEADER_START = /^#{OPT_SPACE}(?<contents>.*)\n(?<level>[-=])[-=]*[ \t\r\f\v]*\n/


Why did you leave out [^ \t] in the new version? It is not strictly necessary for the main kramdown parser because the block parsers run in a fixed order, but the behaviour of derived parsers may change. As far as I can tell, it doesn't make a performance difference.

Also: What is the reason behind changing \s to [ \t\r\f\v]?

> Why did you leave out [^ \t] in the new version?

Because it's not necessary as we have to validate header length later on anyway. I have now added it back in. No performance difference.

> Also: What is the reason behind changing \s to [ \t\r\f\v]?

I find constructs like \s*?\n a bit cryptic. Performance-wise both are the same.

By the way, do we actually want to accept \r\f\v here? CommonMark spec only mentions trailing spaces (https://spec.commonmark.org/0.28/#example-54), and also allows OPT_SPACE for the underline.
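
To make the `\s` question concrete: in Ruby, `\s` is shorthand for `[ \t\r\n\f\v]`, so `\s*?\n` has to be lazy to avoid swallowing the newline, whereas the explicit class leaves `\n` out and can be greedy. A small sketch, using only the tail of each pattern (the `OLD_TAIL` and `NEW_TAIL` names are made up for this example):

```ruby
# Underline part of the old and new setext patterns, anchored for the demo.
OLD_TAIL = /\A(-|=)+\s*?\n/
NEW_TAIL = /\A(?<level>[-=])[-=]*[ \t\r\f\v]*\n/

# \s includes the newline; the explicit class does not.
puts "newline matches \\s: #{"\n".match?(/\s/)}"
puts "newline matches explicit class: #{"\n".match?(/[ \t\r\f\v]/)}"

["==\n", "--  \n", "==\t\r\n"].each do |underline|
  old_length = OLD_TAIL.match(underline)[0].length
  new_length = NEW_TAIL.match(underline)[0].length
  puts "#{underline.inspect}: old=#{old_length} new=#{new_length}"
end
```

On these sample underlines both tails consume the same number of characters, which is consistent with the claim that the change is about readability rather than behaviour.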

gettalong · 2018-05-23T13:44:24Z

lib/kramdown/parser/kramdown/header.rb

+
+      HEADER_ID = /[\t ]{#(?<id>[A-Za-z][\w:-]*)}\z/
+
+      # @return [[String, String]] header text and optional ID.


Please use standard RDoc comments.

I'm not familiar with RDoc, and although I've read the RDoc Markup Reference, it doesn't seem to mention the correct markup for type hints, so I'm not sure what to do here.

The format I've used is YARD, which is also widely supported by IDEs for type hints. Can you convert to the desired format when merging?

RDoc doesn't have type hints.

Just leave out this API documentation (most of the other parsing methods don't have API documentation either).
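
Since the documented method itself is not shown in this excerpt, here is a rough sketch of how a trailing-anchored `HEADER_ID` can be used to split header contents into text and optional ID. `split_header_id` is a hypothetical stand-alone function, not the method from the PR, and the braces are escaped here for clarity (this should be equivalent to the pattern in the diff).

```ruby
# Same pattern as in the diff, with the literal braces escaped.
HEADER_ID = /[\t ]\{#(?<id>[A-Za-z][\w:-]*)\}\z/

# Hypothetical helper: returns [text, id], where id is nil if absent.
def split_header_id(contents)
  text = contents.rstrip
  id = nil
  if (match = HEADER_ID.match(text))
    id = match[:id]
    # Everything before the ID marker is the header text.
    text = match.pre_match.rstrip
  end
  [text, id]
end

p split_header_id('Title {#intro}')
p split_header_id('Title without an ID')
```

Because the pattern is anchored with `\z` and applied once to the already-captured line, the ID check no longer takes part in the backtracking of the header-start pattern.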

gettalong · 2018-05-23T13:44:31Z

lib/kramdown/parser/kramdown/header.rb


+      # @param [Number] level
+      # @param [String] text
+      # @param [String, nil] id


Please use standard RDoc comments.

Same here, just remove the API documentation.

gettalong · 2018-05-25T15:45:13Z

I just noticed that the PR doesn't include the test cases themselves. Could you add them or should I add them after I merge the changes?

And please squash the commits into one before I merge them - thank you!

glebm · 2018-05-26T05:06:42Z

> I just noticed that the PR doesn't include the test cases themselves. Could you add them or should I add them after I merge the changes?

Please add them yourself, it will be faster this way.

> And please squash the commits into one before I merge them - thank you!

Nowadays you can do it yourself at merge time by clicking "Squash and merge"!

gettalong · 2018-05-26T05:33:08Z

Merged - thanks!

krasnoukhov · 2018-05-30T10:38:50Z

@gettalong Will you cut a release please? Thank you

gettalong · 2018-05-30T17:24:40Z

Yes, if I have time this weekend.

glebm mentioned this pull request May 18, 2018

Performance issue when rendering a lot of space #505

Closed

glebm force-pushed the header-parsing branch 2 times, most recently from bc1c4f8 to 5fa05e5 Compare May 18, 2018 02:29

glebm force-pushed the header-parsing branch from 5fa05e5 to 9b35ff2 Compare May 18, 2018 12:33

krasnoukhov approved these changes May 18, 2018

gettalong requested changes May 23, 2018

gettalong assigned gettalong and glebm May 23, 2018

gettalong added the enhancement label May 23, 2018

glebm added 2 commits May 24, 2018 09:46

Address PR comments

d233f60

Remove type comments

9d117ce

gettalong approved these changes May 25, 2018

gettalong closed this May 26, 2018

glebm deleted the header-parsing branch May 31, 2018 01:48
