Parsing * * * * * * … takes quadratic time #2

andersk · 2019-03-10T06:08:30Z

$ python3 -c 'print(end="* "*1000)' | time commonmark > /dev/null
0.46user 0.16system 0:00.35elapsed 177%CPU (0avgtext+0avgdata 52080maxresident)k
0inputs+0outputs (0major+15662minor)pagefaults 0swaps
$ python3 -c 'print(end="* "*2000)' | time commonmark > /dev/null
1.45user 0.63system 0:01.07elapsed 193%CPU (0avgtext+0avgdata 52120maxresident)k
0inputs+0outputs (0major+16320minor)pagefaults 0swaps
$ python3 -c 'print(end="* "*4000)' | time commonmark > /dev/null
6.27user 2.81system 0:04.52elapsed 201%CPU (0avgtext+0avgdata 52292maxresident)k
0inputs+0outputs (0major+18973minor)pagefaults 0swaps
$ python3 -c 'print(end="* "*8000)' | time commonmark > /dev/null
35.88user 15.90system 0:25.93elapsed 199%CPU (0avgtext+0avgdata 51780maxresident)k
0inputs+0outputs (0major+25323minor)pagefaults 0swaps

jgm · 2019-03-11T16:38:39Z

Note that the newline is suppressed at the end of line, so this gets parsed as a very deeply nested list item. One could also test with something like

python3 -c 'print("* "*8000 + "a")'

jgm · 2019-03-11T16:40:28Z

Timings for cmark:

4000 0.19
8000 0.70
12000 2.16
16000 4.27

So cmark also shows quadratic behavior for this; the same issue could be reported there.

jgm · 2019-03-11T16:54:53Z

There doesn't seem to be any backtracking in the parsing algorithm.
It could have to do with adding things to deeply nested structures.

jgm · 2019-03-11T17:05:35Z

I suspect it has to do with the finalize step (I tried commenting out list finalizing in cmark, and it sped things up).

jgm · 2019-03-18T00:28:03Z

There are two problems.

Problem 1 is the use of getBlanks in blockFinalize for listSpec in Blocks.hs.
This recurses through all the children at each level, leading to quadratic performance.
Unfortunately we can't just port over the change to cmark, since they rely on mutable block structure.

Problem 2 is that we repeatedly scan for thematic breaks, even after it should be clear that this is unnecessary (since on the first scan, we see that the a at the end of the line is incompatible with any thematic break). In cmark we fixed this by using a custom scanner that sets a "thematic break kill position" on the first failure; we then don't need to scan for thematic breaks again until we pass this kill position. Again, the specific change will have to be modified for pure code.

jgm · 2019-03-18T00:30:18Z

We're finalizing the most deeply nested list first, then on up the line.
So it should be sufficient, for Problem 1, to store something in the list data for the finalized list indicating whether it ends in a blank line. (Indeed, this would be a cleaner solution for cmark, too.)

This is one piece of #2.

jgm · 2019-03-24T15:56:11Z

Problem 1 solved. Problem 2 remains.

See #2.

This is to store positions in a line where we know a scanner will fail. The immediate application is to #2.

This was referenced Mar 11, 2019

Parsing ‘* * * * * * … a’ takes quadratic time commonmark/cmark#284

Closed

Parsing ‘* * * * * * … a’ takes quadratic time mity/md4c#66

Closed

jgm added a commit that referenced this issue Mar 23, 2019

Improve tight/loose list detection.

39f0317

This is one piece of #2.

jgm added a commit that referenced this issue Mar 24, 2019

Add * * * * .... * a benchmark.

1837d55

See #2.

jgm added a commit that referenced this issue Mar 25, 2019

Added killPositions to BPState.

4f8c13f

This is to store positions in a line where we know a scanner will fail. The immediate application is to #2.

jgm closed this as completed in 53dd261 Mar 25, 2019

jgm added a commit that referenced this issue Mar 25, 2019

Better handling for no endline case for #2.

9aca5e1

andersk mentioned this issue Mar 27, 2019

Parsing ‘* * * * * * … a’ takes quadratic time (master, new_algo) pulldown-cmark/pulldown-cmark#249

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing * * * * * * … takes quadratic time #2

Parsing * * * * * * … takes quadratic time #2

andersk commented Mar 10, 2019

jgm commented Mar 11, 2019

jgm commented Mar 11, 2019

jgm commented Mar 11, 2019

jgm commented Mar 11, 2019

jgm commented Mar 18, 2019

jgm commented Mar 18, 2019

jgm commented Mar 24, 2019

Parsing * * * * * * … takes quadratic time #2

Parsing * * * * * * … takes quadratic time #2

Comments

andersk commented Mar 10, 2019

jgm commented Mar 11, 2019

jgm commented Mar 11, 2019

jgm commented Mar 11, 2019

jgm commented Mar 11, 2019

jgm commented Mar 18, 2019

jgm commented Mar 18, 2019

jgm commented Mar 24, 2019