(Apparently, I cannot add labels; I would have added reader:markdown and performance.)
I ran into a case of exponential runtime in the Markdown reader that baffles me. Sadly, I have not yet been able to reproduce the bug with generic data. I cannot make my pathological file publicly available, but I can share it privately upon request. The input consists of lines like these:
[dir1/foo_bar_1100.md](https://www.example.org/dir1/foo/bar1)
[dir2/foo_bar_2100.md](https://www.example.org/dir2/foo/bar2)
...
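Since the real file cannot be shared, one way to keep trying with generic data is to generate many lines of the same shape. A minimal sketch, assuming bash; gen_links is a hypothetical name, and the numbering scheme is only inferred from the two sample lines, so it will not capture whatever makes the real file pathological:

```shell
# Hypothetical generator (not part of pandoc): print N Markdown link lines
# shaped like the two sample lines above. Directory, file name and URL are
# all derived from the line index i.
gen_links() {
  local n=$1 i
  for (( i = 1; i <= n; i++ )); do
    printf '[dir%d/foo_bar_%d100.md](https://www.example.org/dir%d/foo/bar%d)\n' \
      "$i" "$i" "$i" "$i"
  done
}
```

For example, gen_links 100 > synthetic.md produces a 100-line file that can be fed to the timing loop below in place of pathological.md.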
The blow-up is obvious when timing pandoc on growing prefixes of the file:
$ N=0; while ((N++<10)); do n=$((10*N)); head -n $n pathological.md > tmp.md;
echo -n "$n lines: "; start=${EPOCHREALTIME//[.,]};
pandoc --to native tmp.md >/dev/null;
echo $(((${EPOCHREALTIME//[.,]}-start)/1000)) ms ; done
10 lines: 24 ms
20 lines: 41 ms
30 lines: 45 ms
40 lines: 34 ms
50 lines: 34 ms
60 lines: 50 ms
70 lines: 286 ms
80 lines: 1838 ms
90 lines: 3549 ms
100 lines: 5158 ms
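To pin down where the slowdown starts (and perhaps shrink the file to something shareable), the prefix search can be automated as a bisection instead of fixed 10-line steps. A minimal sketch, assuming bash and GNU coreutils timeout; find_slow_prefix is a hypothetical helper, and the search assumes runtime only grows as lines are added (the 30- vs 40-line timings above show that is only roughly true). It writes each prefix to its own temporary file and leaves tmp.md alone:

```shell
# Hypothetical helper (not part of pandoc): print the smallest N such that
# running CMD on the first N lines of FILE takes longer than LIMIT seconds.
# Assumes GNU `timeout` (exit status 124 when the limit is hit), that the
# empty prefix is fast, the whole FILE is slow, and runtime is monotone in N.
find_slow_prefix() {
  local file=$1 limit=$2
  shift 2                        # remaining arguments form CMD
  local tmp lo hi mid status
  tmp=$(mktemp) || return 1
  lo=0                           # invariant: first $lo lines finish within LIMIT
  hi=$(wc -l < "$file")          # invariant: first $hi lines exceed LIMIT
  while (( hi - lo > 1 )); do
    mid=$(( (lo + hi) / 2 ))
    head -n "$mid" "$file" > "$tmp"
    status=0
    timeout "$limit" "$@" "$tmp" > /dev/null 2>&1 || status=$?
    if (( status == 124 )); then
      hi=$mid                    # timed out: threshold is at or below mid
    else
      lo=$mid                    # finished in time: threshold is above mid
    fi
  done
  rm -f "$tmp"
  echo "$hi"
}
```

For example, find_slow_prefix pathological.md 1 pandoc --from markdown --to native would print the line count at which pandoc first needs more than one second.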
At this point (after the last iteration of the timing loop), tmp.md contains the first 100 lines of pathological.md. I tried running with --trace and noticed that the trace output seems broken:
$ time pandoc --trace --to native tmp.md | wc -l
[trace] Parsed [Para [Link ("",[],[]) [Str "2factor_authentication.md"] ("h at line 103
703
real 0m5.182s
user 0m5.122s
sys 0m0.061s
The CommonMark reader (commonmark_x) does not have this problem:
$ time pandoc --from commonmark_x --to native tmp.md | wc -l
703
real 0m0.088s
user 0m0.082s
sys 0m0.009s
Equally baffling is the effect of s/^/* /, which turns every line into a bullet list item and makes the slowdown disappear:
$ sed -i 's/^/* /' tmp.md; wc -l tmp.md
100 tmp.md
$ time pandoc --to native tmp.md | wc -l
1005
real 0m0.088s
user 0m0.063s
sys 0m0.024s
Pandoc 3.1.13 on amd64 Debian GNU/Linux 11.9