Exponential runtime for [link](url) #9710

Closed
@xrat

Description

(Apparently, I cannot add labels. Would have added: reader:markdown + performance.)

I ran into a case of exponential runtime in the Markdown reader that baffles me. Unfortunately, I have not yet been able to reproduce the bug with generic data. I cannot make my pathological file publicly available, but I can share it privately upon request. The input looks like this:

[dir1/foo_bar_1100.md](https://www.example.org/dir1/foo/bar1)
[dir2/foo_bar_2100.md](https://www.example.org/dir2/foo/bar2)
...
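
For anyone who wants to experiment with input of this shape, the following could serve as a starting point. It is only a sketch: `gen_links` is a hypothetical helper, the numbering pattern is extrapolated from the two sample lines above, and generic data of this kind has not reproduced the slowdown so far.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pandoc): print N lines shaped like the
# sample input above, one inline link per line.
gen_links() {
  local n=$1 i
  for ((i = 1; i <= n; i++)); do
    # The dirN / barN numbering is an assumption based on the two sample lines.
    printf '[dir%d/foo_bar_%d100.md](https://www.example.org/dir%d/foo/bar%d)\n' \
      "$i" "$i" "$i" "$i"
  done
}
```

Usage: `gen_links 100 > generic.md` writes 100 such lines to `generic.md` (an arbitrary file name), which can then be fed to pandoc in place of tmp.md.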

The runtime blow-up is obvious once the input grows past about 60 lines:

$ N=0; while ((N++<10)); do n=$((10*N)); head -n $n pathological.md > tmp.md;
echo -n "$n lines: "; start=${EPOCHREALTIME//[.,]};
pandoc --to native tmp.md >/dev/null;
echo $(((${EPOCHREALTIME//[.,]}-start)/1000)) ms ; done
10 lines: 24 ms
20 lines: 41 ms
30 lines: 45 ms
40 lines: 34 ms
50 lines: 34 ms
60 lines: 50 ms
70 lines: 286 ms
80 lines: 1838 ms
90 lines: 3549 ms
100 lines: 5158 ms
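
The timing one-liner above can be unrolled into a small reusable function. This is a sketch of the same technique, not a different measurement: `time_ms` is a hypothetical name, and it assumes bash 5 or later, where `EPOCHREALTIME` is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and print its wall-clock time in
# milliseconds. Assumes bash >= 5 (EPOCHREALTIME).
time_ms() {
  local start end
  # EPOCHREALTIME is "seconds.microseconds"; stripping the separator
  # ("." or ",", depending on locale) yields microseconds since the epoch.
  start=${EPOCHREALTIME//[.,]}
  "$@" >/dev/null
  end=${EPOCHREALTIME//[.,]}
  echo $(( (end - start) / 1000 ))
}
```

Usage: after `head -n 100 pathological.md > tmp.md`, running `echo "100 lines: $(time_ms pandoc --to native tmp.md) ms"` prints one line in the same format as the measurements above.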

At this stage (following the code above), tmp.md contains the first 100 lines of pathological.md. I tried running with --trace and noticed that the output seems broken:

$ time pandoc --trace --to native tmp.md | wc -l
[trace] Parsed [Para [Link ("",[],[]) [Str "2factor_authentication.md"] ("h at line 103
703

real    0m5.182s
user    0m5.122s
sys     0m0.061s

The Commonmark reader does not have this problem:

$ time pandoc --from commonmark_x --to native tmp.md | wc -l
703

real    0m0.088s
user    0m0.082s
sys     0m0.009s

Equally baffling is the effect of s/^/* /, which prefixes every line with "* ": the slowdown disappears.

$ sed -i 's/^/* /' tmp.md; wc -l tmp.md
100 tmp.md
$ time pandoc --to native tmp.md | wc -l
1005

real    0m0.088s
user    0m0.063s
sys     0m0.024s

Pandoc 3.1.13 on amd64 Debian GNU/Linux 11.9
