Mistletoe hangs when parsing some specifically formatted Footnotes #124

Closed
ddevault opened this issue Nov 10, 2021 · 6 comments
Labels: bug, has-workaround (a bug that has a workaround)

Comments

@ddevault

>>> import mistletoe
>>> input = "foo bar [1]:\r\nfoo bar\r\n\r\n[1]: https://example.org/\r\nhttps://example.org"
>>> mistletoe.markdown(input)

This never returns, or at least does not return within the limits of my patience.
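
For anyone who wants to check an installed version without blocking their terminal, here is a minimal sketch that runs the snippet above in a child interpreter with a time limit. The helper name `finishes_within` and the 5-second limit are illustrative assumptions, not part of mistletoe.

```python
import subprocess
import sys

# The reproduction from this report, as source code for a child interpreter.
# The raw string keeps "\r\n" as escape sequences for the child to interpret.
REPRO_SNIPPET = r'''
import mistletoe
text = "foo bar [1]:\r\nfoo bar\r\n\r\n[1]: https://example.org/\r\nhttps://example.org"
mistletoe.markdown(text)
'''


def finishes_within(snippet, seconds):
    """Run `snippet` in a fresh interpreter (hypothetical helper).

    Returns False if the child had to be killed after `seconds`, True if it
    exited on its own. Note that True also covers a child that exited with
    an error, for example because mistletoe is not installed.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", snippet],
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return False
    return True


if __name__ == "__main__":
    print("finished in time:", finishes_within(REPRO_SNIPPET, 5))
```

Running the parser in a separate process matters here because a hung parse inside the current interpreter cannot be cancelled cleanly from a thread.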

@pbodnar
Collaborator

pbodnar commented Nov 10, 2021

Hi, it looks like this is caused by mistletoe not expecting CRLF line endings in the input (see #64). From my quick testing, it freezes because of the last \r\n. The stack trace after pressing Ctrl+C looks like this:

$ python issue-124.py
Traceback (most recent call last):
  File "issue-124.py", line 3, in <module>
    print(mistletoe.markdown(input))
  File "d:\projects\my-forks\mistletoe\mistletoe\__init__.py", line 22, in markdown
    return renderer.render(Document(iterable))
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 150, in __init__
    self.children = tokenize(lines)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 49, in tokenize
    return tokenizer.tokenize(lines, _token_types)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_tokenizer.py", line 51, in tokenize
    return make_tokens(tokenize_block(iterable, token_types))
  File "d:\projects\my-forks\mistletoe\mistletoe\block_tokenizer.py", line 67, in tokenize_block
    result = token_type.read(lines)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 734, in read
    match_info = cls.match_reference(lines, string, offset)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 754, in match_reference
    match_info = cls.match_link_dest(string, label_end)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 793, in match_link_dest
    offset = shift_whitespace(string, offset+1)
  File "d:\projects\my-forks\mistletoe\mistletoe\core_tokens.py", line 381, in shift_whitespace
    for i, c in enumerate(string[index:], start=index):
KeyboardInterrupt

pbodnar changed the title from "Footnotes causes mistletoe to hang" to "Footnote in combination with CRLF line-endings causes mistletoe to hang" on Nov 10, 2021
pbodnar added the enhancement and has-workaround labels on Nov 10, 2021
@pbodnar
Collaborator

pbodnar commented Nov 10, 2021

So I would classify this as an enhancement with a workaround: use a plain \n if you need to build an input string with line endings programmatically (or possibly use a multi-line string).

OK for now?

@ddevault
Author

I would not classify a problem in which some input causes the library to hang forever as needing an enhancement; it is a bug. Consider that this is a DoS vector.

I will apply an appropriate workaround (converting CRLF to LF) in my software, but this is definitely a bug and probably an urgent one at that.
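
A minimal sketch of that workaround, assuming the text is available as a `str` before it reaches the parser. The helper name `normalize_newlines` is illustrative; `mistletoe.markdown` is the same call used in the report above.

```python
def normalize_newlines(text):
    """Convert CRLF line endings to LF (hypothetical helper for the workaround)."""
    return text.replace("\r\n", "\n")


# Intended use, left as a comment so the sketch runs without mistletoe installed:
#
#     import mistletoe
#     html = mistletoe.markdown(normalize_newlines(untrusted_text))

SAMPLE = "foo bar [1]:\r\nfoo bar\r\n\r\n[1]: https://example.org/\r\nhttps://example.org"
CLEANED = normalize_newlines(SAMPLE)
```

This only covers the CRLF case. According to a later comment in this thread, other whitespace before \n could also break parsing on unfixed versions, so the conversion is a stopgap and not a replacement for upgrading.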

pbodnar added the bug label and removed the enhancement label on Nov 11, 2021
@pbodnar
Collaborator

pbodnar commented Nov 12, 2021

Good news, it looks like I found the culprit in the Footnote.backtrack() method / call. I expect to come up with a fix soon.

pbodnar added a commit that referenced this issue Nov 13, 2021
* also cover more scenarios by unit tests
* also add some inline documentation
pbodnar changed the title from "Footnote in combination with CRLF line-endings causes mistletoe to hang" to "Mistletoe hangs when parsing some specifically formatted Footnotes" on Nov 13, 2021
pbodnar self-assigned this on Nov 13, 2021
@pbodnar
Collaborator

pbodnar commented Nov 13, 2021

Fixed in the master branch. It turned out that any whitespace character before \n can break the parsing, not just \r.
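
For users who cannot upgrade yet, a rough diagnostic sketch that lists lines ending in whitespace directly before \n. The helper name is illustrative, and the thread does not spell out exactly which of these inputs hang on unfixed versions, so treat the result as a hint about which lines to inspect.

```python
def lines_with_trailing_whitespace(text):
    """Return 1-based numbers of lines with whitespace right before "\\n".

    Hypothetical helper, not part of mistletoe.
    """
    hits = []
    parts = text.split("\n")
    # The final element is not followed by "\n", so it is skipped.
    for lineno, line in enumerate(parts[:-1], start=1):
        if line and line[-1].isspace():
            hits.append(lineno)
    return hits


SAMPLE = "foo bar [1]:\r\nfoo bar\r\n\r\n[1]: https://example.org/\r\nhttps://example.org"
FLAGGED = lines_with_trailing_whitespace(SAMPLE)
```

Trailing whitespace is legitimate Markdown (two trailing spaces mark a hard line break), so this is meant for inspection and not as a filter that strips the whitespace.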

pbodnar closed this as completed on Nov 13, 2021
@ddevault
Author

Thanks!
