Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #108, CommonMark 0.30 spec compliance, part 2 #165

Merged
merged 8 commits into from
Dec 18, 2022

Conversation

anderskaplan
Copy link
Contributor

A set of small fixes which improve the CommonMark spec compliance. See the commit messages for details.

@pbodnar
Copy link
Collaborator

pbodnar commented Oct 30, 2022

Great, thank you for another contribution. :)

So it looks like, because spec examples 312 and 313 are already fixed in master now, there will be just 1 failing example left after the fixes provided in here, namely:

$ py -m test.specification
example:  146
markdown: '~~~ aa ``` ~~~\nfoo\n~~~\n'
html:     '<pre><code class="language-aa">foo\n</code></pre>\n'
output:   '<p><del>~ aa ``` </del>~\nfoo</p>\n<pre><code></code></pre>\n'

... the last challenge. ;)

@pbodnar pbodnar self-requested a review October 30, 2022 19:42
@anderskaplan
Copy link
Contributor Author

Yes, at first I thought it was caused by the strikethrough token, but that can't be. Because the code fences should already be parsed on block level before the strikethrough can kick in on span level.

Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderskaplan, good job, I think all the fixes are correct. 👍 I believe there is room for improving the unit tests though (I'm a big fan of having also the test code simple and consistent, to some extent, of course. ;)) - see my suggestions. Thank you.

test/test_span_token.py Outdated Show resolved Hide resolved
test/test_span_token.py Outdated Show resolved Hide resolved
mistletoe/core_tokens.py Show resolved Hide resolved
test/test_span_token.py Outdated Show resolved Hide resolved
test/test_block_token.py Outdated Show resolved Hide resolved
test/test_block_token.py Outdated Show resolved Hide resolved
test/test_span_token.py Outdated Show resolved Hide resolved
mistletoe/core_tokens.py Show resolved Hide resolved
@pbodnar
Copy link
Collaborator

pbodnar commented Nov 17, 2022

Yes, at first I thought it was caused by the strikethrough token, but that can't be. Because the code fences should already be parsed on block level before the strikethrough can kick in on span level.

Yeah, it looks like mistletoe cannot handle the following correctly (stated above the failing Example 146):

Info strings for tilde code blocks can contain backticks and tildes...

So I believe this condition is the culprit of mistletoe's failure:

        if leader[0] in lang or leader[0] in line[match_obj.end():]:
            return False

I.e. https://github.com/miyuchina/mistletoe/blob/v0.9.0/mistletoe/block_token.py#L430.

@anderskaplan
Copy link
Contributor Author

hey @pbodnar thanks for the review! I'll try and find some time to update the tests during next week.

@anderskaplan
Copy link
Contributor Author

all done! the last failing test case will be in a PR of its own.

Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderskaplan, thanks for incorporating all the suggestions, I think it really is all ready for merge now. :)

Yet, would you mind doing a final interactive rebase and squashing?

  1. I think it is fine to keep the "fix example X" commits separated. Also b160a88 could probably stand on its own.
  2. I would just remove # before the example numbers because of the GitHub automatic issue links and add #108 to the end of the summary line.
  3. The commits on fine-tuning unit tests could be either squashed into their corresponding "feature commits" (too much work though?), or squashed into one?

mistletoe/core_tokens.py Show resolved Hide resolved
test/test_span_token.py Outdated Show resolved Hide resolved
test/test_block_token.py Outdated Show resolved Hide resolved
test/test_block_token.py Outdated Show resolved Hide resolved
test/test_span_token.py Outdated Show resolved Hide resolved
test/test_span_token.py Outdated Show resolved Hide resolved
mistletoe/core_tokens.py Show resolved Hide resolved
test/test_span_token.py Outdated Show resolved Hide resolved
@anderskaplan
Copy link
Contributor Author

Updated the commit messages and squashed the unit test updates according to review comments. Should be good to go now!

@pbodnar
Copy link
Collaborator

pbodnar commented Dec 17, 2022

2. I would just ... and add ` (#108)` to the end of the summary line.

(note: after looking in the commits history, I have added parentheses around the number)

@anderskaplan, maybe you have overlooked this part (which groups the commits together)? Apart from that, the commits seem perfect now. :)

…nation may contain spaces if it is enclosed in pointy brackets. (miyuchina#108)

Solved by allowing spaces in core_tokens.match_link_dest().
Also fixed a problem in core_tokens.is_link_label() where the return value would be malformed if the root element isn't set. This does not happen under normal circumstances, but it caused unit tests to fail.
…tion may contain spaces if it is enclosed in pointy brackets. (miyuchina#108)

Solved by allowing spaces in Footnote.match_link_dest().
…sis delimiter lengths being multiples of 3. (miyuchina#108)

Solved by adding a missing condition in Delimiter.closed_by().
Also refactored the function to make it more readable.
…blocks may contain blank lines. (miyuchina#108)

Solved by adding textarea to the set of HTML tags which allow newlines.
…pace in HTML tags. (miyuchina#108)

The problem was that the regex for unquoted attribute values did not break at newlines.
Therefore, in this example, the attribute value "baz\nbim!bop" was accepted when it wasn't supposed to.
Solved by modifying the regex to break at any whitespace character, not only space.
…tities. (miyuchina#108)

The problem was that the length of html entities like &#87654321; was not checked,
leading to the generation of invalid output.
The solution was to modify the regex used to match charrefs to include limits on the lengths (as proposed by @pbodnar).
The regex workaround was also moved from the HTMLRenderer class to span_token.tokenize() to make it available to all renderers.
…, and inline link parsing. Added positive test for valid html entities.
@pbodnar pbodnar merged commit 5ef895e into miyuchina:master Dec 18, 2022
@pbodnar
Copy link
Collaborator

pbodnar commented Dec 18, 2022

Thank you!

@pbodnar pbodnar mentioned this pull request Jan 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants