Fix for #108, CommonMark 0.30 spec compliance, part 2 #165

anderskaplan · 2022-10-02T11:48:43Z

A set of small fixes which improve the CommonMark spec compliance. See the commit messages for details.

pbodnar · 2022-10-30T19:41:21Z

Great, thank you for another contribution. :)

Since spec examples 312 and 313 are already fixed in master, it looks like just one failing example will remain after the fixes provided here, namely:

$ py -m test.specification
example:  146
markdown: '~~~ aa ``` ~~~\nfoo\n~~~\n'
html:     '<pre><code class="language-aa">foo\n</code></pre>\n'
output:   '<p><del>~ aa ``` </del>~\nfoo</p>\n<pre><code></code></pre>\n'

... the last challenge. ;)

anderskaplan · 2022-10-31T21:31:25Z

Yes, at first I thought it was caused by the strikethrough token, but that can't be, because the code fences should already be parsed at block level before the strikethrough can kick in at span level.

pbodnar

@anderskaplan, good job, I think all the fixes are correct. 👍 I believe there is room for improving the unit tests, though: I'm a big fan of keeping the test code simple and consistent too (to some extent, of course ;)). See my suggestions. Thank you.

pbodnar · 2022-11-17T21:49:34Z

> Yes, at first I thought it was caused by the strikethrough token, but that can't be, because the code fences should already be parsed at block level before the strikethrough can kick in at span level.

Yeah, it looks like mistletoe cannot handle the following correctly (stated above the failing Example 146):

> Info strings for tilde code blocks can contain backticks and tildes...

So I believe this condition is the culprit of mistletoe's failure:

        if leader[0] in lang or leader[0] in line[match_obj.end():]:
            return False

I.e. https://github.com/miyuchina/mistletoe/blob/v0.9.0/mistletoe/block_token.py#L430.
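To illustrate the point about the condition above, here is a minimal sketch of a fence-opening check that applies the "fence character must not appear in the info string" restriction to backtick fences only. It is not mistletoe's code: the `FENCE` pattern and the `match_fence_opening` name are hypothetical, made up for this illustration.

```python
import re

# Hypothetical pattern (not taken from mistletoe): up to 3 spaces of indent,
# a run of 3+ backticks or 3+ tildes, then the rest of the line as info string.
FENCE = re.compile(r" {0,3}(`{3,}|~{3,})(.*)$")

def match_fence_opening(line):
    """Return (fence, info_string) if `line` opens a fenced code block, else None."""
    match_obj = FENCE.match(line.rstrip("\n"))
    if match_obj is None:
        return None
    fence = match_obj.group(1)
    info = match_obj.group(2).strip()
    # Only backtick fences forbid backticks in the info string.
    # Tilde fences may contain both backticks and tildes.
    if fence[0] == "`" and "`" in info:
        return None
    return fence, info
```

With this check, the first line of the failing example, `~~~ aa ``` ~~~`, is accepted as a fence opening whose info string starts with `aa`, while a backtick fence with a backtick in its info string is still rejected.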

anderskaplan · 2022-11-22T19:13:58Z

hey @pbodnar thanks for the review! I'll try and find some time to update the tests during next week.

anderskaplan · 2022-12-03T18:22:42Z

all done! the last failing test case will be in a PR of its own.

pbodnar

@anderskaplan, thanks for incorporating all the suggestions, I think it really is all ready for merge now. :)

Yet, would you mind doing a final interactive rebase and squashing?

1. I think it is fine to keep the "fix example X" commits separated. Also b160a88 could probably stand on its own.
2. I would just remove `#` before the example numbers because of the GitHub automatic issue links and add `#108` to the end of the summary line.
3. The commits on fine-tuning unit tests could be either squashed into their corresponding "feature commits" (too much work though?), or squashed into one?

anderskaplan · 2022-12-17T15:56:12Z

Updated the commit messages and squashed the unit test updates according to review comments. Should be good to go now!

pbodnar · 2022-12-17T19:17:25Z

> 2. I would just ... and add ` (#108)` to the end of the summary line.

(note: after looking at the commit history, I have added parentheses around the number)

@anderskaplan, maybe you have overlooked this part (which groups the commits together)? Apart from that, the commits seem perfect now. :)

…nation may contain spaces if it is enclosed in pointy brackets. (miyuchina#108) Solved by allowing spaces in core_tokens.match_link_dest(). Also fixed a problem in core_tokens.is_link_label() where the return value would be malformed if the root element isn't set. This does not happen under normal circumstances, but it caused unit tests to fail.
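As an illustration of the rule this commit refers to, here is a minimal sketch of matching a link destination enclosed in pointy brackets, where spaces are allowed but line endings and unescaped `<` or `>` are not. The function name `match_angle_link_dest` is hypothetical and the code is not the body of `core_tokens.match_link_dest()`. It returns the raw destination without unescaping it.

```python
def match_angle_link_dest(text, start=0):
    """Return (end_index, raw_destination) for a <...> destination, else None."""
    if start >= len(text) or text[start] != "<":
        return None
    dest = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in "<>\\":
            # Keep an escaped bracket or backslash as part of the destination.
            dest.append(text[i:i + 2])
            i += 2
            continue
        if ch == ">":
            # Closing bracket: the destination is complete.
            return i + 1, "".join(dest)
        if ch in "<\n\r":
            # Unescaped '<' or a line ending is not allowed inside.
            return None
        dest.append(ch)  # any other character, including a space
        i += 1
    return None  # no closing '>'
```

For instance, `match_angle_link_dest("<my url>")` accepts the destination `my url`, spaces included.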

…pace in HTML tags. (miyuchina#108) The problem was that the regex for unquoted attribute values did not break at newlines. Therefore, in this example, the attribute value "baz\nbim!bop" was accepted when it wasn't supposed to. Solved by modifying the regex to break at any whitespace character, not only space.
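The difference described in this commit message can be sketched with two simplified patterns. Both `LOOSE_VALUE` and `STRICT_VALUE` are hypothetical and are not the regexes used in mistletoe. They only show why a character class that excludes the space character alone lets a newline slip into an unquoted attribute value.

```python
import re

# Assumed "before" pattern: only a literal space ends the value.
LOOSE_VALUE = re.compile(r"[^ \"'=<>`]+")
# Assumed "after" pattern: any whitespace character ends the value.
STRICT_VALUE = re.compile(r"[^\s\"'=<>`]+")

value = "baz\nbim!bop"

# The loose pattern accepts the whole string, newline included.
loose_accepts = LOOSE_VALUE.fullmatch(value) is not None
# The strict pattern rejects it and stops matching at the newline.
strict_accepts = STRICT_VALUE.fullmatch(value) is not None
strict_prefix = STRICT_VALUE.match(value).group()
```

Here `loose_accepts` is `True`, `strict_accepts` is `False`, and `strict_prefix` is `"baz"`.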

…tities. (miyuchina#108) The problem was that the length of html entities like &#87654321; was not checked, leading to the generation of invalid output. The solution was to modify the regex used to match charrefs to include limits on the lengths (as proposed by @pbodnar). The regex workaround was also moved from the HTMLRenderer class to span_token.tokenize() to make it available to all renderers.
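A minimal sketch of the length check described here, assuming the CommonMark limits of 1 to 7 digits for decimal and 1 to 6 digits for hexadecimal character references. The `NUMERIC_CHARREF` pattern and the `unescape_numeric_charrefs` helper are hypothetical. They are not the regex or the code path used in `span_token.tokenize()`, and named entities such as `&amp;` are left out of the sketch.

```python
import html
import re

# Assumed pattern: decimal references take 1-7 digits, hexadecimal ones 1-6.
NUMERIC_CHARREF = re.compile(r"&#(?:[0-9]{1,7}|[xX][0-9a-fA-F]{1,6});")

def unescape_numeric_charrefs(text):
    """Decode only numeric character references of a valid length."""
    # References that are too long do not match and are left as literal text.
    return NUMERIC_CHARREF.sub(lambda m: html.unescape(m.group()), text)
```

With these limits, `&#35;` is decoded to `#`, while the over-long `&#87654321;` is left untouched so that a renderer can escape it as ordinary text.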

pbodnar · 2022-12-18T15:56:12Z

Thank you!

pbodnar mentioned this pull request Oct 9, 2022

Update to CommonMark v0.30 #108

Closed

pbodnar self-requested a review October 30, 2022 19:42

pbodnar requested changes Nov 17, 2022

anderskaplan force-pushed the fix-108-part-2 branch from fd4a4d8 to b160a88 on December 3, 2022 18:21

pbodnar reviewed Dec 17, 2022

anderskaplan force-pushed the fix-108-part-2 branch from b160a88 to 488ad8b on December 17, 2022 15:54

anderskaplan added 8 commits December 18, 2022 10:13

Fixed failing example 195 in the CommonMark 0.30 spec: A link destina…

5b43db6

…tion may contain spaces if it is enclosed in pointy brackets. (miyuchina#108) Solved by allowing spaces in Footnote.match_link_dest().

Fixed failing examples 415 and 416 in the CommonMark 0.30 spec: empha…

8c90f9d

…sis delimiter lengths being multiples of 3. (miyuchina#108) Solved by adding a missing condition in Delimiter.closed_by(). Also refactored the function to make it more readable.
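The "multiple of 3" rule mentioned in this commit message can be sketched as a standalone predicate. This is an illustration of the CommonMark rule as I read it, not the refactored `Delimiter.closed_by()`. The function name and its parameters are hypothetical.

```python
def blocked_by_rule_of_three(opener_len, closer_len,
                             opener_can_close, closer_can_open):
    """Return True if an opener/closer pair must NOT match under the rule.

    The rule only applies when one of the delimiter runs can both open
    and close emphasis.
    """
    if not (opener_can_close or closer_can_open):
        return False
    if (opener_len + closer_len) % 3 != 0:
        return False
    # Exception: the pair may still match if both lengths are multiples of 3.
    both_multiples_of_three = opener_len % 3 == 0 and closer_len % 3 == 0
    return not both_multiples_of_three
```

Under this sketch, a run of 1 followed by a run of 2 that can also open is blocked, while runs of 3 and 3, or 6 and 9, are allowed to match because both lengths are multiples of 3.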

Fixed failing example 171 in the CommonMark 0.30 spec: textarea HTML …

14a089e

…blocks may contain blank lines. (miyuchina#108) Solved by adding textarea to the set of HTML tags which allow newlines.

Updated and refactored unit tests for invalid html entities, footnotes…

0ffdccd

…, and inline link parsing. Added positive test for valid html entities.

Renamed is_link_label to get_link_label.

7866aca

anderskaplan force-pushed the fix-108-part-2 branch from 488ad8b to 7866aca on December 18, 2022 09:15

pbodnar approved these changes Dec 18, 2022

pbodnar merged commit 5ef895e into miyuchina:master Dec 18, 2022

pbodnar mentioned this pull request Jan 7, 2023

Finalize version 1.0.0 #174

Merged
