html.unescape in tokenizer #135

not-my-profile · 2022-02-13T07:51:24Z

This PR fixes HTML entities ending up in the LaTeX output of the LaTeXRenderer. For details, see the individual commits.

If you like these changes, please merge them without squashing.


pbodnar

Looks great, just some smaller details to tune.


pbodnar · 2022-02-14T19:39:15Z

Thank you. :)

BTW You may have noticed that I'm a fan of conventional commits, if you would like to give it a try. But I won't force it on you. ;)

not-my-profile mentioned this pull request Feb 13, 2022

Facilitate extending HTMLRenderer to add attributes #134

Closed

not-my-profile force-pushed the html.unescape-in-tokenizer branch from a916bac to 8af2597 on February 13, 2022 07:54

pbodnar reviewed Feb 13, 2022

README.md (outdated, resolved)

pbodnar reviewed Feb 13, 2022

mistletoe/html_renderer.py (resolved)

Bump minimum supported Python version to 3.5

c7d3e16

Python 3.6 has already reached end-of-life, so supporting 3.3 doesn't make any sense.

not-my-profile force-pushed the html.unescape-in-tokenizer branch from 8af2597 to fb45fb0 on February 13, 2022 14:27

pbodnar reviewed Feb 13, 2022

mistletoe/span_token.py (resolved)

README.md (outdated, resolved)

pbodnar reviewed Feb 13, 2022

mistletoe/html_renderer.py (resolved)

pbodnar requested changes Feb 13, 2022


not-my-profile added 5 commits February 14, 2022 10:19

Add missing EscapeSequence.strip for Image title & src

57e1093

Unescape HTML entities in RawText in tokenizer

ff5d417

Previously &amp; ended up in the LaTeX when using the LaTeXRenderer, which is obviously wrong.

Unescape HTML entities in EscapeSequence.strip

22ec204

Previously the HTML unescaping took place in the HTMLRenderer. The LaTeXRenderer didn't do it so e.g. HTML entities in links ended up in the LaTeX output. The correct way is to resolve the HTML entities during the tokenization.
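The idea behind this commit can be illustrated with a minimal sketch. It is not mistletoe's actual code: `tokenize_raw_text`, `render_html`, and `render_latex` are hypothetical helpers, and the LaTeX escaping covers only a few characters for illustration. Only `html.unescape` and `html.escape` are real standard-library calls.

```python
import html


def tokenize_raw_text(source):
    """Hypothetical tokenizer step: resolve HTML entities once, up front."""
    return html.unescape(source)


def render_html(text):
    """Hypothetical HTML renderer: escape for HTML output only."""
    return html.escape(text)


def render_latex(text):
    """Hypothetical LaTeX renderer: escape a few LaTeX special characters."""
    for char in "&%#":
        text = text.replace(char, "\\" + char)
    return text


# The token stores plain text, so every renderer starts from the same input.
token_text = tokenize_raw_text("Tom &amp; Jerry")
html_output = render_html(token_text)
latex_output = render_latex(token_text)
```

Because the entity is resolved before any renderer runs, the HTML renderer re-escapes the ampersand for HTML while the LaTeX renderer only ever sees a plain `&` and escapes it in its own syntax.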

Move replace that's only for tests to test

ef9bd3a

It does not make sense to put test-specific code in the code that runs in production. Escaping ' as &#x27; is perfectly fine and CommonMark compliant. (The official CommonMark test suite actually performs HTML normalization by default.)
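As a rough illustration of the split described here: Python's `html.escape` (with its default `quote=True`) escapes an apostrophe as `&#x27;`, and a test can normalize that on its own side before comparing against expected output. The helper name `normalize_apostrophes` is hypothetical and not taken from the mistletoe test suite.

```python
import html

# Production side: plain html.escape, no test-specific post-processing.
rendered = html.escape("it's")


def normalize_apostrophes(rendered_html):
    """Hypothetical test-side helper: undo apostrophe escaping before comparing."""
    return rendered_html.replace("&#x27;", "'")


# Test side: normalize only when comparing against expected output.
normalized = normalize_apostrophes(rendered)
```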

Deprecate HTMLRenderer.escape_html

6c52a20

Now that the method just calls html.escape, using it doesn't make sense anymore.
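One common way to deprecate such a thin wrapper in Python is to keep it working while emitting a `DeprecationWarning`. The sketch below assumes that approach; the commit itself may do it differently, and `SketchRenderer` is a hypothetical stand-in, not mistletoe's HTMLRenderer.

```python
import html
import warnings


class SketchRenderer:
    """Hypothetical stand-in for a renderer; not mistletoe's HTMLRenderer."""

    @staticmethod
    def escape_html(raw):
        # Keep the old entry point working, but point callers at html.escape.
        warnings.warn(
            "escape_html is deprecated; call html.escape directly",
            DeprecationWarning,
            stacklevel=2,
        )
        return html.escape(raw)
```

Existing callers keep getting the same result, and the warning tells them to switch to `html.escape` before the wrapper is removed.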

not-my-profile force-pushed the html.unescape-in-tokenizer branch from fb45fb0 to 6c52a20 on February 14, 2022 09:22

not-my-profile requested a review from pbodnar February 14, 2022 09:23

pbodnar approved these changes Feb 14, 2022


pbodnar merged commit ce8ac0a into miyuchina:master Feb 14, 2022

pbodnar mentioned this pull request Jul 3, 2022

HTMLRenderer: Revise usage and implementation of html_escape() #115

Closed

pbodnar mentioned this pull request Sep 4, 2022

Fix for part of #108, Update to CommonMark v0.30 #156

Merged
