Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markdown renderer #162

Merged
merged 4 commits into from
Jun 10, 2023
Merged

Markdown renderer #162

merged 4 commits into from
Jun 10, 2023

Conversation

anderskaplan
Copy link
Contributor

@anderskaplan anderskaplan commented Sep 17, 2022

This is a pull request for a Markdown-to-Markdown renderer (#4).

A few notes on the implementation:

  1. Since the aim is to preserve as much of the formatting as possible, and also steer clear of issues like if
    the sequence _**Hello**_ would be rendered as ***Hello*** (which changes its interpretation),
    I had to save some formatting information from the parsing in the tokens. The formatting info will not appear in the output from the ASTRenderer or from repr().

  2. The Markdown renderer can also do word wrapping. (Because I needed that for a use case.) But that's optional, and when it's not enabled, the goal is still to make the roundtrip Markdown -> parse -> render -> Markdown with as few changes as possible.

  3. The test cases in TestMarkdownRenderer give a pretty good idea of what the renderer is capable of.
    In particular, note that it handles nested blocks: you can put a code block inside a list inside a block quote
    if you like. (This is explicitly allowed by the CommonMark spec.)

  4. Three new tokens! Well actually, only one which is completely new and that's the BlankLine.
    The other two represent link reference definitions (a.k.a. footnotes) and inherit from theFootnote token.
    These tokens are used to represent content that is otherwise not included in the AST.

    An open question is whether they belong in the same module as the markdown renderer class, or if they should
    be moved somewhere else. Maybe create a new module to host all optional tokens?

  5. The markdown renderer handles span tokens and block tokens differently.
    Block tokens are rendered into chunks of lines.
    Span tokens are first rendered into sequences of "fragments", which are essentially strings with some additional metadata attached, and then into chunks of lines.

  6. Tables are supported.

  7. I put the MarkdownRenderer in the main package, not in contrib, because that's what @miyuchina wrote in a comment on Markdown-to-Markdown renderer #4.

Hope this can be useful and looking forward to some review comments. As it's a fairly large PR, I'd recommend to review it commit by commit. All changes to existing code come first, and then finally the renderer itself.

@pbodnar
Copy link
Collaborator

pbodnar commented Sep 18, 2022

@anderskaplan, thank you for another contribution. :)

I will try to check it out soon, but of course, I cannot promise anything. Hopefully we'll also get feedback from some other contributors this time. (I think it would be ideal to firstly finish all the current smaller changes (when compared to this one) and finally release version 1.0.0 and this one could go for example into 1.1.0, but we'll see...)

7. The markdown renderer has a main method. It can be convenient, since the renderers in contrib aren't easily
run from the regular mistletoe cli. Not sure if it should be kept. (Or whether the markdown renderer should
stay in the contrib directory or be part of core?)

Yeah, I'm not a big fan of the "contrib" folder as well, I would probably vote for getting rid of it ASAP, see #101 for more details & discussion.

@karlicoss
Copy link
Contributor

hey, any updates on this? :)

@anderskaplan
Copy link
Contributor Author

Hi @karlicoss, this branch is dependent on #172. As soon as that PR has been merged, I'll be able to prepare this one for showtime.

@coveralls
Copy link

coveralls commented Mar 18, 2023

Coverage Status

coverage: 90.359% (+1.2%) from 89.174% when pulling 61cec4a on anderskaplan:markdown-renderer into e853a0d on miyuchina:master.

@anderskaplan anderskaplan marked this pull request as ready for review March 18, 2023 09:07
@anderskaplan anderskaplan changed the title DRAFT markdown renderer Markdown renderer Mar 18, 2023
@geolta
Copy link

geolta commented Mar 19, 2023

Hi, I had no deep look into the code but I hope that someone else knows the answer: Is implementing a (live) syntax highlighter based on mistletoe possible after these changes? As far as I have seen, it is not the positions that are saved (which would be best for my specific case), but the syntax elements like **. But that shouldn't be a problem anyway, since the Markdown renderer should be able to be modified so that the positions can be pulled out of there again, right?

And: Really thank you all for this great Markdown parser!

@anderskaplan
Copy link
Contributor Author

Hi @geolta , it depends a lot what you mean by "(live) syntax highlighter". I suppose "live" means that it highlights while the user is typing. But what is it you want to highlight -- is it for example headings, emphasized text, and the like? And is this for a user interface written in python? If the answer is yes in both cases, then I think mistletoe should be a good match.

Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderskaplan, thanks a lot. I have reviewed the first 2 commits for now, giving you some feedback until I find the time for another review session...

test/test_block_token.py Outdated Show resolved Hide resolved
test/test_span_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/core_tokens.py Outdated Show resolved Hide resolved
mistletoe/span_token.py Outdated Show resolved Hide resolved
mistletoe/span_token.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't completed the review yet, getting close slowly :P, here are another few remarks...

mistletoe/block_token.py Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
mistletoe/markdown_renderer.py Show resolved Hide resolved
mistletoe/markdown_renderer.py Show resolved Hide resolved
Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round complete. :P :)

mistletoe/block_token.py Show resolved Hide resolved
mistletoe/span_token.py Outdated Show resolved Hide resolved
mistletoe/core_tokens.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
test/test_span_token.py Outdated Show resolved Hide resolved
test/test_block_token.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderskaplan, thank you. Hopefully I will get to the rest of the review soon (this week(end)).

mistletoe/span_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/span_token.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Show resolved Hide resolved
dev-guide.md Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
@pbodnar pbodnar added this to the 1.1.0 milestone May 28, 2023
@pbodnar
Copy link
Collaborator

pbodnar commented May 28, 2023

It's been a while since the beginning of this PR. I think that we could finish just the minor cleanups and put this long awaited feature into the very next release version, with a warning that the renderer's API is in a "preview" state and it might change in a future version. What do you think?

@anderskaplan
Copy link
Contributor Author

Sorry for the slow responses lately @pbodnar -- It's been a very busy May.

I fully agree that it would be very nice to get this into the next major release, and I'll try and resolve the remaining comments as soon as possible.

@anderskaplan
Copy link
Contributor Author

This question should also be addressed before merging, imho:

Four new tokens! An open question is whether they belong in the same module as the markdown renderer class, or if they should be moved somewhere else. Maybe create a new module to host all optional tokens?

I think moving all optional tokens into a module could be pretty neat. What do you say @pbodnar ?

@pbodnar
Copy link
Collaborator

pbodnar commented May 29, 2023

This question should also be addressed before merging, imho:

Four new tokens! An open question is whether they belong in the same module as the markdown renderer class, or if they should be moved somewhere else. Maybe create a new module to host all optional tokens?

I think moving all optional tokens into a module could be pretty neat. What do you say @pbodnar ?

Or maybe 3 new tokens currently if I look correctly (Fragment class not included)? I think I like them where they are now, but maybe if you give me a better idea of for example their reuse from other (future) modules, I could change my mind? :)

@anderskaplan
Copy link
Contributor Author

This question should also be addressed before merging, imho:

Four new tokens! An open question is whether they belong in the same module as the markdown renderer class, or if they should be moved somewhere else. Maybe create a new module to host all optional tokens?

I think moving all optional tokens into a module could be pretty neat. What do you say @pbodnar ?

Or maybe 3 new tokens currently if I look correctly (Fragment class not included)? I think I like them where they are now, but maybe if you give me a better idea of for example their reuse from other (future) modules, I could change my mind? :)

Three they are. I think they were four at some point, but the last one turned out not to be necessary.

In any case, if you think it's ok that they stay where they are, then I'll just leave them there.

@anderskaplan
Copy link
Contributor Author

There, I believe all comments are addressed now. 😄

Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderskaplan, thanks a lot. Just 1 or 2 smaller remarks which I think we could look at before merging.

mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
test/test_markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Show resolved Hide resolved
mistletoe/markdown_renderer.py Show resolved Hide resolved
Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderskaplan, thanks for another round. :)

Do you think you can prepare a final set of squashed atomic commits, ideally with commit messages as described at https://github.com/miyuchina/mistletoe/blob/master/CONTRIBUTING.md? At minimum, I would like to have all summary lines end with (#162), which I believe is pretty practical think. Sure, I can do this myself, but this is still kinda of your "child", so... :)

dev-guide.md Outdated Show resolved Hide resolved
test/test_markdown_renderer.py Outdated Show resolved Hide resolved
mistletoe/markdown_renderer.py Show resolved Hide resolved
@anderskaplan
Copy link
Contributor Author

Here we go!

Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderskaplan, good job, thank you once again. :) If you don't mind, I will give you explicit credits in the release notes.

I have also run python test/benchmark.py mistletoe to see performance impact of this PR. It looks like there is about 1 to 2 % performance penalty due to the extended parsing. Nothing to worry about then.

@pbodnar pbodnar merged commit 8937994 into miyuchina:master Jun 10, 2023
pbodnar pushed a commit that referenced this pull request Jun 10, 2023
…e source Markdown in the tokens. Improved docstrings. (#162)
@pbodnar pbodnar linked an issue Jun 10, 2023 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Markdown-to-Markdown renderer
5 participants