markdown -> notebook bug: md code blocks with consecutive newlines #188

Closed
rsokl opened this issue Feb 24, 2019 · 13 comments

rsokl commented Feb 24, 2019

Any markdown code-block containing multiple consecutive newlines is parsed incorrectly during conversion to a notebook. The additional newlines will become distinct notebook cells. For example:

---
jupyter:
  jupytext:
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.0'
      jupytext_version: 1.0.1
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---

# ```python
x = 1

y = 2
# ```

(note that the lines of code are separated by two newline characters)

becomes:

[image: screenshot of the converted notebook]

P.S. jupytext rocks! Thanks for the awesome work!

Owner

mwouts commented Feb 24, 2019

Hello @rsokl , thanks for reporting this, and for your nice feedback!

Well, what happens here is that Jupytext separates consecutive markdown cells with two blank lines. Hence it also cuts markdown cells that happen to have two consecutive blank lines into multiple cells.
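To make the behavior concrete, here is a minimal sketch of a splitter that breaks text into cells on two consecutive blank lines. It is not Jupytext's actual code; the function name `split_on_two_blank_lines` is hypothetical and only illustrates why a code block containing two blank lines ends up in separate cells.

```python
def split_on_two_blank_lines(text):
    """Split text into cells wherever two or more consecutive blank lines occur.

    Hypothetical helper for illustration only; not part of Jupytext.
    """
    cells, current, blanks = [], [], 0
    for line in text.splitlines():
        if line.strip() == "":
            blanks += 1
            continue
        if blanks >= 2 and current:
            # Two blank lines: close the current cell and start a new one.
            cells.append("\n".join(current))
            current = []
        elif blanks == 1 and current:
            # A single blank line stays inside the cell.
            current.append("")
        blanks = 0
        current.append(line)
    if current:
        cells.append("\n".join(current))
    return cells


# The fenced block is cut in two because the splitter knows nothing about fences.
block = "```python\nx = 1\n\n\ny = 2\n```"
print(split_on_two_blank_lines(block))
```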

I am open to suggestions if you see a better way to mark cell breaks in the markdown format. Maybe, when we have metadata support for Markdown (not the case for now, see #66), we can have explicit cell start/end delimiters.

Also, please note that this issue will not occur if you save your notebook as scripts (in either light or percent formats).

Author

rsokl commented Feb 24, 2019

Ah that makes sense. I can try to take a swing at this - could you point me to the function(s) that I would want to refactor?

Owner

mwouts commented Feb 24, 2019

Well, maybe we should first discuss how to identify cells in a markdown document... But sure, I can tell you where to look in the code. The cell break on two blank lines happens here. Conversely, the two blank lines between consecutive markdown cells are inserted here.

Author

rsokl commented Feb 24, 2019

One thought that I had is: don't look for the two blank lines if you are within a markdown code block.

You currently track when a markdown code block begins, so you could potentially also look for the closing ``` marks. That is, once you see something like ```python, a new cell won't be created until after you see the closing ```.

Clearly this is not guaranteed to be true, but I would suspect that far more people include two or more newlines in a markdown code block than open a code block in a cell without closing it.
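A rough sketch of that idea, assuming fences are lines that start with three backticks. The function name `split_outside_fences` is made up for illustration and this is not how Jupytext is implemented. Blank lines are only counted as cell breaks when we are outside a fenced block.

```python
def split_outside_fences(text):
    """Split on two or more blank lines, but never inside a ``` fenced block.

    Hypothetical helper for illustration only; not part of Jupytext.
    """
    cells, current, blanks, in_fence = [], [], 0, False
    for line in text.splitlines():
        if not in_fence and line.strip() == "":
            blanks += 1
            continue
        if blanks >= 2 and current:
            # Cell break seen outside of any fence.
            cells.append("\n".join(current))
            current = []
        elif current:
            # Keep a single blank line that occurred outside a fence.
            current.extend([""] * blanks)
        blanks = 0
        current.append(line)
        if line.lstrip().startswith("```"):
            # Opening or closing fence toggles the state.
            in_fence = not in_fence
    if current:
        cells.append("\n".join(current))
    return cells


text = "intro\n\n\n```python\nx = 1\n\n\ny = 2\n```\n\n\noutro"
print(split_outside_fences(text))
```

The caveat mentioned above applies: an opening fence that is never closed would swallow every later cell break.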

Owner

mwouts commented Feb 24, 2019

Sure, it should be possible to do something along these lines. Actually, rather than commenting out the triple backticks with # (which is not great if you then process the markdown document with another program!), we could insert a markdown comment before/after the code block to indicate that it belongs to the markdown cell, and then use that information to preserve the code block itself...

Author

rsokl commented Feb 25, 2019

Yeah, that would be a very nice change indeed! As it is, various markdown preview clients are quite confused by that #``` sequence. It may not be until next weekend that I could potentially take a swing at that, but I do think it would be quite nice.

That being said, how would you deal with a substantial change like that from a versioning point of view? That change would prevent current jupytext-markdown files from being converted back to notebooks. Have you had changes like that in the past?

mwouts added this to the 1.1.0 milestone Feb 25, 2019
Owner

mwouts commented Feb 25, 2019

> Yeah, that would be a very nice change indeed! As it is, various markdown preview clients are quite confused by that #``` sequence. It may not be until next weekend that I could potentially take a swing at that, but I do think it would be quite nice.

No problem. Actually, a very useful input for this would be recommendations on how to write markdown comments in a way that is compatible with pandoc and markdown viewers. We've started considering that at #66.

> That being said, how would you deal with a substantial change like that from a versioning point of view? That change would prevent current jupytext-markdown files from being converted back to notebooks. Have you had changes like that in the past?

Yes we had. That's the reason why we have a format_version field in the YAML header. Based on this information, Jupytext will refuse to open a notebook paired to a text file in an outdated version, and ask the user to delete one of the two files (in most cases he should choose to keep the .ipynb file).
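As an illustration of that kind of check, here is a minimal sketch that reads `format_version` from the YAML header and refuses a file whose version is older than the supported one. This is not Jupytext's actual code: the names `read_format_version`, `check_format_version` and `SUPPORTED_FORMAT_VERSION` are hypothetical, and the header is parsed with a simple regular expression rather than a YAML library.

```python
import re

# Assumed value, for illustration only.
SUPPORTED_FORMAT_VERSION = (1, 1)


def read_format_version(text):
    """Return (major, minor) from a leading YAML header, or None if absent."""
    header = re.match(r"\A---\n(.*?)\n---\s*$", text, flags=re.DOTALL | re.MULTILINE)
    if header is None:
        return None
    found = re.search(
        r"^\s*format_version:\s*'?(\d+)\.(\d+)'?\s*$",
        header.group(1),
        flags=re.MULTILINE,
    )
    if found is None:
        return None
    return int(found.group(1)), int(found.group(2))


def check_format_version(text, supported=SUPPORTED_FORMAT_VERSION):
    """Raise ValueError if the text file was written in an older format version."""
    version = read_format_version(text)
    if version is not None and version < supported:
        raise ValueError(
            "Text file is in format version %d.%d, older than %d.%d"
            % (version + supported)
        )
    return version
```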

Author

rsokl commented Feb 25, 2019

> Yes we had. That's the reason why we have a format_version field in the YAML header. Based on this information, Jupytext will refuse to open a notebook paired to a text file in an outdated version, and ask the user to delete one of the two files (in most cases he should choose to keep the .ipynb file).

Ah! Of course. Great foresight there 😃

> Actually, a very useful input for this would be recommendations on how to write markdown comments in a way that is compatible with pandoc and markdown viewers. We've started considering that at #66.

Gotchya. I'll mull this over and will let you know what I come up with!

Author

rsokl commented Mar 3, 2019

I noticed that, when converting ipynb to py, jupytext is able to preserve the type of a raw cell (e.g. it will note if it is a reST raw cell). However, this information is not preserved when converting to markdown.

It seems like we might be able to lump this in with the markdown-delimiter effort, and thus have the markdown format preserve this during round-trip conversions as well. Thoughts?

Author

rsokl commented Mar 4, 2019

Comments in Markdown

Markdown has a syntax in its core specification for making comments/invisible text. It looks like the most generic syntax for this is:

some text 

[I am invisible text]: # 
some more text

Note that the blank line preceding the []: # is important.

There is an incredibly informative thread on stackoverflow about this. Someone tested the various syntaxes for including comments in markdown through Babelmark2, checking them against 28 markdown implementations. The analysis concluded that this syntax is the most general, and is supported by 23 of those implementations.

This seems like a great path forward, imo 😄

Jupytext

Given this discussion, it seems that this is the proper direction for jupytext to delimit various cell-types to preserve round-trip conversions between ipynb and md.

(edit: it just occurred to me that you probably don't even need to delimit the end of a cell. It just goes until another begins)

E.g., a markdown cell could be delimited by:

(blank line)
[jupytext start cell - markdown]: # 
The contents of the cell go here.

and similarly various types of code cells and raw cells could follow suit:

(blank line)
[jupytext start cell - python]: # 
The contents of the cell go here.
(blank line)
[jupytext start cell - raw:reST]: # 
The contents of the cell go here.

What is really cool is that this will permit jupytext's markdown format to represent both python code cells and markdown cells with python code blocks in such a way that markdown viewers will render them both with syntax highlighting!
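Here is a rough sketch of how such markers could be read back into cells. It only illustrates the proposal above and is not necessarily what Jupytext implements; `MARKER` and `parse_cells` are hypothetical names. A cell runs until the next marker, so blank lines inside a cell are preserved.

```python
import re

# Matches the proposed marker lines, e.g. "[jupytext start cell - python]: # "
MARKER = re.compile(r"^\[jupytext start cell - (?P<kind>[^\]]+)\]: #\s*$")


def parse_cells(text):
    """Return a list of (cell_kind, source) pairs from marker-delimited text.

    Hypothetical helper for illustration only; not part of Jupytext.
    """
    cells = []
    kind, lines = None, []
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            if kind is not None:
                # A new marker ends the previous cell.
                cells.append((kind, "\n".join(lines).strip("\n")))
            kind, lines = match.group("kind"), []
        elif kind is not None:
            lines.append(line)
    if kind is not None:
        cells.append((kind, "\n".join(lines).strip("\n")))
    return cells


text = (
    "\n[jupytext start cell - markdown]: # \nSome text\n\n\nmore text\n"
    "\n[jupytext start cell - python]: # \nx = 1\n\n\ny = 2\n"
    "\n[jupytext start cell - raw:reST]: # \n.. note:: hi\n"
)
for kind, source in parse_cells(text):
    print(kind, repr(source))
```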

Owner

mwouts commented Mar 5, 2019

Hello @rsokl , this is very interesting! Thanks for the links. Until now we had mostly considered HTML comments for storing metadata, but sure we can debate this, and yes, I will have a look at this SO thread!

Also, I do agree that the updated Markdown format should preserve the raw cells.

I am not available this week, but later in the month I will certainly try improving the Markdown format. I suggest that we iterate over one or more tentative implementations, which you could test and give feedback on, if you'd like? Thanks!

Author

rsokl commented Mar 5, 2019

Great! This will be a fantastic update for my use-case of jupytext (which is to encode the entire source material of my site Python Like You Mean It in markdown).

The analysis provided in the SO thread is basically everything we could ask for. I.e. what comment style is most compatible across markdown implementations. I was floored that someone had already done it.

I will add this information over in issue #66.

I am happy to test and provide feedback on this content.

And we may want to start small, and just take on:

  • having jupytext's markdown form render nicely in markdown viewers for all types of cells (e.g. python-in-markdown cells and python cells will both get syntax highlighting)
  • support multiple blank lines in a given cell
  • preserve cell-type information (e.g. markdown, python, raw-ReST, raw-HTML, etc)

Taking on generic metadata might be a bit more ambitious for a single patch, since that needs to support basically all of JSON in a markdown comment...

Owner

mwouts commented Apr 14, 2019

This should be OK now with version 1.1. Please let me know otherwise.

mwouts closed this as completed Apr 14, 2019