Merge branch '1.1.0'
mwouts committed Apr 2, 2019
2 parents 392ffaa + f631236 commit 1a4a867
Showing 77 changed files with 2,564 additions and 393 deletions.
16 changes: 16 additions & 0 deletions HISTORY.rst
@@ -3,13 +3,29 @@
Release History
---------------

1.1.0 (2019-04-??)
++++++++++++++++++++++

**Improvements**

- Markdown and R Markdown formats now support metadata (#66, #111, #188)
- The ``light`` format for Scripts can use custom cell markers, e.g. Vim or VScode/PyCharm folding markers (#199)

**BugFixes**

- Jupytext's contents manager is now based on ``LargeFileManager`` to allow large file uploads (#210)
- YAML header parsed with yaml.safe_load rather than yaml.load
- IPython line magic can be split across lines (#209)


1.0.5 (2019-03-26)
++++++++++++++++++++++

**BugFixes**

- Fix the error 'notebook file has changed on disk' when saving large notebooks (#207)


1.0.4 (2019-03-20)
++++++++++++++++++++++

Expand Down
31 changes: 20 additions & 11 deletions README.md
@@ -304,9 +304,8 @@ Note that `jupytext --test` compares the resulting notebooks according to its ex
Please note that
- Scripts opened with Jupyter have a default [metadata filter](#default-metadata-filtering) that prevents additional notebook or cell
metadata from being added back to the script. Remove the filter if you want to store Jupytext's settings, or the kernel information, in the text file.
- Cell metadata are available in `light` and `percent` formats for all cell types. Sphinx Gallery scripts in `sphinx` format do not support cell metadata. R Markdown and R scripts in `spin` format support cell metadata for code cells only. Markdown documents do not support cell metadata.
- Cell metadata are available in the `light` and `percent` formats, as well as in the Markdown and R Markdown formats. R scripts in `spin` format support cell metadata for code cells only. Sphinx Gallery scripts in `sphinx` format do not support cell metadata.
- By default, a few cell metadata are not included in the text representation of the notebook. And only the most standard notebook metadata are exported. Learn more on this in the sections for [notebook specific](#-per-notebook-configuration) and [global settings](#default-metadata-filtering) for metadata filtering.
- Representing a Jupyter notebook as a Markdown or R Markdown document has the effect of splitting markdown cells with two consecutive blank lines into multiple cells (as the two blank line pattern is used to separate cells).

### Reading notebooks in Python

@@ -334,35 +333,45 @@ Save Jupyter notebooks as [Markdown](https://daringfireball.net/projects/markdow

[R Markdown](https://rmarkdown.rstudio.com/authoring_quick_tour.html) is [RStudio](https://www.rstudio.com/)'s format for notebooks, with support for R, Python, and many [other languages](https://bookdown.org/yihui/rmarkdown/language-engines.html).

Our implementation for Jupyter notebooks as [Markdown](https://daringfireball.net/projects/markdown/syntax) or [R Markdown](https://rmarkdown.rstudio.com/authoring_quick_tour.html) documents is straightforward:
- A YAML header contains the notebook metadata (Jupyter kernel, etc)
- Markdown cells are inserted verbatim, and separated with two blank lines
- Code and raw cells start with triple backticks collated with cell language, and end with triple backticks. Cell metadata are not available in the Markdown format. The [code cell options](https://yihui.name/knitr/options/) in the R Markdown format are mapped to the corresponding Jupyter cell metadata options, when available.

Jupytext's implementation for Jupyter notebooks as [Markdown](https://daringfireball.net/projects/markdown/syntax) or [R Markdown](https://rmarkdown.rstudio.com/authoring_quick_tour.html) documents is as follows:
- The notebook metadata (Jupyter kernel, etc) goes to a YAML header
- Code and raw cells are encoded as Markdown code blocks with triple backticks. In a Python notebook, a code cell starts with ` ```python` and ends with ` ``` `. Cell metadata are found after the language information, with a `key=value` syntax, where `value` is encoded in JSON format (Markdown) or R format (R Markdown). R Markdown [code cell options](https://yihui.name/knitr/options/) are mapped to the corresponding Jupyter cell metadata options, when available.
- Markdown cells are inserted verbatim and separated with two blank lines. When required (cells with metadata, cells that contain two blank lines or code blocks), Jupytext protects the cell boundary with HTML comments: `<!-- #region -->` and `<!-- #endregion -->`. Cells with explicit boundaries are [foldable](https://code.visualstudio.com/docs/editor/codebasics#_folding) in vscode, and can accept both a title and/or metadata in JSON format: `<!-- #region This is the title for my protected cell {"key": "value"}-->`.
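
The `key=value` syntax for code cell metadata can be illustrated with a minimal sketch. The helper name `fence_header` is hypothetical and is not part of Jupytext; it only assumes, as stated above, that values are JSON-encoded and that a key with no value is written bare.

```python
import json

FENCE = "`" * 3  # triple backticks that open a Markdown code block


def fence_header(language, metadata):
    """Build the opening line of a Markdown code block (hypothetical helper).

    Keys whose value is None are written bare; other values are JSON-encoded.
    """
    options = [key if value is None else "{}={}".format(key, json.dumps(value))
               for key, value in metadata.items()]
    return FENCE + " ".join([language] + options)


header = fence_header("python", {"tags": ["parameters"], "scrolled": True})
print(header)
```

Because the values are plain JSON, lists and booleans survive a round trip through the text file without any custom escaping.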

See how our `World population.ipynb` notebook in the [demo folder](https://github.com/mwouts/jupytext/tree/master/demo) is represented in [Markdown](https://github.com/mwouts/jupytext/blob/master/demo/World%20population.md) or [R Markdown](https://github.com/mwouts/jupytext/blob/master/demo/World%20population.Rmd).

When editing Jupyter Markdown, you can split text into markdown cells by adding two blank lines at the point you want the text to split. This is the default rule, but you may want to modify the rule for the case of Markdown headers in text. By default, a single blank line followed by a Markdown header will not cause the cell to split, so the header will appear in the middle of a text cell. You may prefer to always split text cells at headers. If so, use the `split_at_heading` option. Set the option either on the command line, or by adding `"split_at_heading": true` to the jupytext section in the notebook metadata, or on Jupytext's content manager:
When you open a plain Markdown file in Jupytext, the Markdown text is rendered in Markdown cells. Cell breaks occur when the text contains two consecutive blank lines (or code cells). If you also want to split cells on Markdown headers, so that headers prefixed by one blank line appear at the top of a new cell, use the `split_at_heading` option. Set the option either on the command line, or by adding `"split_at_heading": true` to the jupytext section in the notebook metadata, or on Jupytext's content manager:

```python
c.ContentsManager.split_at_heading = True
```

This will cause jupytext to split markdown text cells at heading prefixed by one blank line, so the heading appears at the top of a new cell. Without this option, you would need two blank lines above the heading to cause the split.
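
The splitting rule can be sketched as follows. This is a simplified illustration, not Jupytext's implementation: the function name `split_markdown` is hypothetical, code blocks and region comments are ignored, and any line starting with `#` is treated as a heading.

```python
def split_markdown(text, split_at_heading=False):
    """Split Markdown text into cell sources (simplified illustration)."""
    cells = []
    current = []
    blank_run = 0  # number of consecutive blank lines just seen

    def close_cell():
        # Drop trailing blank lines, then store the cell if anything is left
        while current and not current[-1].strip():
            current.pop()
        if current:
            cells.append("\n".join(current))
        del current[:]

    for line in text.splitlines():
        if not line.strip():
            blank_run += 1
            current.append(line)
            continue
        is_heading = line.startswith("#")
        if blank_run >= 2 or (split_at_heading and is_heading and blank_run >= 1):
            close_cell()
        blank_run = 0
        current.append(line)

    close_cell()
    return cells


text = "Intro paragraph.\n\n# Heading\nBody text.\n\n\nSecond cell."
print(split_markdown(text))
print(split_markdown(text, split_at_heading=True))
```

With the default rule the heading stays inside the first cell; with `split_at_heading=True` the same text yields three cells, the second one starting at the heading.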

### The `light` format for notebooks as scripts

The `light` format was created for this project. It is the default format for scripts. That format can read any script as a Jupyter notebook, even scripts which were never prepared to become a notebook. When a notebook is written as a script using this format, only a few cell markers are introduced, and none if possible.

The `light` format has:
- A (commented) YAML header, that contains the notebook metadata.
- Markdown cells are commented, and separated with a blank line.
- Markdown cells are commented, and separated from other cells with a blank line.
- Code cells are exported verbatim (except for Jupyter magics, which are commented), and separated with blank lines. Code cells are reconstructed from consistent Python paragraphs (no function, class or multiline comment will be broken).
- Cells that contain more than one Python paragraph need an explicit start-of-cell delimiter `# +` (`// +` in C++, etc). Cells that have explicit metadata have a cell header `# + {JSON}` where the metadata is represented, in JSON format. The end of cell delimiter is `# -`, and is omitted when followed by another explicit start of cell marker.
- Cells that contain more than one Python paragraph need an explicit start-of-cell delimiter that is, by default, `# +` (`// +` in C++, etc). Cells that have explicit metadata have a cell header `# + {JSON}` where the metadata is represented, in JSON format. The default end of cell delimiter is `# -`, and is omitted when followed by another explicit start of cell marker.
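
The marker rules above can be sketched with a small writer. This is only an illustration under simplifying assumptions: `to_light_script` is a hypothetical name, the YAML header and Jupyter magics are left out, and a code cell is considered to hold several paragraphs as soon as its source contains a blank line.

```python
import json


def to_light_script(cells):
    """Render (cell_type, source, metadata) triples as a light-style script."""
    # A code cell needs explicit markers if it has metadata or a blank line
    needs_marker = [ctype == "code" and (bool(meta) or "\n\n" in source)
                    for ctype, source, meta in cells]
    lines = []
    for i, (ctype, source, meta) in enumerate(cells):
        if ctype == "markdown":
            # Markdown cells are commented
            lines.extend("# " + line if line else "#" for line in source.splitlines())
        elif needs_marker[i]:
            lines.append(("# + " + json.dumps(meta)) if meta else "# +")
            lines.extend(source.splitlines())
            next_is_marked = i + 1 < len(cells) and needs_marker[i + 1]
            if not next_is_marked:
                # The end marker is omitted before another explicit start marker
                lines.append("# -")
        else:
            # Single-paragraph code cells are written verbatim, with no marker
            lines.extend(source.splitlines())
        lines.append("")  # blank line between cells
    return "\n".join(lines)


script = to_light_script([
    ("markdown", "A title", {}),
    ("code", "x = 1", {}),
    ("code", "a = 1\n\nb = 2", {}),
    ("code", "c = 3", {"tags": ["t"]}),
])
print(script)
```

In this example only the last two cells receive a start marker, and a single `# -` closes the script because the third cell is directly followed by another marked cell.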

The `light` format is currently available for Python, Julia, R, Bash, Scheme, Clojure, Matlab, Octave, C++ and q/kdb+. Open our sample notebook in the `light` format [here](https://github.com/mwouts/jupytext/blob/master/demo/World%20population.lgt.py).

A variation of the `light` format is the `bare` format, with no cell marker at all. Please note that this format will split your code cells on code paragraphs. By default, this format still includes a YAML header - if you prefer to also remove the header, set `"notebook_metadata_filter": "-all"` in the jupytext section of your notebook metadata.

The `light` format can use custom cell markers instead of `# +` or `# -`. If you prefer to mark cells with VScode/PyCharm (resp. Vim) folding markers, set `"cell_markers": "region,endregion"` (resp. `"{{{,}}}"`) in the jupytext section of the notebook metadata. If you want to configure this as a global default, add either
```python
c.ContentsManager.default_cell_markers = "region,endregion" # Use VScode/PyCharm region folding delimiters
```
or
```python
c.ContentsManager.default_cell_markers = "{{{,}}}" # Use Vim region folding delimiters
```
to your `.jupyter/jupyter_notebook_config.py` file.


### The `percent` format

The `percent` format is a representation of Jupyter notebooks as scripts, in which cells are delimited with a commented double percent sign `# %%`. The format was introduced by Spyder five years ago, and is now supported by many editors, including
Expand Down
95 changes: 65 additions & 30 deletions jupytext/cell_metadata.py
@@ -1,21 +1,18 @@
"""
Convert between text notebook metadata and jupyter cell metadata.
Standard cell metadata are documented here:
See also https://ipython.org/ipython-doc/3/notebook/nbformat.html#cell-metadata
metadata.hide_input and metadata.hide_output are documented here:
http://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/runtools/readme.html
TODO: Update this if a standard gets defined at
https://github.com/jupyter/notebook/issues/3700
Note: Nteract uses "outputHidden" and "inputHidden". We may want to switch
to those.
"""

import ast
import json
import re
from json import loads, dumps

try:
    from json import JSONDecodeError
except ImportError:
    JSONDecodeError = ValueError

from .languages import _JUPYTER_LANGUAGES

@@ -24,6 +21,9 @@
except NameError:
    unicode = str  # Python 3

# Map R Markdown's "echo" and "include" to "hide_input" and "hide_output", that are understood by the `runtools`
# extension for Jupyter notebook, and by nbconvert (use the `hide_input_output.tpl` template).
# See http://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/runtools/readme.html
_BOOLEAN_OPTIONS_DICTIONARY = [('hide_input', 'echo', True),
                               ('hide_output', 'include', True)]
_JUPYTEXT_CELL_METADATA = [
@@ -46,12 +46,10 @@ def _r_logical_values(pybool):

class RLogicalValueError(Exception):
    """Incorrect value for R boolean"""
    pass


class RMarkdownOptionParsingError(Exception):
    """Error when parsing Rmd cell options"""
    pass


def _py_logical_values(rbool):
@@ -253,27 +251,64 @@ def rmd_options_to_metadata(options):
    return metadata.get('language') or language, metadata


def metadata_to_md_options(metadata):
    """Encode {'class':None, 'key':'value'} into 'class key="value"' """

    return ' '.join(["{}={}".format(key, dumps(metadata[key]))
                     if metadata[key] is not None else key for key in metadata])


def parse_md_code_options(options):
    """Parse 'python class key="value"' into [('python', None), ('class', None), ('key', 'value')]"""

    metadata = []
    while options:
        name_and_value = re.split(r'[\s=]+', options, maxsplit=1)
        name = name_and_value[0]

        # Equal sign in between name and what's next?
        if len(name_and_value) == 2:
            sep = options[len(name):-len(name_and_value[1])]
            has_value = sep.find('=') >= 0
            options = name_and_value[1]
        else:
            has_value = False
            options = ''

        if not has_value:
            metadata.append((name, None))
            continue

        try:
            value = loads(options)
            options = ''
        except JSONDecodeError as err:
            try:
                split = err.colno - 1
            except AttributeError:
                # str(err) is like: "ValueError: Extra data: line 1 column 7 - line 1 column 50 (char 6 - 49)"
                match = re.match(r'.*char ([0-9]*)', str(err))
                split = int(match.groups()[0])

            value = loads(options[:split])
            options = options[split:]

        metadata.append((name, value))

    return metadata


def md_options_to_metadata(options):
    """Parse markdown options and return language and metadata (cell name)"""
    language = None
    name = None

    options = [opt for opt in options.split(' ') if opt != '']
    if len(options) >= 2:
        language, name = options[:2]
    elif options:
        language = options[0]
    """Parse markdown options and return language and metadata"""
    metadata = parse_md_code_options(options)

    if language:
    if metadata:
        language = metadata[0][0]
        for lang in _JUPYTER_LANGUAGES + ['julia', 'scheme', 'c++']:
            if language.lower() == lang.lower():
                if name:
                    return lang, {'name': name}
                return lang, {}

        return None, {'name': language}
                return lang, dict(metadata[1:])

    return None, {}
    return None, dict(metadata)


def try_eval_metadata(metadata, name):
@@ -298,7 +333,7 @@ def try_eval_metadata(metadata, name):
def json_options_to_metadata(options, add_brackets=True):
    """Read metadata from its json representation"""
    try:
        options = json.loads('{' + options + '}' if add_brackets else options)
        options = loads('{' + options + '}' if add_brackets else options)
        return options
    except ValueError:
        return {}
@@ -309,7 +344,7 @@ def metadata_to_json_options(metadata):
    for key in _JUPYTEXT_CELL_METADATA:
        metadata.pop(key, None)

    return json.dumps(metadata)
    return dumps(metadata)


def is_active(ext, metadata):
Expand Down
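
The `parse_md_code_options` function added in this diff finds where a JSON value ends by catching `JSONDecodeError` and reading the reported position. As a point of comparison only, the same `key=value` parsing can be sketched with the standard library's `json.JSONDecoder.raw_decode`, which returns the decoded value together with the index where it stopped. This is an alternative technique, not what the commit does, and `parse_options` is a hypothetical name.

```python
import json
import re


def parse_options(options):
    """Parse 'python class key="value"' into a list of (name, value) pairs."""
    decoder = json.JSONDecoder()
    pairs = []
    options = options.strip()
    while options:
        # A name, optionally followed by an equal sign (spaces allowed around it)
        match = re.match(r'([^\s=]+)\s*(=?)\s*', options)
        if match is None:
            raise ValueError("Cannot parse options: {!r}".format(options))
        name, equal = match.groups()
        options = options[match.end():]
        if not equal:
            pairs.append((name, None))
            continue
        # raw_decode reads one JSON value and reports where it ended;
        # it raises json.JSONDecodeError if the value is not valid JSON
        value, end = decoder.raw_decode(options)
        pairs.append((name, value))
        options = options[end:].lstrip()
    return pairs


print(parse_options('python class key="value"'))
print(parse_options('python tags=["a", "b"] scrolled=true'))
```

The trade-off is that `raw_decode` avoids parsing the error message on older Python versions, at the cost of requiring the text to start exactly at the JSON value, which is why the regular expression consumes the whitespace after the equal sign.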