Merge pull request #14 from mwouts/v0.2.6
Version 0.2.6
mwouts authored Jul 13, 2018
2 parents e059120 + 7c99cbb commit de57bfc
Showing 9 changed files with 229 additions and 176 deletions.
13 changes: 13 additions & 0 deletions HISTORY.rst
@@ -6,6 +6,19 @@ Release History
dev
+++

0.2.6 (2018-07-13)
+++++++++++++++++++

**Improvements**

- Introduced `nbrmd_sourceonly_format` metadata
- Inputs are loaded from `.Rmd` file when a matching `.ipynb` file is
  opened.

**BugFixes**

- Trusted notebooks remain trusted (#12)

0.2.5 (2018-07-11)
+++++++++++++++++++

92 changes: 53 additions & 39 deletions README.md
@@ -16,29 +16,37 @@ You will be interested in this if

## What is R markdown?

R markdown (extension `.Rmd`) is a well-established markdown [notebook format](https://rmarkdown.rstudio.com/). As the name states, R markdown was designed in the R community, but it actually supports [many languages](https://yihui.name/knitr/demo/engines/). A few months back, the support for python significantly improved with the arrival of the [`reticulate`](https://github.com/rstudio/reticulate) package.

R markdown is almost identical to the markdown export of Jupyter notebooks. For reference, Jupyter notebooks are exported to markdown using either
- _Download as Markdown (.md)_ in Jupyter's interface,
- or `jupyter nbconvert notebook.ipynb --to markdown`.

The major difference is that code chunks can be evaluated. While standard markdown syntax starts a python code paragraph with

    ```python

R markdown starts an active code chunk with
R markdown (extension `.Rmd`) is a *source only* format for notebooks.
As the name states, R markdown was designed in the R community, and is
the reference [notebook format](https://rmarkdown.rstudio.com/) there.
The format actually supports [many languages](https://yihui.name/knitr/demo/engines/).

R markdown is almost like plain markdown. There are only two differences:
- R markdown has a specific syntax for active code cells that start with
  ```
  ```{python}
  ```
  These active cells may optionally contain cell options.
- R markdown files may also start with a YAML header that describes the
  notebook title, author, and desired output (HTML, slides, PDF...).
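To make these two differences concrete, here is a minimal sketch that splits R markdown text into a YAML header, markdown cells and active code cells. The helper `split_rmd` is hypothetical and its rules are simplified assumptions (the header sits between two `---` lines at the top; an active cell opens on a line made of three backticks followed by `{language options}` and closes on a line of three backticks). It is not how `nbrmd` itself parses `.Rmd` files.

```python
import re

# Three backticks, built at run time so that this snippet can itself be
# shown inside a markdown code fence.
FENCE = "`" * 3
# Hypothetical pattern for an active-cell header such as {python} or
# {r echo=FALSE}: group 1 is the language, group 2 the cell options.
CELL_START = re.compile("^" + re.escape(FENCE) + r"\{(\w+)(.*)\}\s*$")


def split_rmd(text):
    """Split R markdown text into (yaml_header, cells).

    Simplified sketch: cells are plain dicts, and nested fences or inline
    code are not handled.
    """
    lines = text.splitlines()
    header = None
    # Optional YAML header, delimited by '---' lines at the top of the file
    if lines and lines[0].strip() == "---":
        end = next((i for i in range(1, len(lines))
                    if lines[i].strip() == "---"), None)
        if end is not None:
            header = "\n".join(lines[1:end])
            lines = lines[end + 1:]

    cells, markdown, code = [], [], None

    def flush_markdown():
        source = "\n".join(markdown).strip("\n")
        if source.strip():
            cells.append({"cell_type": "markdown", "source": source})
        markdown.clear()

    def close_code():
        code["source"] = "\n".join(code["source"])
        cells.append(code)

    for line in lines:
        match = CELL_START.match(line) if code is None else None
        if match:
            flush_markdown()
            code = {"cell_type": "code", "language": match.group(1),
                    "options": match.group(2).strip(" ,"), "source": []}
        elif code is not None and line.strip() == FENCE:
            close_code()
            code = None
        elif code is not None:
            code["source"].append(line)
        else:
            markdown.append(line)

    if code is not None:  # unterminated cell: keep what was read
        close_code()
    flush_markdown()
    return header, cells
```

Plain markdown code fences (three backticks followed by a language name, without braces) do not match the pattern, so they stay inside markdown cells; only the braced form becomes an executable cell.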

    ```{python}
Look at [nbrmd/tests/ioslides.Rmd](https://github.com/mwouts/nbrmd/blob/master/tests/ioslides.Rmd) for a sample R markdown file (that, actually, only includes python cells).

A smaller difference is the common presence of a YAML header, that describes the notebook title, author, and desired output (HTML, slides, PDF...).
## Why R markdown and not filtered `.ipynb` under version control?

Look at [nbrmd/tests/ioslides.Rmd](https://github.com/mwouts/nbrmd/blob/master/tests/ioslides.Rmd) for a sample R markdown file (that, actually, only includes python cells).
The common practice for having Jupyter notebooks under version control is
to remove outputs with a pre-commit hook. That works well, and indeed gives
you a clean commit history.

However, you may run into trouble when you try to *merge* two `.ipynb`
notebooks in a simple text editor, since `.ipynb` files are JSON documents.
Merging text notebooks, like the `.Rmd` ones that this package provides,
is much simpler.
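The output-stripping practice mentioned above can be sketched with the standard `json` module. This hypothetical `strip_outputs` filter is not part of `nbrmd`; it assumes the nbformat v4 layout, where code cells carry `outputs` and `execution_count` fields. Dedicated tools exist for this job, so take it only as an illustration of what such a pre-commit hook does.

```python
import json


def strip_outputs(notebook_json):
    """Return the notebook JSON text with code-cell outputs removed.

    Assumes the nbformat v4 layout: code cells have 'outputs' and
    'execution_count' keys; markdown cells are left untouched.
    """
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable formatting (sorted keys, fixed indentation) keeps diffs small
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"
```

Even with outputs removed, the committed file is still JSON, which is why merging by hand remains less comfortable than merging an `.Rmd` file.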

## How do I open R markdown notebooks in Jupyter?

The `nbrmd` package offers a `ContentsManager` for Jupyter that recognizes
`.md` and `.Rmd` files as notebooks. To use it,
The `nbrmd` package offers a `ContentsManager` for Jupyter that recognizes
`.Rmd` files as notebooks. To use it,
- generate a jupyter config, if you don't have one yet, with `jupyter notebook --generate-config`
- edit the config and include this:
```python
@@ -51,28 +59,26 @@ pip install nbrmd
jupyter notebook
```

Now you can open your `.md` and `.Rmd` files as notebooks in Jupyter,
and save your jupyter notebooks in R markdown format.
Now you can open your `.Rmd` files as notebooks in Jupyter,
and save your jupyter notebooks in R markdown format (see below).

Rmd notebook in jupyter | Rmd notebook as text
:--------------------------:|:-----------------------:
![](https://raw.githubusercontent.com/mwouts/nbrmd/master/img/rmd_notebook.png) | ![](https://raw.githubusercontent.com/mwouts/nbrmd/master/img/rmd_in_text_editor.png)

When a file with an identical name and a `.ipynb` extension is found,
`nbrmd` loads the outputs from that file. This way, you can put the `.Rmd`
file under version control, and preserve the outputs that match unchanged
inputs.
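The idea of reusing outputs for unchanged inputs can be illustrated with plain dictionaries shaped like nbformat v4 notebooks. This is a simplified sketch, not the actual `combine_inputs_with_outputs` from `nbrmd.combine` (whose matching rules are not shown in this commit): the hypothetical `combine_sketch` scans the old code cells in order, gives a code cell the outputs of the first remaining cell with identical source, and leaves edited cells without outputs.

```python
def combine_sketch(nb_source, nb_outputs):
    """Copy outputs from nb_outputs into nb_source for unchanged code cells.

    Both arguments are dicts shaped like nbformat v4 notebooks. Cells are
    matched on identical 'source' text, in order; nb_source is modified
    in place and returned.
    """
    available = [c for c in nb_outputs.get("cells", [])
                 if c.get("cell_type") == "code"]
    for cell in nb_source.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # Edited or new cells start without outputs
        cell.setdefault("outputs", [])
        cell.setdefault("execution_count", None)
        for i, old in enumerate(available):
            if old.get("source") == cell.get("source"):
                cell["outputs"] = old.get("outputs", [])
                cell["execution_count"] = old.get("execution_count")
                # Each old cell is used at most once, and matching resumes
                # after it, which preserves the cell order
                del available[:i + 1]
                break
    return nb_source
```

Discarding the skipped cells keeps the matching monotonic: a reordered cell loses its outputs rather than receiving outputs from the wrong position. Other trade-offs are possible.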

## Can I save my Jupyter notebook as both R markdown and ipynb?

Yes. That's useful if you want to preserve the outputs locally, or if you want
to share the `.ipynb` version. We offer both per-notebook, and global configuration.
Yes. That's even the recommended setting for the notebooks you want to
put under *version control*.

You can configure this either per notebook or globally.

### Per-notebook configuration

The R markdown contents manager includes a pre-save hook that keeps up-to-date versions of your notebook
under the file extensions specified in the `nbrmd_formats` metadata. Edit the notebook metadata in Jupyter and
append a list for the desired format, like this:
append a list for the desired formats, like this:
```
{
  "kernelspec": {
@@ -82,33 +88,41 @@ append a list for the desired format, like this:
  "language_info": {
    (...)
  },
  "nbrmd_formats": [".ipynb", ".Rmd"]
  "nbrmd_formats": [".ipynb", ".Rmd"],
  "nbrmd_sourceonly_format": ".Rmd"
}
```

Accepted formats are: `.ipynb`, `.Rmd` and `.md`.
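The same metadata can also be set outside Jupyter's metadata editor. A minimal sketch with the standard `json` module follows; `set_nbrmd_formats` is a hypothetical helper, and it assumes an nbformat v4 `.ipynb` file. Close the notebook in Jupyter first, since saving it again from the browser could overwrite the edited metadata.

```python
import json

# Formats accepted by the nbrmd pre-save hook, per the list above
ACCEPTED_FORMATS = {".ipynb", ".Rmd", ".md"}


def set_nbrmd_formats(path, formats, sourceonly_format=None):
    """Hypothetical helper: store 'nbrmd_formats' (and, optionally,
    'nbrmd_sourceonly_format') in the metadata of the .ipynb file at path."""
    if not set(formats) <= ACCEPTED_FORMATS:
        raise ValueError("formats should be a subset of %s"
                         % sorted(ACCEPTED_FORMATS))

    with open(path, encoding="utf-8") as f:
        nb = json.load(f)

    # Other metadata entries (kernelspec, language_info...) are preserved
    metadata = nb.setdefault("metadata", {})
    metadata["nbrmd_formats"] = list(formats)
    if sourceonly_format is not None:
        metadata["nbrmd_sourceonly_format"] = sourceonly_format

    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

For instance, `set_nbrmd_formats("notebook.ipynb", [".ipynb", ".Rmd"], ".Rmd")` writes the two `nbrmd_*` entries shown in the metadata example.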

### Global configuration

If you want every notebook to be saved as both `.Rmd` and `.ipynb` files, then change your jupyter config to
```python
c.NotebookApp.contents_manager_class = 'nbrmd.RmdFileContentsManager'
c.ContentsManager.pre_save_hook = 'nbrmd.update_rmd_and_ipynb'
c.ContentsManager.default_nbrmd_formats = ['.ipynb', '.Rmd']
```

If you prefer to update just one of `.Rmd` or `.ipynb` files, then change the above to
`nbrmd.update_rmd` or `nbrmd.update_ipynb` as the `pre_save_hook` (and yes, you're free to use the `pre_save_hook`
with the default `ContentsManager`).

:warning: Be careful not to open the same notebook twice under two distinct extensions! You should _shut down_ the notebooks
with the extension you are not currently editing (list your open notebooks with the _running_ tab in Jupyter).
If you prefer to update just `.Rmd`, change the above accordingly (you will
still be able to open regular `.ipynb` notebooks).

## Recommendations for version control

I recommend that you only add the R markdown file to version control. When you integrate a change
on that file that was not done through your Jupyter editor, you should be careful to re-open the
`.Rmd` file, not the `.ipynb` one. As mentioned above, outputs that correspond to
unchanged inputs will be loaded from the `.ipynb` file.
I recommend that you set `nbrmd_formats` to `[".ipynb", ".Rmd"]`, either
in the default configuration, or in the notebook metadata (see above).

When you save your notebook, two files are generated,
with `.Rmd` and `.ipynb` extensions. Then, when you reopen
either one or the other,
- cell inputs are taken from the _source only_ format, here the `.Rmd` file,
- cell outputs are taken from the `.ipynb` file.

This way, you can put the `.Rmd` file under version control, and still have
the convenience of cell outputs stored in the `.ipynb` file. When
the `.Rmd` file is updated outside of Jupyter, simply reload the
notebook to benefit from the updates.

:warning: Be careful not to open the same notebook twice under two distinct
extensions! You should _shut down_ the notebooks with the extension you are not
currently editing (list your open notebooks with the _running_ tab in Jupyter).

## How do I use the converter?

6 changes: 1 addition & 5 deletions nbrmd/__init__.py
@@ -3,17 +3,13 @@
Use this module to read or write Jupyter notebooks as R Markdown documents
(methods 'read', 'reads', 'write', 'writes')
Use the jupyter pre-save hooks (see the documentation) to automatically
dump your Jupyter notebooks as a Rmd file, in addition to the ipynb file
(or the opposite)
Use the RmdFileContentsManager to open Rmd and Jupyter notebooks in Jupyter
Use the 'nbrmd' conversion script to convert Jupyter notebooks from/to
R Markdown notebooks.
"""

from .nbrmd import read, reads, readf, write, writes, writef
from .hooks import update_rmd, update_ipynb, \
    update_rmd_and_ipynb, update_selected_formats

try:
    from .rmarkdownexporter import RMarkdownExporter
142 changes: 117 additions & 25 deletions nbrmd/cm.py
@@ -1,14 +1,61 @@
import notebook.transutils
from notebook.services.contents.filemanager import FileContentsManager
from tornado.web import HTTPError
from nbrmd.combine import combine_inputs_with_outputs
from .hooks import update_selected_formats

import os
import nbrmd
import nbformat
import mock

from . import combine


def update_alternative_formats(model, path, contents_manager=None, **kwargs):
    """
    A pre-save hook for jupyter that saves the notebooks
    under the alternative form. Target extensions are taken from
    notebook metadata 'nbrmd_formats', or when not available,
    from contents_manager.default_nbrmd_formats
    :param model: data model, that may contain the notebook
    :param path: full name for ipython notebook
    :param contents_manager: ContentsManager instance
    :param kwargs: not used
    :return:
    """

    # only run on notebooks
    if model['type'] != 'notebook':
        return

    # only run on nbformat v4
    nb = model['content']
    if nb['nbformat'] != 4:
        return

    if isinstance(contents_manager, RmdFileContentsManager):
        formats = contents_manager.default_nbrmd_formats
    else:
        formats = ['.ipynb']

    formats = nb.get('metadata', {}).get('nbrmd_formats', formats)

    if not isinstance(formats, list) or not set(formats).issubset(
            ['.Rmd', '.md', '.ipynb']):
        raise TypeError(u"Notebook metadata 'nbrmd_formats' "
                        u"should be subset of ['.Rmd', '.md', '.ipynb']")

    os_path = contents_manager._get_os_path(path) if contents_manager else path
    file, ext = os.path.splitext(path)
    os_file, ext = os.path.splitext(os_path)

    for alt_ext in formats:
        if ext != alt_ext:
            if contents_manager:
                contents_manager.log.info(
                    u"Saving file at /%s", file + alt_ext)
            nbrmd.writef(nbformat.notebooknode.from_dict(nb),
                         os_file + alt_ext)


def _nbrmd_writes(nb, version=nbformat.NO_CONVERT, **kwargs):
    return nbrmd.writes(nb, **kwargs)
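The decision logic of `update_alternative_formats` can be isolated in a pure function, which makes it easy to check without a running Jupyter server. The sketch below is a hypothetical refactoring (the name `alternative_paths` is not part of `nbrmd`): like the hook, it takes the formats from the notebook metadata, falls back to a default, validates them, and lists the files to write next to the saved one.

```python
import os

# Same accepted formats as the validation in the hook above
ACCEPTED = ['.Rmd', '.md', '.ipynb']


def alternative_paths(path, metadata, default_formats=('.ipynb',)):
    """Return the paths the pre-save hook would write for `path`.

    Formats come from metadata['nbrmd_formats'] when present, otherwise
    from default_formats; the extension of `path` itself is skipped,
    since Jupyter saves that file on its own.
    """
    formats = metadata.get('nbrmd_formats', list(default_formats))
    if not isinstance(formats, list) or not set(formats).issubset(ACCEPTED):
        raise TypeError("Notebook metadata 'nbrmd_formats' "
                        "should be subset of %s" % ACCEPTED)
    file, ext = os.path.splitext(path)
    return [file + alt_ext for alt_ext in formats if alt_ext != ext]
```

Keeping file writes out of the function leaves them to the caller, as `nbrmd.writef` does in the hook.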
@@ -33,26 +80,83 @@ class RmdFileContentsManager(FileContentsManager):
    or in plain Markdown format (.md)
    """
    nb_extensions = ['.ipynb', '.Rmd', '.md']
    default_nbrmd_formats = ['.ipynb']
    default_nbrmd_sourceonly_format = None

    def __init__(self, **kwargs):
        self.pre_save_hook = update_selected_formats
        self.pre_save_hook = update_alternative_formats
        super(RmdFileContentsManager, self).__init__(**kwargs)

    def _read_notebook(self, os_path, as_version=4):
    def _read_notebook(self, os_path, as_version=4,
                       load_alternative_format=True):
        """Read a notebook from an os path."""
        file, ext = os.path.splitext(os_path)
        if ext == '.Rmd':
            with mock.patch('nbformat.reads', _nbrmd_reads):
                return super(RmdFileContentsManager, self) \
                nb = super(RmdFileContentsManager, self) \
                    ._read_notebook(os_path, as_version)
        elif ext == '.md':
            with mock.patch('nbformat.reads', _nbrmd_md_reads):
                return super(RmdFileContentsManager, self) \
                nb = super(RmdFileContentsManager, self) \
                    ._read_notebook(os_path, as_version)
        else:
            return super(RmdFileContentsManager, self) \
        else:  # ext == '.ipynb':
            nb = super(RmdFileContentsManager, self) \
                ._read_notebook(os_path, as_version)

        if not load_alternative_format:
            return nb

        # Notebook formats: default, notebook metadata, or current extension
        nbrmd_formats = nb.metadata.get('nbrmd_formats') or \
            self.default_nbrmd_formats

        if ext not in nbrmd_formats:
            nbrmd_formats.append(ext)

        # Source format is taken in metadata, contentsmanager, or is current
        # ext, or is first non .ipynb format that is found on disk
        source_format = nb.metadata.get('nbrmd_sourceonly_format') or \
            self.default_nbrmd_sourceonly_format

        if source_format is None:
            if ext != '.ipynb':
                source_format = ext
            else:
                for fmt in nbrmd_formats:
                    if fmt != '.ipynb' and os.path.isfile(file + fmt):
                        source_format = fmt
                        break

        nb_outputs = None
        if source_format is not None and ext != source_format:
            self.log.info('Reading source from {} and outputs from {}' \
                          .format(file + source_format, os_path))
            nb_outputs = nb
            nb = self._read_notebook(file + source_format,
                                     as_version=as_version,
                                     load_alternative_format=False)
        elif ext != '.ipynb' and '.ipynb' in nbrmd_formats \
                and os.path.isfile(file + '.ipynb'):
            self.log.info('Reading source from {} and outputs from {}' \
                          .format(os_path, file + '.ipynb'))
            nb_outputs = self._read_notebook(file + '.ipynb',
                                             as_version=as_version,
                                             load_alternative_format=False)

        # We store in the metadata the alternative and sourceonly formats
        trusted = self.notary.check_signature(nb)
        nb.metadata['nbrmd_formats'] = nbrmd_formats
        nb.metadata['nbrmd_sourceonly_format'] = source_format

        if nb_outputs is not None:
            combine.combine_inputs_with_outputs(nb, nb_outputs)
            trusted = self.notary.check_signature(nb_outputs)

        if trusted:
            self.notary.sign(nb)

        return nb

    def _save_notebook(self, os_path, nb):
        """Save a notebook to an os_path."""
        file, ext = os.path.splitext(os_path)
@@ -96,27 +200,15 @@ def get(self, path, content=True, type=None, format=None):
                (type == 'notebook' or
                 (type is None and
                  any([path.endswith(ext) for ext in self.nb_extensions]))):
            nb = self._notebook_model(path, content=content)

            # Read outputs from .ipynb version if available
            if content and not path.endswith('.ipynb'):
                file, ext = os.path.splitext(path)
                path_ipynb = file + '.ipynb'
                if self.exists(path_ipynb):
                    try:
                        nb_outputs = self._notebook_model(
                            path_ipynb, content=content)
                        combine_inputs_with_outputs(nb['content'],
                                                    nb_outputs['content'])
                    except HTTPError:
                        pass

            return nb

            return self._notebook_model(path, content=content)
        else:
            return super(RmdFileContentsManager, self) \
                .get(path, content, type, format)

    def trust_notebook(self, path):
        file, ext = os.path.splitext(path)
        super(RmdFileContentsManager, self).trust_notebook(file + '.ipynb')

    def rename_file(self, old_path, new_path):
        old_file, org_ext = os.path.splitext(old_path)
        new_file, new_ext = os.path.splitext(new_path)
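The rule that `_read_notebook` uses to pick the source-only format can likewise be restated as a small pure function. In this sketch `resolve_source_format` and its `exists` parameter are hypothetical (the parameter stands in for `os.path.isfile`). One deliberate difference: the code above appends the current extension to `nbrmd_formats`, which is either the notebook's metadata list or `self.default_nbrmd_formats` itself, so the shared default list can be mutated; the sketch works on a copy instead.

```python
import os


def resolve_source_format(path, metadata, default_formats=('.ipynb',),
                          default_sourceonly=None, exists=os.path.isfile):
    """Return (formats, source_format) for the notebook file at `path`.

    Same precedence as _read_notebook: notebook metadata, then the
    contents-manager defaults, then the opened extension, then the first
    non-.ipynb format found on disk.
    """
    file, ext = os.path.splitext(path)
    # Copy, so that neither the metadata list nor the defaults are mutated
    formats = list(metadata.get('nbrmd_formats') or default_formats)
    if ext not in formats:
        formats.append(ext)

    source = metadata.get('nbrmd_sourceonly_format') or default_sourceonly
    if source is None:
        if ext != '.ipynb':
            # A text notebook that is opened directly is its own source
            source = ext
        else:
            for fmt in formats:
                if fmt != '.ipynb' and exists(file + fmt):
                    source = fmt
                    break
    return formats, source
```

When the returned source format differs from the opened extension, `_read_notebook` takes the inputs from `file + source_format` and the outputs from the opened file, merges them with `combine.combine_inputs_with_outputs`, and signs the result only if the notebook that supplied the outputs passes `self.notary.check_signature`. This appears to be the mechanism behind the "Trusted notebooks remain trusted (#12)" entry in HISTORY.rst.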