
Add initial implementation, documentation, tests
breathe committed Nov 20, 2018
1 parent b3c915a commit 996acab
Showing 17 changed files with 535 additions and 59 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,7 @@
.noseids
.vscode
.ipynb_checkpoints

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
4 changes: 2 additions & 2 deletions .pylintrc
@@ -59,7 +59,7 @@ confidence=
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use"--disable=all --enable=classes
# --disable=W"
-disable=suppressed-message,dict-iter-method,getslice-method,range-builtin-not-iterating,useless-suppression,input-builtin,indexing-exception,reload-builtin,import-star-module-level,nonzero-method,print-statement,cmp-builtin,oct-method,metaclass-assignment,xrange-builtin,long-builtin,old-octal-literal,coerce-method,raising-string,basestring-builtin,old-division,old-ne-operator,round-builtin,old-raise-syntax,coerce-builtin,execfile-builtin,dict-view-method,raw_input-builtin,unichr-builtin,no-absolute-import,using-cmp-argument,hex-method,unicode-builtin,next-method-called,delslice-method,unpacking-in-except,standarderror-builtin,cmp-method,intern-builtin,backtick,reduce-builtin,map-builtin-not-iterating,apply-builtin,buffer-builtin,file-builtin,zip-builtin-not-iterating,filter-builtin-not-iterating,long-suffix,parameter-unpacking,setslice-method
+disable=suppressed-message,dict-iter-method,getslice-method,range-builtin-not-iterating,useless-suppression,input-builtin,indexing-exception,reload-builtin,import-star-module-level,nonzero-method,print-statement,cmp-builtin,oct-method,metaclass-assignment,xrange-builtin,long-builtin,old-octal-literal,coerce-method,raising-string,basestring-builtin,old-division,old-ne-operator,round-builtin,old-raise-syntax,coerce-builtin,execfile-builtin,dict-view-method,raw_input-builtin,unichr-builtin,no-absolute-import,using-cmp-argument,hex-method,unicode-builtin,next-method-called,delslice-method,unpacking-in-except,standarderror-builtin,cmp-method,intern-builtin,backtick,reduce-builtin,map-builtin-not-iterating,apply-builtin,buffer-builtin,file-builtin,zip-builtin-not-iterating,filter-builtin-not-iterating,long-suffix,parameter-unpacking,setslice-method,C0111,W0122,C0103,E1101,W0703,W0231,R0201


[REPORTS]
@@ -109,7 +109,7 @@ spelling-store-unknown-words=no
[FORMAT]

# Maximum number of characters on a single line.
-max-line-length=79
+max-line-length=160

# Regexp for a line that is allowed to be longer than the limit.
ignore-long-lines=^\s*(# )?<?https?://\S+>?$
71 changes: 71 additions & 0 deletions Example.ipynb
@@ -0,0 +1,71 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Cell metadata can be used to identify cells that should not be executed. This allows one to easily extract parameters from the notebook which can be provided externally to parameterize the notebook behavior when used from an external script.\n",
"\n",
"Use `View -> Cell Toolbar -> Edit Metadata` to add the Edit Metadata button to every cell.\n",
"\n",
"Then edit a cell and add \"NotebookScripter\": \"skip_cell\" to the cell metadata for a cell to skip the execution of that cell when called from external code.\n",
"\n",
"The general pattern is:\n",
"- define 'parameter definition' cell[s] with the skip_cell metadata\n",
"- put whatever values you want to supply for the parameters in that cell. Those values will be used when editing in the notebook environment\n",
"- provide alternative values for those parameters when invoking run_notebook from calling code\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"a_useful_mode_switch = None"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"some_useful_value = \"You can access this variable on the module object returned from run_notebook\"\n",
"\n",
"def hello(arg):\n",
" \"\"\"Call this function from the run_notebook return object if you want\"\"\"\n",
" print(\"Hello {0}\".format(arg))\n",
" \n",
"# module scope is a fine place for running side-effects.\n",
"# These will be evaluated every time run_notebook is called\n",
"if a_useful_mode_switch == \"idiot_mode\":\n",
" hello(\"Flat Earthers!\")\n",
"elif a_useful_mode_switch == \"non_idiot_mode\":\n",
" hello(\"World!\")"
]
}
],
"metadata": {
"celltoolbar": "Edit Metadata",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
1 change: 1 addition & 0 deletions NotebookScripter/__init__.py
@@ -0,0 +1 @@
from .main import run_notebook, run_notebook_in_process
120 changes: 120 additions & 0 deletions NotebookScripter/main.py
@@ -0,0 +1,120 @@

import types
import io
import typing

from IPython import get_ipython
from IPython.core.interactiveshell import InteractiveShell
from IPython.core.magic import Magics, magics_class, line_magic

from nbformat import read


def run_notebook(
    path_to_notebook: str,
    initial_values_for_ns: typing.Dict = None,
    with_backend='agg'
) -> typing.Any:
    """Run a notebook as a module within this process's namespace"""

    shell = InteractiveShell.instance()

    if with_backend:
        try:
            # try to initialize the matplotlib backend as early as possible
            # (cuts down on potential for complex bugs)
            import matplotlib
            matplotlib.use(with_backend, force=True)
        except ModuleNotFoundError:
            # don't error out here when matplotlib is missing -- instead there will be
            # a failure within the notebook if notebook actually tries to use
            # matplotlib ...
            pass

    @magics_class
    class NotebookScripterMagics(Magics):
        @line_magic
        def matplotlib(self, _line):
            "Override matplotlib magic to use non-interactive backend regardless of user supplied argument ..."
            import matplotlib
            matplotlib.use(with_backend, force=True)

    shell.register_magics(NotebookScripterMagics)

    # load the notebook object
    with io.open(path_to_notebook, 'r', encoding='utf-8') as f:
        notebook = read(f, 4)

    # create new module scope for notebook execution
    module_identity = "loaded_notebook"
    dynamic_module = types.ModuleType(module_identity)
    dynamic_module.__file__ = path_to_notebook
    dynamic_module.__dict__['get_ipython'] = get_ipython

    # do some extra work to ensure that magics that would affect the user_ns
    # actually affect the notebook module's ns
    save_user_ns = shell.user_ns
    shell.user_ns = dynamic_module.__dict__

    # inject provided values into the module namespace prior to running any cells
    dynamic_module.__dict__.update(initial_values_for_ns or {})

    try:
        for cell in notebook.cells:
            # loop over the code cells
            if cell.cell_type == 'code':
                # skip cells whose metadata contains "NotebookScripter": "skip_cell"
                if 'metadata' in cell and 'NotebookScripter' in cell.metadata and cell.metadata['NotebookScripter'] == "skip_cell":
                    # print("Skipping cell {0}!".format(i))
                    continue
                else:
                    # transform the input to executable Python
                    code = shell.input_transformer_manager.transform_cell(
                        cell.source)
                    # run the code in the module
                    exec(code, dynamic_module.__dict__)
    except Exception as err:
        raise err
    finally:
        shell.user_ns = save_user_ns
    return dynamic_module


def worker(queue, path_to_notebook, initial_values_for_ns, with_backend, return_values):
    dynamic_module = run_notebook(path_to_notebook, initial_values_for_ns=initial_values_for_ns, with_backend=with_backend)

    if return_values:
        ret = {k: simple_serialize(dynamic_module.__dict__[k]) for k in return_values if k in dynamic_module.__dict__}
        queue.put(ret)


def simple_serialize(obj):
    import pickle
    try:
        pickle.dumps(obj)
        # if we didn't raise, then (theoretically) obj should be serializable ...
        return obj
    except Exception:
        return repr(obj)


def run_notebook_in_process(
    path_to_notebook: str,
    initial_values_for_ns: typing.Dict = None,
    marshal_values=None,
    with_backend='agg'
) -> typing.Dict:
    import multiprocessing as mp

    queue = mp.Queue()

    p = mp.Process(target=worker, args=(queue, path_to_notebook, initial_values_for_ns, with_backend, marshal_values))
    p.start()

    if not marshal_values:
        p.join()
        return {}

    # read the result before joining, so the child is not left waiting to flush the queue
    final_namespace = queue.get()
    p.join()
    return final_namespace
File renamed without changes.
14 changes: 14 additions & 0 deletions NotebookScripter/snapshots/snap_TestNotebookScripter.py
@@ -0,0 +1,14 @@
# -*- coding: utf-8 -*-
# snapshottest: v1 - https://goo.gl/zC4yUc
from __future__ import unicode_literals

from snapshottest import Snapshot


snapshots = Snapshot()

snapshots['TestNotebookExecution::test_run_notebook 1'] = 'Hello state1'

snapshots['TestNotebookExecution::test_run_notebook_in_process 1'] = {
    'stateful_name': 'state1'
}
154 changes: 154 additions & 0 deletions NotebookScripter/tests/Test.ipynb

Large diffs are not rendered by default.

29 changes: 29 additions & 0 deletions NotebookScripter/tests/TestNotebookScripter.py
@@ -0,0 +1,29 @@
import os
import snapshottest

import NotebookScripter


class TestNotebookExecution(snapshottest.TestCase):
    def setUp(self):
        pass

    def test_run_notebook(self):
        notebook_file = os.path.join(os.path.dirname(__file__), "./Test.ipynb")
        mod = NotebookScripter.run_notebook(notebook_file, with_backend='agg')
        value = mod.hello()
        print(value)
        self.assertMatchSnapshot(value)

    def test_run_notebook_in_process(self):
        notebook_file = os.path.join(os.path.dirname(__file__), "./Test.ipynb")
        values = NotebookScripter.run_notebook_in_process(notebook_file, marshal_values=["stateful_name", "asdf"], with_backend='agg')
        print(values)
        self.assertMatchSnapshot(values)

    def test_run_with_backend_is_used(self):
        notebook_file = os.path.join(os.path.dirname(__file__), "./Test.ipynb")

        with self.assertRaises(Exception) as context:
            NotebookScripter.run_notebook(notebook_file, with_backend="somefake")
        self.assertTrue("Unrecognized backend string 'somefake'" in str(context.exception))
Empty file.
14 changes: 14 additions & 0 deletions NotebookScripter/tests/snapshots/snap_TestNotebookScripter.py
@@ -0,0 +1,14 @@
# -*- coding: utf-8 -*-
# snapshottest: v1 - https://goo.gl/zC4yUc
from __future__ import unicode_literals

from snapshottest import Snapshot


snapshots = Snapshot()

snapshots['TestNotebookExecution::test_run_notebook 1'] = 'Hello state1'

snapshots['TestNotebookExecution::test_run_notebook_in_process 1'] = {
    'stateful_name': 'state1'
}
117 changes: 117 additions & 0 deletions README.md
@@ -0,0 +1,117 @@
# NotebookScripter [![Version](https://img.shields.io/pypi/v/NotebookScripter.svg)](https://pypi.python.org/pypi/NotebookScripter) [![Build](https://travis-ci.org/breathe/NotebookScripter.svg?branch=master)](https://travis-ci.org/breathe/NotebookScripter) [![Coverage](https://img.shields.io/coveralls/breathe/NotebookScripter.svg)](https://coveralls.io/r/breathe/NotebookScripter) [![Health](https://codeclimate.com/github/breathe/NotebookScripter/badges/gpa.svg)](https://codeclimate.com/github/breathe/NotebookScripter)

[![Compatibility](https://img.shields.io/pypi/pyversions/NotebookScripter.svg)](https://pypi.python.org/pypi/NotebookScripter)
[![Implementations](https://img.shields.io/pypi/implementation/NotebookScripter.svg)](https://pypi.python.org/pypi/NotebookScripter)
[![Format](https://img.shields.io/pypi/format/NotebookScripter.svg)](https://pypi.python.org/pypi/NotebookScripter)
[![Downloads](https://img.shields.io/pypi/dm/NotebookScripter.svg)](https://pypi.python.org/pypi/NotebookScripter)

This package exposes IPython/Jupyter notebooks as callable Python functions.

The goal is to provide a simple way to reuse code developed and maintained in a notebook environment by turning notebooks into callable Python functions, with parameters optionally supplied as arguments to the function call.

Unlike a tool such as nbconvert, this module lets you keep using the notebook as an interactive development environment. With nbconvert you do a one-time conversion of a notebook into a .py file; any changes you then make to that .py file can no longer be used in a notebook context. Additionally, nbconvert offers no reasonable way to directly reuse 'workflows' defined as sequences of instructions at module scope, which is how one typically works in a notebook when developing a complicated process.

With this module, you can keep code in unmodified notebooks, continue to develop and interact with that code within notebooks, and easily trigger that notebook code from external Python programs or scripting contexts.

Usage:

## Execute a notebook as a function call

Suppose you have this notebook: [./Example.ipynb](./Example.ipynb)

```python
from NotebookScripter import run_notebook

some_module = run_notebook("./Example.ipynb")
```

The call to `run_notebook()`:

1. creates an anonymous Python module
1. executes all the code cells within `Example.ipynb` sequentially in the context of that module
1. returns the module after all the cells have executed.
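
The three steps above can be sketched with only the standard library. This is a simplified illustration, not NotebookScripter's code: it assumes the notebook has already been parsed into an nbformat-v4-shaped dict, the helper name `run_cells_as_module` is hypothetical, and it skips the IPython input transformation the real implementation applies, so cells containing magics such as `%matplotlib` would fail here.

```python
import types
from typing import Any, Dict

# Assumption: a notebook already parsed into nbformat v4 shape; only the
# "cells", "cell_type" and "source" keys matter for this sketch.
NOTEBOOK: Dict[str, Any] = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n"]},
        {"cell_type": "code", "source": ["greeting = 'Hello'\n"]},
        {"cell_type": "code",
         "source": ["def hello(arg):\n", "    return greeting + ' ' + arg\n"]},
    ]
}


def run_cells_as_module(notebook: Dict[str, Any],
                        name: str = "loaded_notebook") -> types.ModuleType:
    # 1. create an anonymous module to act as the notebook's global scope
    module = types.ModuleType(name)
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown and raw cells are never executed
        source = cell["source"]
        # nbformat stores source either as one string or as a list of lines
        code = "".join(source) if isinstance(source, list) else source
        # 2. exec each code cell, in order, inside the module namespace
        exec(code, module.__dict__)
    # 3. hand the populated module back to the caller
    return module


mod = run_cells_as_module(NOTEBOOK)
print(mod.hello("World"))  # prints: Hello World
```

Because every call builds a fresh module and re-executes every cell, two calls never share state, unlike a regular `import`, which is cached in `sys.modules`.
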

Any values or functions defined in the module scope within the notebook can be subsequently accessed:

```python
print(some_module.some_useful_value)
some_module.hello("World")
```

This execution model matches the mental model a developer has when working within the notebook. Importantly, the notebook code is not imported as a Python module; rather, all the code within the notebook is re-run on each call to `run_notebook()`, just as a developer would expect when working interactively in the notebook.

If desired, initial values can be injected into the namespace of the module. These values are injected into the newly created module's namespace before any of the notebook cells are executed.

```python
another_module = run_notebook("./Example.ipynb", {
"a_useful_mode_switch": "idiot_mode"
})
```

In this case, the value of `a_useful_mode_switch` selects idiot mode and the notebook prints `Hello Flat Earthers!`. But how can this work? If the notebook is still usable interactively, then `a_useful_mode_switch` must be defined before it is used, and that definition would overwrite the externally supplied value before it had any effect. `run_notebook` supplies a simple convention for identifying which parameters of the notebook are intended to be supplied by an external caller: it executes only the cells that do _not_ contain NotebookScripter metadata like the following:

```json
"NotebookScripter": "skip_cell"
```

In `Example.ipynb`, this annotation is added to the cell defining the `a_useful_mode_switch` variable.

This annotation can be added to any cells that you do _NOT_ want to run when the notebook is executed by NotebookScripter. The pattern for turning notebooks into parameterizable workflows is:

1. create a cell and define default values for any parameters you want the caller to supply
2. annotate that cell with 'skip_cell' metadata.

When run interactively in the notebook, the values defined in that cell will be used for those parameters. When called externally, the caller should supply all the required values via the second argument to run_notebook.
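
A minimal sketch of this convention, assuming a notebook already parsed into an nbformat-v4-shaped dict; `should_skip` and `run_parameterized` are hypothetical names, not part of NotebookScripter. It shows the ordering that makes the pattern work: injected values land in the namespace first, and the annotated parameter cell is never executed, so it cannot overwrite them.

```python
import types
from typing import Any, Dict, Optional


def should_skip(cell: Dict[str, Any]) -> bool:
    # Skip cells whose metadata contains "NotebookScripter": "skip_cell".
    return cell.get("metadata", {}).get("NotebookScripter") == "skip_cell"


def run_parameterized(notebook: Dict[str, Any],
                      initial_values: Optional[Dict[str, Any]] = None) -> types.ModuleType:
    module = types.ModuleType("loaded_notebook")
    # Caller-supplied values are injected before any cell runs.
    module.__dict__.update(initial_values or {})
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code" or should_skip(cell):
            continue
        exec("".join(cell["source"]), module.__dict__)
    return module


notebook = {
    "cells": [
        # "parameter definition" cell: used interactively, skipped when scripted
        {"cell_type": "code",
         "metadata": {"NotebookScripter": "skip_cell"},
         "source": ["a_useful_mode_switch = None\n"]},
        {"cell_type": "code", "metadata": {},
         "source": ["message = 'mode is ' + str(a_useful_mode_switch)\n"]},
    ]
}

result = run_parameterized(notebook, {"a_useful_mode_switch": "non_idiot_mode"})
print(result.message)  # prints: mode is non_idiot_mode
```

Calling `run_parameterized(notebook)` without supplying `a_useful_mode_switch` raises `NameError`, because the only cell that defines it was skipped; this is why the caller needs to provide every parameter that lives in a `skip_cell` cell.
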

## Dealing with matplotlib

`run_notebook` supports a third argument, `with_backend`, which defaults to `'agg'`. `run_notebook` intercepts any use of the `%matplotlib` IPython line magic and replaces its argument with the value supplied by this parameter. For example:

```python
%matplotlib inline

import matplotlib.pyplot as plt
# ...<some script that also produces plots>...
```

When executed via run_notebook(..., with_backend='agg') - the line `%matplotlib inline` will instead be interpreted as `%matplotlib agg`.

This functionality allows 'interactive' plotting backend selection in the notebook environment and 'non-interactive' backend selection in the scripting context. 'agg' is a non-interactive backend built into most distributions of matplotlib. To disable this functionality provide `with_backend=None`.
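
To make the interception concrete, here is a toy model of a line-magic registry. It is hypothetical and is not IPython's API (NotebookScripter itself subclasses IPython's `Magics` and calls `shell.register_magics`); it only illustrates why a registered override ignores the notebook's own argument and always applies the caller's `with_backend` value.

```python
import re
from typing import Callable, Dict, List

# Hypothetical toy model of a line-magic registry; this is NOT IPython's API.
MagicHandler = Callable[[str], str]


def make_matplotlib_override(with_backend: str) -> MagicHandler:
    """Build a handler that ignores the notebook's argument (e.g. "inline")."""
    def matplotlib_magic(_line: str) -> str:
        # The real override calls matplotlib.use(with_backend, force=True);
        # here we only report which backend would be selected.
        return "matplotlib backend -> {0}".format(with_backend)
    return matplotlib_magic


def run_lines(lines: List[str], magics: Dict[str, MagicHandler]) -> List[str]:
    """Dispatch '%name args' lines to registered handlers; ignore other lines."""
    log = []
    for line in lines:
        match = re.match(r"^%(\w+)\s*(.*)$", line)
        if match and match.group(1) in magics:
            log.append(magics[match.group(1)](match.group(2)))
    return log


magics = {"matplotlib": make_matplotlib_override("agg")}
print(run_lines(["%matplotlib inline", "import math"], magics))
# prints: ['matplotlib backend -> agg']
```
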

## Execute a notebook in isolated subprocess

`run_notebook` runs notebooks within the same process as the caller. Sometimes more isolation between notebook executions is desired or required. NotebookScripter provides a `run_notebook_in_process` function for this case:

```python
from NotebookScripter import run_notebook_in_process

# run notebook in subprocess
run_notebook_in_process("./Example.ipynb", {"some_useful_parameter": "any_json_serializable_value"})
```

Unlike `run_notebook`, `run_notebook_in_process` cannot return the module, because a module object cannot be transferred across process boundaries. It's still possible to retrieve serializable state from the notebook, though, by passing the `marshal_values` parameter. After the notebook executes, any variables in the module scope with these names are serialized, transferred from the subprocess back to the calling process, deserialized, and returned as a Python dictionary. Requested names that are not defined in the module scope are omitted from the result. All requested values should be pickle-serializable; otherwise, their `repr()` is returned instead.

```python
serialized_module_namespace = run_notebook_in_process(
    "./Example.ipynb",
    {"some_parameter": "any_json_serializable_value"},
    marshal_values=["some_key_into_module_namespace_of_serializable_value"]
)
```
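
The marshalling step can be approximated in a single process with `pickle`, which is also what `multiprocessing.Queue` uses to move objects between processes. A sketch under that assumption: `marshal_namespace` is a hypothetical name, while the fallback to `repr()` mirrors the `simple_serialize` helper in this commit.

```python
import pickle
from typing import Any, Dict, Iterable


def simple_serialize(obj: Any) -> Any:
    # Keep values that pickle cleanly; fall back to their repr() otherwise.
    try:
        pickle.dumps(obj)
        return obj
    except Exception:
        return repr(obj)


def marshal_namespace(namespace: Dict[str, Any], names: Iterable[str]) -> Dict[str, Any]:
    # Requested names that don't exist in the namespace are silently dropped.
    return {k: simple_serialize(namespace[k]) for k in names if k in namespace}


namespace = {"stateful_name": "state1", "a_function": lambda x: x}
payload = marshal_namespace(namespace, ["stateful_name", "a_function", "asdf"])

# Simulate the trip across the process boundary with a pickle round-trip.
received = pickle.loads(pickle.dumps(payload))
print(received["stateful_name"])  # prints: state1
```

The lambda cannot be pickled, so it comes back as its `repr()` string, and `"asdf"` is absent because the namespace never defined it.
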

Installation:

```bash
> pip install NotebookScripter
```

## Changelog

### 1.0.1

- Added documentation and initial implementation.
- Added package build/release automation.
- Added simple tests.

### 1.0.0

- Initial build