This package exposes ipython/jupyter notebooks as callable functions which can be used in python programs or scripts and parameterized externally.
The goal is to provide a simple way to reuse code developed/maintained in a notebook environment.
> pip install NotebookScripter
Suppose you have this notebook: ./Example.ipynb. You can use the NotebookScripter.run_notebook function to execute it:
>>> from NotebookScripter import run_notebook
>>> some_module = run_notebook("./Example.ipynb")
>>>
The call to run_notebook():

- creates an anonymous python module
- execs all the code cells within Example.ipynb sequentially in the context of that module
- returns the module after all the cells have executed
Values or functions defined in the module scope within the notebook can be subsequently accessed:
>>> print(some_module.some_useful_value)
You can access this variable on the module object returned from run_notebook
>>> some_module.hello("world")
Hello world
>>>
The run_notebook execution model matches the mental model a developer has when working within the notebook. Importantly -- the notebook code is not being imported as a python module -- rather, all the code within the notebook is re-run on each call to run_notebook(), just as a developer would expect when working interactively in the notebook.
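To see those semantics in action: each call executes the notebook afresh and yields its own independent module (a minimal sketch; with its default parameters, Example.ipynb produces no output while running):

>>> first_run = run_notebook("./Example.ipynb")
>>> second_run = run_notebook("./Example.ipynb")
>>> first_run is second_run  # each call built its own anonymous module
False
>>>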
If desired, values can be injected into the namespace of the notebook during notebook execution. The notebook author can annotate cells via cell metadata (built into jupyter). The caller can then supply parameters which will be injected into the module namespace after a cell with the matching hook is executed.
>>> another_module = run_notebook('./Example.ipynb', mode={
... "a_useful_mode_switch": "idiot_mode"
... })
Hello Flat Earthers!
>>>
In this case -- the mode keyword parameter (that name is not chosen by NotebookScripter but rather by the cell metadata, and can be anything) matches the hook defined on the cell which defines the a_useful_mode_switch module variable within the notebook. After that cell executes, all the values passed to run_notebook via the keyword argument with the name matching the hook are injected into the module -- thus, after the cell executes, the value of the module variable a_useful_mode_switch will be "idiot_mode", which later selects idiot mode and causes the notebook to print: Hello Flat Earthers!
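For concreteness, the relevant cells of Example.ipynb might look something like the following (a hypothetical sketch -- the actual cell contents and default value are assumptions, not taken from the real notebook):

# cell whose metadata declares the "mode" hook
a_useful_mode_switch = None  # default; overwritten after this cell runs when the
                             # caller passes mode={"a_useful_mode_switch": ...}

# a later cell reacts to whichever value ended up in the module scope
if a_useful_mode_switch == "idiot_mode":
    print("Hello Flat Earthers!")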
run_notebook supports an argument with_backend, which defaults to 'agg'. run_notebook intercepts any usage of the %matplotlib ipython line magic within the notebook and replaces the argument with the value supplied by this parameter. For example -- suppose you had a jupyter cell with contents like the following:
%matplotlib inline
import matplotlib.pyplot as plt
# ...<some script that also produces plots>...
When executed via run_notebook(..., with_backend='agg'), the line %matplotlib inline will instead be interpreted like %matplotlib agg.
This functionality allows 'interactive' plotting backend selection in the notebook environment and 'non-interactive' backend selection in the scripting context. 'agg' is a non-interactive backend built into most distributions of matplotlib. To disable this functionality, provide with_backend=None.
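The two behaviors can be selected per call (an illustrative sketch, assuming the notebook contains a %matplotlib line):

>>> headless_module = run_notebook("./Example.ipynb")  # %matplotlib rewritten to use 'agg'
>>> untouched_module = run_notebook("./Example.ipynb", with_backend=None)  # %matplotlib left as-is
>>>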
run_notebook executes notebooks within the same process as the caller. Sometimes more isolation between notebook executions is desired or required. NotebookScripter provides a run_notebook_in_process function for this case:
>>> from NotebookScripter import run_notebook_in_process
# run notebook in subprocess -- note there is no output in doctest as output occurs in subprocess
>>> module = run_notebook_in_process("./Example.ipynb", mode={"a_useful_mode_switch": "idiot_mode"})
>>>
Unlike run_notebook, run_notebook_in_process cannot return the module, as Python modules are not transferable across process boundaries. It is still possible to retrieve serializable state from the notebook, though. Return values can be retrieved by passing the return_values parameter. After executing the notebook, any variables in the module scope with these names are serialized, transferred from the subprocess back to the calling process, deserialized, and an anonymous module carrying those values is returned to the caller. All requested values must be pickle serializable (otherwise, their repr() is returned instead).
>>> module = run_notebook_in_process("./Example.ipynb", mode={"a_useful_mode_switch": "non_idiot_mode"}, return_values=["some_useful_value"])
>>> print(module.some_useful_value)
You can access this variable on the module object returned from run_notebook
>>>
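Multiple names can be requested at once. Values that cannot be pickled -- functions, for example -- come back as their repr() string (a hedged sketch of that fallback behavior):

>>> module = run_notebook_in_process("./Example.ipynb", return_values=["some_useful_value", "hello"])
>>> print(module.some_useful_value)
You can access this variable on the module object returned from run_notebook
>>> isinstance(module.hello, str)  # hello is a function, so only its repr() crossed the process boundary
True
>>>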
A friend of mine was working on a complex analysis for her PhD thesis in an ipython jupyter notebook. She was reasonably familiar with the jupyter workflow -- which, by design, tends to force you into defining parameters/state as module globals where they can be easily accessed from subsequent cells. She organized her notebook nicely, with plots and various forms of sanity checking for a complicated, hairy, and time-consuming chain of computations. Around the time she finished designing the analysis, she realized she would need to run this notebook a few hundred times with different values for the parameters which she had discovered controlled the dynamics of her problem. I'm fond of typed languages and expected this would be relatively easy to refactor, so I leaned in to help when I heard her groan. I quickly realized -- in fact -- no, this refactor would not be so simple.
- Python is extremely difficult to refactor -- even relatively simple mechanical transformations are essentially rewrites in terms of required debugging time.
- Code written in the workflow of jupyter notebooks tends to be harder still to reorganize. The notebook workflow encourages you to define variables and parameters on the module scope so that you have easy access to these values from other cells. In fact -- one of the reasons notebooks are convenient to use is precisely this implicit sequential chaining. The code in the notebook is linear -- things defined in later cells depend on anything defined before, which often makes extracting arbitrary parameters into normal code reuse units like functions a pretty major change to the logic of the notebook.
- Normal code reuse abstractions like functions often make it harder to read, reason about, and deal with your notebooks interactively. In many cases, it's much simpler to write as much of your process linearly as you can -- with all the parameters in scope given values describing a single instance of the problem -- so that you can inspect and edit interactively at any relevant point in the process, rather than hiding code inside function scopes which cannot be so readily inspected/modified interactively.
- Extracting the code from the notebook and turning it into a form that can be parameterized loses the simplicity of the process description -- if she discovers, when running this process with the hundreds of variations required, that some specific case needs more analysis, it's not possible to go back from the refactored code to the simpler version she can work with interactively to delve into the complex, problem-specific details.
Refactoring her code to run all the problem instances for her analysis within the notebook would be error prone, make the notebook harder to interact with, and make the notebook harder to read (for a reader trying to understand the process). The benefits and conveniences of notebooks (shareability, ease of interacting with intermediate computation states) are lessened after the restructuring needed to extract a parameterizable function from a notebook intended to describe a single complicated process.
Unlike a tool like nbconvert, this module allows one to continue using the notebook as an interactive development environment without change. With nbconvert, one does a one-time conversion of a notebook into a .py file; afterwards, any changes made to that .py file are no longer usable in a notebook context. Additionally, with nbconvert there is no reasonable way to directly extract re-runnable workflows from the typical sequences of instructions one would interactively define on the notebook module scope.
With this module, you can keep code in unmodified notebooks to handle specific instances of problems, continue to develop/interact with that code within notebooks, and easily trigger that notebook code from external python programs or scripting contexts with different parameters if needed.
- Update Cell Metadata api and documentation
- Add doctests and integrate with tests
- Fix code coverage and code health CI integration
- Fix buggy behavior that occurred when running from the python interactive shell, associated with ipython embedding api behavior
- Added documentation and initial implementation.
- Added package build/release automation.
- Added simple tests.
- Initial build