This package exposes ipython/jupyter notebooks as callable functions which can be used in python programs or scripts and parameterized externally.
The goal is to provide a simple way to reuse code developed/maintained in a notebook environment.
> pip install NotebookScripter
Suppose you have this notebook: ./Example.ipynb. You can use the NotebookScripter.run_notebook function to execute it:
>>> from NotebookScripter import run_notebook
>>> some_module = run_notebook("./Example.ipynb")
>>>
The call to run_notebook():

- creates an anonymous python module
- execs all the code cells within Example.ipynb sequentially in the context of that module
- returns the module after all the cells have executed
Values or functions defined in the module scope within the notebook can be subsequently accessed:
>>> print(some_module.some_useful_value)
You can access this variable on the module object returned from run_notebook
>>> some_module.hello("world")
Hello world
>>>
The run_notebook execution model matches the mental model a developer has when working within the notebook. Importantly -- the notebook code is not being imported as a python module -- rather, all the code within the notebook is re-run on each call to run_notebook(), just as a developer would expect when working interactively in the notebook.
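To see those semantics in action: each call executes the notebook afresh and yields its own independent module (a minimal sketch; with its default parameters, Example.ipynb produces no output while running):

>>> first_run = run_notebook("./Example.ipynb")
>>> second_run = run_notebook("./Example.ipynb")
>>> first_run is second_run  # each call built its own anonymous module
False
>>>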
If desired, values can be injected into the namespace of the notebook during notebook execution. The notebook author can annotate cells via cell metadata (built into jupyter). The caller can then supply parameters which will be injected into the module namespace after a cell with the matching hook is executed.
>>> another_module = run_notebook('./Example.ipynb', mode={
... "a_useful_mode_switch": "idiot_mode"
... })
Hello Flat Earthers!
>>>
In this case -- the mode keyword parameter (that name is not chosen by NotebookScripter but rather by the cell metadata, and can be anything) matches the hook defined on the cell which defines the a_useful_mode_switch module variable within the notebook. After that cell executes, all the values passed to run_notebook via the keyword argument with the name matching the hook are injected into the module -- thus, after the cell executes, the value of the module variable a_useful_mode_switch will be "idiot_mode", which later selects idiot mode and causes the notebook to print: Hello Flat Earthers!
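For concreteness, the relevant cells of Example.ipynb might look something like the following (a hypothetical sketch -- the actual cell contents and default value are assumptions, not taken from the real notebook):

# cell whose metadata declares the "mode" hook
a_useful_mode_switch = None  # default; overwritten after this cell runs when the
                             # caller passes mode={"a_useful_mode_switch": ...}

# a later cell reacts to whichever value ended up in the module scope
if a_useful_mode_switch == "idiot_mode":
    print("Hello Flat Earthers!")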
run_notebook supports an argument with_backend, which defaults to 'agg'. run_notebook intercepts any usage of the %matplotlib ipython line magic within the notebook and replaces the argument with the value supplied by this parameter. For example -- suppose you had a jupyter cell with contents like the following:
%matplotlib inline
import matplotlib.pyplot as plt
# ...<some script that also produces plots>...
When executed via run_notebook(..., with_backend='agg'), the line %matplotlib inline will instead be interpreted like %matplotlib agg.
This functionality allows 'interactive' plotting backend selection in the notebook environment and 'non-interactive' backend selection in the scripting context. 'agg' is a non-interactive backend built into most distributions of matplotlib. To disable this functionality, provide with_backend=None.
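The two behaviors can be selected per call (an illustrative sketch, assuming the notebook contains a %matplotlib line):

>>> headless_module = run_notebook("./Example.ipynb")  # %matplotlib rewritten to use 'agg'
>>> untouched_module = run_notebook("./Example.ipynb", with_backend=None)  # %matplotlib left as-is
>>>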
run_notebook executes notebooks within the same process as the caller. Sometimes more isolation between notebook executions is desired or required. NotebookScripter provides a run_notebook_in_process function for this case:
>>> from NotebookScripter import run_notebook_in_process
# run notebook in subprocess -- note there is no output in doctest as output occurs in subprocess
>>> module = run_notebook_in_process("./Example.ipynb", mode={"a_useful_mode_switch": "idiot_mode"})
>>>
Unlike run_notebook, run_notebook_in_process cannot return the module, as Python modules are not transferable across process boundaries. It is still possible to retrieve serializable state from the notebook, though. Return values can be retrieved by passing the return_values parameter. After executing the notebook, any variables in the module scope with these names are serialized, transferred from the subprocess back to the calling process, deserialized, and an anonymous module carrying those values is returned to the caller. All requested values must be pickle serializable (otherwise, their repr() is returned instead).
>>> module = run_notebook_in_process("./Example.ipynb", mode={"a_useful_mode_switch": "non_idiot_mode"}, return_values=["some_useful_value"])
>>> print(module.some_useful_value)
You can access this variable on the module object returned from run_notebook
>>>
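Multiple names can be requested at once. Values that cannot be pickled -- functions, for example -- come back as their repr() string (a hedged sketch of that fallback behavior):

>>> module = run_notebook_in_process("./Example.ipynb", return_values=["some_useful_value", "hello"])
>>> print(module.some_useful_value)
You can access this variable on the module object returned from run_notebook
>>> isinstance(module.hello, str)  # hello is a function, so only its repr() crossed the process boundary
True
>>>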
A friend of mine was working on a complex analysis for her PhD thesis in an ipython jupyter notebook. She was reasonably familiar with the jupyter workflow -- which, by design, tends to force you into defining parameters/state as module globals where they can be easily accessed from subsequent cells. She organized her notebook nicely, with plots and various forms of sanity checking for a complicated, hairy, and time-consuming chain of computations. Around the time she finished designing the analysis, she realized she would need to run this notebook a few hundred times with different values for the parameters which she had discovered controlled the dynamics of her problem. I'm fond of typed languages and expected this would be relatively easy to refactor, so I leaned in to help when I heard her groan. I quickly realized -- in fact -- no, this refactor would not be so simple.
- Python is extremely difficult to refactor -- even relatively simple mechanical transformations are essentially rewrites in terms of required debugging time.
- Code written in the workflow of jupyter notebooks tends to be harder still to reorganize. The notebook workflow encourages you to define variables and parameters on the module scope so that you have easy access to these values from other cells. In fact -- one of the reasons notebooks are convenient to use is precisely this implicit sequential chaining. The code in the notebook is linear -- things defined in later cells depend on anything defined before, which often makes extracting arbitrary parameters into normal code reuse units like functions a pretty major change to the logic of the notebook.
- Normal code reuse abstractions like functions often make it harder to read, reason about, and deal with your notebooks interactively. In many cases, it's much simpler to write as much of your process linearly as you can -- with all the parameters in scope given values describing a single instance of the problem -- so that you can inspect and edit interactively at any relevant point in the process, rather than hiding code inside function scopes which cannot be so readily inspected/modified interactively.
- Extracting the code from the notebook and turning it into a form that can be parameterized loses the simplicity of the process description -- if she discovers, when running this process with the hundreds of variations required, that some specific case needs more analysis, it's not possible to go back from the refactored code to the simpler version she can work with interactively to delve into the complex, problem-specific details.
Refactoring her code to run all the problem instances for her analysis within the notebook would be error prone, make the notebook harder to interact with, and make the notebook harder to read (for a reader trying to understand the process). The benefits and conveniences of notebooks (shareability, ease of interacting with intermediate computation states) are lessened after the restructuring needed to extract a parameterizable function from a notebook intended to describe a single complicated process.
Unlike a tool like nbconvert, this module allows one to continue using the notebook as an interactive development environment without change. With nbconvert, one does a one-time conversion of a notebook into a .py file; afterwards, any changes made to that .py file are no longer usable in a notebook context. Additionally, with nbconvert there is no reasonable way to directly extract re-runnable workflows from the typical sequences of instructions one would interactively define on the notebook module scope.
With this module, you can keep code in unmodified notebooks to handle specific instances of problems, continue to develop/interact with that code within notebooks, and easily trigger that notebook code from external python programs or scripting contexts with different parameters if needed.
- Update Cell Metadata api and documentation
- Add doctests and integrate with tests
- Fix code coverage and code health CI integration
- Fix buggy behavior that occurred when running from the python interactive shell, associated with ipython embedding api behavior
- Added documentation and initial implementation.
- Added package build/release automation.
- Added simple tests.
- Initial build