14 changes: 13 additions & 1 deletion .github/workflows/test.yaml
@@ -4,4 +4,16 @@ on: [workflow_dispatch, pull_request, push]
 jobs:
   test:
     runs-on: ubuntu-latest
-    steps: [uses: fastai/workflows/nbdev-ci@master]
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v3
+      - name: Install Dependencies
+        shell: bash
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[dev]"
+          conda install graphviz
+      - name: Test
+        shell: bash
+        run: |
+          pytest tests
121 changes: 118 additions & 3 deletions README.md
@@ -3,13 +3,70 @@ nbmodular

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

-Convert data scientist notebooks with poor modularity to fully modular
+Convert data science notebooks with poor modularity to fully modular
notebooks and / or python modules.

## Motivation

In data science, it is common to develop quickly and experimentally in
notebooks, with little regard for software engineering practices and
modularity. It can be challenging to start working on someone else’s
notebooks when they have no modularity in terms of separate functions
and a great deal of code duplicated across notebooks. This makes it
difficult to understand the logic in terms of semantically separate
units, to see the commonalities and differences between the notebooks,
and to extend, generalize, and configure the current solution.

## Objectives

`nbmodular` is a library conceived with the objective of helping to
convert the cells of a notebook into separate functions with clear
dependencies in terms of inputs and outputs. This is done through a
combination of tools which semi-automatically understand the data-flow
in the code, based on mild assumptions about its structure. It also
helps test the current logic and compare it against a modularized
solution, to make sure that the refactored code is equivalent to the
original one.

## Features

- Convert cells to functions.
- The logic of a single function can be written across multiple cells.
- Optional: processed cells can continue to operate as cells or be only
used as functions from the moment they are converted.
- Create an additional pipeline function that provides the data-flow
from the first to the last function call in the notebook.
- Write all the notebook functions to a separate python module.
- Compare the result of the pipeline with the result of running the
original notebook.
- Converted functions act as nodes in a dependency graph. These nodes
can optionally hold the values of local variables for inspection
outside of the function. This is similar to having a single global
scope, which is the original situation. Since this is
memory-consuming, it is optional and may not be the default.
- Optional: Once we are able to construct a graph, we may be able to
draw it or show it in text, and pass it to DAG processors that can run
functions sequentially or in parallel.
- Persist the inputs and outputs of functions, so that we may decide to
reuse previous results without running the whole notebook.
- Optional: if we have the dependency graph and persisted inputs /
outputs, we may decide to only run those cells that are predecessors
of the current one, i.e., the ones that provide the inputs needed by
the current cell.
- Optional: if we associate a hash code to input data, we may only run
the cells when the input data changes. Similarly, if we associate a
hash code with AST-converted function code, we may only run those
cells whose code has been updated.
- Optional: have a mechanism for indicating test examples that go into
different test python files.
- Optional: the output of a test cell can be used for assertions, where
we require that the current output is the same as the original one.
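
The idea of associating a hash code with AST-converted function code can be sketched with the standard library alone. This is an illustrative sketch, not the way `nbmodular` implements it; the helper name `code_fingerprint` is hypothetical. It hashes the dump of the parsed AST, so changes in whitespace or comments do not count as code updates.

``` python
import ast
import hashlib


def code_fingerprint(source: str) -> str:
    """Hypothetical helper: hash the AST of `source` instead of its raw text."""
    tree = ast.parse(source)
    # ast.dump leaves out line and column attributes by default,
    # so the dump depends only on the structure of the code.
    canonical = ast.dump(tree)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = "c = a + b\nprint(c)\n"
reformatted = "c = a+b   # sum\n\nprint( c )\n"
edited = "c = a - b\nprint(c)\n"

# Same logic, different formatting: the fingerprints match.
same = code_fingerprint(original) == code_fingerprint(reformatted)
# Different logic: the fingerprints differ, so the cell would be re-run.
changed = code_fingerprint(original) != code_fingerprint(edited)
print(same, changed)
```

A cell would then be re-run only when its fingerprint differs from the one stored for the previous run.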

## Roadmap

-- Convert cell code into functions:
-  - Inputs are those variables detected in current cell and also
+- [ ] Convert cell code into functions:
+  - [x] Inputs are those variables detected in current cell and also
detected in previous cells. This solution requires that created
variables have unique names across the notebook. However, even if a
new variable with the same name is defined inside the cell, the
@@ -42,3 +99,61 @@ notebooks and / or python modules.
``` sh
pip install nbmodular
```

## Usage

Load ipython extension

Use ipython magic `function` by passing it the name of the function you
want:

``` python
a = 2
b = 3
c = a+b
print (a+b)
```

5

This defines the function `print_values`, as follows:

``` python
```

    def print_values():
        a = 2
        b = 3
        c = a+b
        print (a+b)
        return a,b,c
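
As a rough illustration of how a cell could be turned into a function like the one above, the following sketch collects the variables assigned in the cell with the standard `ast` module and returns them from a generated function. It is a simplified assumption about the approach, not the code used by `nbmodular`; the helper names `assigned_names` and `cell_to_function` are hypothetical, and empty cells are not handled.

``` python
import ast
import textwrap


def assigned_names(cell_code: str) -> list:
    """Names bound at the top level of the cell, in order of first assignment."""
    names = []
    for node in ast.parse(cell_code).body:
        if isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                for sub in ast.walk(target):
                    # Keep only names that are being stored, e.g. skip `i` in `x[i] = 1`.
                    if (isinstance(sub, ast.Name)
                            and isinstance(sub.ctx, ast.Store)
                            and sub.id not in names):
                        names.append(sub.id)
    return names


def cell_to_function(name: str, cell_code: str) -> str:
    """Wrap the cell code in a function that returns the variables it assigns."""
    lines = [f"def {name}():", textwrap.indent(cell_code.strip(), "    ")]
    returned = assigned_names(cell_code)
    if returned:
        lines.append("    return " + ", ".join(returned))
    return "\n".join(lines) + "\n"


cell = "a = 2\nb = 3\nc = a+b\nprint (a+b)"
source = cell_to_function("print_values", cell)
print(source)

# Define the generated function and call it.
namespace = {}
exec(source, namespace)
result = namespace["print_values"]()
```

Calling the generated function runs the cell code in a local scope and hands back the assigned variables as a tuple.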

Now, we can define another function in a cell that uses variables from
the previous function.

``` python
d = 10
```

``` python
a = a + d
b = b + d
c = c + d
print (a, b, c, d)
```

12 13 15 10

``` python
```

    def add_new_values(idx, original_code, name, values_before, call, values_here, variables_here, variables_before, variables_after, arguments, return_values, code):
        a = a + d
        b = b + d
        c = c + d
        print (a, b, c, d)
        return d
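
The rule stated in the roadmap, that inputs are the variables detected in the current cell and also detected in previous cells, can be sketched as a set intersection over names found with `ast`. This is only a sketch under that stated rule, with hypothetical helper names (`names_in`, `stored_names`); it does not reproduce the signature shown above and is not the library's own implementation.

``` python
import ast


def names_in(code: str) -> set:
    """All variable names that appear in the code, read or written."""
    return {node.id for node in ast.walk(ast.parse(code))
            if isinstance(node, ast.Name)}


def stored_names(code: str) -> set:
    """Variable names that the code assigns to."""
    return {node.id for node in ast.walk(ast.parse(code))
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)}


previous_cells = ["a = 2\nb = 3\nc = a+b\nprint (a+b)"]
current_cell = "d = 10\na = a + d\nb = b + d\nc = c + d\nprint (a, b, c, d)"

# Variables created by earlier cells.
defined_before = set().union(*(stored_names(cell) for cell in previous_cells))

# Inputs: names used here that already existed before this cell.
inputs = sorted(names_in(current_cell) & defined_before)
# New variables: names assigned here that did not exist before.
new_variables = sorted(stored_names(current_cell) - defined_before)

print(inputs, new_variables)
```

Under this rule the inputs are `a`, `b` and `c`, and the only new variable is `d`, which is consistent with the `return d` in the generated function.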

By default, we can use variables from the previous cell as we normally
do, i.e., values are still global. However, we can also opt to run the
code encapsulated in
2 changes: 1 addition & 1 deletion nbmodular/__init__.py
@@ -1 +1 @@
-__version__ = "0.0.1"
+__version__ = "0.0.4"
73 changes: 68 additions & 5 deletions nbmodular/_modidx.py
@@ -6,13 +6,76 @@
'git_url': 'https://github.com/JaumeAmoresDS/nbmodular',
'lib_path': 'nbmodular'},
'syms': { 'nbmodular.core': {'nbmodular.core.foo': ('core.html#foo', 'nbmodular/core.py')},
-'nbmodular.core.cell2func': { 'nbmodular.core.cell2func.Cell2Func': ('cell2func.html#cell2func', 'nbmodular/core/cell2func.py'),
-'nbmodular.core.cell2func.Cell2Func.__init__': ( 'cell2func.html#cell2func.__init__',
-'nbmodular/core/cell2func.py'),
-'nbmodular.core.cell2func.Cell2Func.cell2file': ( 'cell2func.html#cell2func.cell2file',
'nbmodular.core.cell2func': { 'nbmodular.core.cell2func.CellProcessor': ( 'cell2func.html#cellprocessor',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.__init__': ( 'cell2func.html#cellprocessor.__init__',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.add_call': ( 'cell2func.html#cellprocessor.add_call',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.cell2file': ( 'cell2func.html#cellprocessor.cell2file',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.create_function': ( 'cell2func.html#cellprocessor.create_function',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.function': ( 'cell2func.html#cellprocessor.function',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.get_lib_path': ( 'cell2func.html#cellprocessor.get_lib_path',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.get_nbs_path': ( 'cell2func.html#cellprocessor.get_nbs_path',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.parse_signature': ( 'cell2func.html#cellprocessor.parse_signature',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.pipeline_code': ( 'cell2func.html#cellprocessor.pipeline_code',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.print': ( 'cell2func.html#cellprocessor.print',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.print_pipeline': ( 'cell2func.html#cellprocessor.print_pipeline',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.process_function_call': ( 'cell2func.html#cellprocessor.process_function_call',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.register_pipeline': ( 'cell2func.html#cellprocessor.register_pipeline',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.reset': ( 'cell2func.html#cellprocessor.reset',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessor.write': ( 'cell2func.html#cellprocessor.write',
'nbmodular/core/cell2func.py'),
-'nbmodular.core.cell2func.Cell2Func.function': ( 'cell2func.html#cell2func.function',
'nbmodular.core.cell2func.CellProcessorMagic': ( 'cell2func.html#cellprocessormagic',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.__init__': ( 'cell2func.html#cellprocessormagic.__init__',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.cell2file': ( 'cell2func.html#cellprocessormagic.cell2file',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.cell_processor': ( 'cell2func.html#cellprocessormagic.cell_processor',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.function': ( 'cell2func.html#cellprocessormagic.function',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.function_info': ( 'cell2func.html#cellprocessormagic.function_info',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.match': ( 'cell2func.html#cellprocessormagic.match',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.pipeline_code': ( 'cell2func.html#cellprocessormagic.pipeline_code',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.print': ( 'cell2func.html#cellprocessormagic.print',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.print_pipeline': ( 'cell2func.html#cellprocessormagic.print_pipeline',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.CellProcessorMagic.write': ( 'cell2func.html#cellprocessormagic.write',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.FunctionProcessor': ( 'cell2func.html#functionprocessor',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.FunctionProcessor.__repr__': ( 'cell2func.html#functionprocessor.__repr__',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.FunctionProcessor.__str__': ( 'cell2func.html#functionprocessor.__str__',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.FunctionProcessor.get_ast': ( 'cell2func.html#functionprocessor.get_ast',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.FunctionProcessor.print': ( 'cell2func.html#functionprocessor.print',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.FunctionProcessor.to_file': ( 'cell2func.html#functionprocessor.to_file',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.FunctionProcessor.update_code': ( 'cell2func.html#functionprocessor.update_code',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.FunctionProcessor.write': ( 'cell2func.html#functionprocessor.write',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.keep_variables': ( 'cell2func.html#keep_variables',
'nbmodular/core/cell2func.py'),
'nbmodular.core.cell2func.load_ipython_extension': ( 'cell2func.html#load_ipython_extension',