Created by Joshua Shew (joshua.t.shew@gmail.com).
This package is for persistent and state-dependent caching of objects in Jupyter notebooks. This means that objects (such as large intermediate data frames) are recomputed if and only if a state they are tied to has changed. Otherwise the cell, even when run, will load the object from cache. The "persistent" part of the description means that these objects (and their associated state) persist across notebook sessions, even after the notebook has been closed and reopened.
Without notecache:
df = expensive_computation([multiple_large_arguments])
With notecache:
import notecache
state = {"arg1": arg1, "arg2": another_arg}
def generate(state) -> DataFrame:
    return expensive_computation(state["arg1"], state["arg2"])
df = notecache.load(state, generate, unique_id="large-data-frame").object
The first time this cell is executed, expensive_computation will be run to generate the result. Subsequent executions of this cell will load the result instead of calling expensive_computation, even if the notebook has been closed and reopened. The result is recomputed if and only if a change to state has been detected.
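The behavior described above can be illustrated with a minimal stand-in. This is a sketch, not notecache's actual implementation: the function sketch_load, the Result type and its from_cache field, the cache_dir argument, the use of pickle, and the equality check on state are all assumptions made for illustration.

```python
import pickle
import tempfile
from collections import namedtuple
from pathlib import Path

# Hypothetical result type; `from_cache` is an illustrative extra field.
Result = namedtuple("Result", ["object", "from_cache"])


def sketch_load(state, generate, unique_id, cache_dir):
    """Return generate(state), reusing a pickled copy while state is unchanged."""
    # Simplified file naming: one file per unique_id (assumption).
    path = Path(cache_dir) / f"{unique_id}.pkl"

    if path.exists():
        with path.open("rb") as f:
            cached_state, cached_obj = pickle.load(f)
        # Cache hit only when the stored state equals the current state.
        if cached_state == state:
            return Result(cached_obj, True)

    # First run or changed state: recompute and overwrite the cache file.
    obj = generate(state)
    with path.open("wb") as f:
        pickle.dump((state, obj), f)
    return Result(obj, False)


calls = []  # records every time the expensive function really runs


def generate(state):
    calls.append(state["n"])
    return sum(range(state["n"]))


cache_dir = tempfile.mkdtemp()
first = sketch_load({"n": 10}, generate, "demo-sum", cache_dir)
second = sketch_load({"n": 10}, generate, "demo-sum", cache_dir)  # same state
third = sketch_load({"n": 11}, generate, "demo-sum", cache_dir)  # changed state
```

In this sketch, generate runs for the first and third calls only; the second call reads the pickled result. Comparing state with == works for plain dictionaries of simple values, but a real implementation would need a more careful comparison for objects such as data frames.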
notecache can be found on PyPI. It can be installed with pip.
pip install notecache
This package has one public function, load. It is used to both store and load any given object. The 3 most important arguments passed into load are:
- state: This argument should contain all the information that is required to compute the object that is to be stored. A change in state between two calls to load (with the same unique_id) will cause the object to be generated instead of loaded from cache.
- generate: This is the function that is used to generate the target object. The return value of load contains the return value of generate(state).
- unique_id: The sha512 hash value of unique_id is used to create a unique file name to store the object. Overlapping unique_id values in different calls to load may cause cache objects to be overwritten.
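To show why overlapping unique_id values collide, here is a small sketch of deriving a file name from a sha512 hash. Only the use of sha512 comes from the description above; the helper cache_path, the .cache directory, and the .pkl suffix are hypothetical and may not match what notecache does internally.

```python
import hashlib
from pathlib import Path


def cache_path(unique_id, cache_dir=".cache", suffix=".pkl"):
    """Map a unique_id to a file path via its sha512 hex digest (assumed layout)."""
    digest = hashlib.sha512(unique_id.encode("utf-8")).hexdigest()
    return Path(cache_dir) / (digest + suffix)


a = cache_path("large-data-frame")
b = cache_path("large-data-frame")  # same id, so the same file
c = cache_path("other-data-frame")  # different id, so a different file
```

Because the hash is deterministic, two calls that share a unique_id resolve to the same path, so the second object written would replace the first.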
load returns a named tuple, and the object can be accessed with load([args]).object.
- Contact the repository author if you used this package in a public repository or if you know of any place it is used so that it can be featured in this list.
- Fork the repository
- Clone your fork with git clone ...
- Run the installation script:
./scripts/initialize.sh
- Confirm successful installation by running unit tests
  - Activate the virtual environment:
    source .venv/bin/activate
  - Run the tests:
    pytest tests/unit
Submit issues to the GitHub repository with steps to reproduce any bugs. Feature requests and optimization ideas can also be submitted as issues.
- Make changes on a branch in your fork
- Create tests to define behavior and get them passing
- Create a pull request with a description of the changes