Update README.md with git-filter-repo #194

Merged: 8 commits, merged on Mar 23, 2024
39 changes: 23 additions & 16 deletions README.md
@@ -197,23 +197,30 @@ Note that you need to uninstall with the same flags:
 ### Apply retroactively
 
 `nbstripout` can be used to rewrite an existing Git repository using
-`git filter-branch` to strip output from existing notebooks. This invocation
-uses `--index-filter` and operates on all ipynb-files in the repo: :
-
-    git filter-branch -f --index-filter '
-        git checkout -- :*.ipynb
-        find . -name "*.ipynb" -exec nbstripout "{}" +
-        git add . --ignore-removal
-    '
-
-If the repository is large and the notebooks are in a subdirectory it will run
-faster with `git checkout -- :<subdir>/*.ipynb`. You will get a warning for
-commits that do not contain any notebooks, which can be suppressed by piping
-stderr to `/dev/null`.
-
-This is a potentially slower but simpler invocation using `--tree-filter`:
-
-    git filter-branch -f --tree-filter 'find . -name "*.ipynb" -exec nbstripout "{}" +'
+[`git filter-repo`](https://github.com/newren/git-filter-repo) to strip output
+from existing notebooks. This invocation operates on all ipynb files in the repo:
+
+```sh
+#!/usr/bin/env bash
+# get lint-history with callback from https://github.com/newren/git-filter-repo/pull/542
+./lint-history.py --relevant 'return filename.endswith(b".ipynb")' --callback '
+import json, warnings, nbformat
+from nbstripout import strip_output
+from nbformat.reader import NotJSONError
+try:
+  with warnings.catch_warnings():
+    warnings.simplefilter("ignore", category=UserWarning)
+    notebook = nbformat.reads(blob.data, as_version=nbformat.NO_CONVERT)
+  # customize to your needs
+  strip_output(notebook, keep_output=False, keep_count=False, keep_id=False, extra_keys=["metadata.widgets","metadata.execution","cell.attachments"], drop_empty_cells=True, drop_tagged_cells=[],strip_init_cells=False, max_size=0)
+  old_len = len(blob.data)
+  blob.data = (nbformat.writes(notebook) + "\n").encode("utf-8")
+  if old_len != len(blob.data):
+    print(change.blob_id, change.filename, old_len, len(blob.data))
+except NotJSONError as e:
+  print("ERROR", type(e), change.blob_id, filename)
+'
+```
 
 ### Removing empty cells

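The `--callback` body in the diff runs once for each blob that passes the `--relevant` test. For each blob it parses `blob.data`, strips the notebook, re-serializes it with a trailing newline, and prints the blobs whose size changed. The sketch below imitates that per-blob step using only the standard library, so it can be tried outside a repository.

It is a simplified stand-in and not nbstripout's `strip_output`. The names `strip_blob`, `is_relevant` and `DROP_NOTEBOOK_METADATA` are hypothetical. It only does the following:
- clears code-cell outputs and execution counts
- drops notebook-level `widgets`/`execution` metadata and cell `attachments`
- removes empty cells

It leaves cell ids alone. Its JSON layout only approximates nbformat's writer.

```python
import json

# Hypothetical list of notebook-level metadata keys to drop; it mirrors the
# "metadata.widgets" and "metadata.execution" entries of extra_keys above.
DROP_NOTEBOOK_METADATA = ("widgets", "execution")


def is_relevant(filename: bytes) -> bool:
    """Same predicate as the --relevant expression: only *.ipynb paths."""
    return filename.endswith(b".ipynb")


def strip_blob(data: bytes) -> bytes:
    """Return stripped notebook bytes; non-JSON input is returned unchanged."""
    try:
        notebook = json.loads(data)
    except ValueError:  # covers json.JSONDecodeError and invalid UTF-8
        return data
    if not isinstance(notebook, dict):
        return data

    metadata = notebook.get("metadata", {})
    for key in DROP_NOTEBOOK_METADATA:
        metadata.pop(key, None)

    kept_cells = []
    for cell in notebook.get("cells", []):
        cell.pop("attachments", None)  # like "cell.attachments" in extra_keys
        if cell.get("cell_type") == "code":
            cell["outputs"] = []  # like keep_output=False
            cell["execution_count"] = None  # like keep_count=False
        source = cell.get("source", "")
        if isinstance(source, list):  # notebook sources may be lists of lines
            source = "".join(source)
        if source.strip():  # like drop_empty_cells=True
            kept_cells.append(cell)
    notebook["cells"] = kept_cells

    # Approximates a one-space-indented, sorted-key notebook layout;
    # not guaranteed to match nbformat.writes byte for byte.
    text = json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False)
    return (text + "\n").encode("utf-8")
```

A caller can mirror the callback's report by comparing `len(data)` with `len(strip_blob(data))` and printing both when they differ.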
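The removed `--tree-filter` invocation checks out each commit as a working tree and runs `find . -name "*.ipynb" -exec nbstripout "{}" +` on it. The following is a rough, standard-library-only sketch of that per-tree step.

`strip_tree` and `strip_notebook_file` are hypothetical names. The stripping is deliberately reduced to clearing code-cell outputs and execution counts, so this is not a substitute for running `nbstripout`.

```python
import json
from pathlib import Path


def strip_notebook_file(path: Path) -> bool:
    """Clear code-cell outputs and counts in place; return True if the file changed."""
    original = path.read_bytes()
    try:
        notebook = json.loads(original)
    except ValueError:  # invalid JSON or invalid UTF-8: leave the file alone
        return False
    if not isinstance(notebook, dict):
        return False
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    text = json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False)
    updated = (text + "\n").encode("utf-8")
    if updated == original:
        return False
    path.write_bytes(updated)
    return True


def strip_tree(root: str = ".") -> list[str]:
    """Strip every *.ipynb below root, roughly like `find root -name "*.ipynb"`."""
    changed = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        if path.is_file() and strip_notebook_file(path):
            changed.append(str(path))
    return changed
```

The function returns only the files it rewrote. A second run over an already-stripped tree should therefore report nothing, which is a cheap check that the stripping is idempotent.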