flag to remove empty cell (with no data) #131

Closed
Zoynels opened this issue Jul 15, 2020 · 7 comments

Zoynels commented Jul 15, 2020

Hello, is it possible to add an option that removes cells that have no data (or only spaces/tabs/newlines), no tags, and no unfiltered metadata from an ipynb file?
I often create empty cells that end up being deleted after some time. Of course, some people use such cells to separate code, but this could be optional for people who want clean ipynb files in git.
Or is there a way to do this with the current functionality?

@kynan kynan self-assigned this Jul 28, 2020
@kynan kynan added this to the 0.4.0 milestone Jul 28, 2020
Owner

kynan commented Jul 28, 2020

There's no way to do this right now, but I'll add this feature in the upcoming 0.4.0 release.

@s-weigand

The quick and dirty solution would be to read the notebook as JSON and drop the cells where cell["source"] == [].
This is the little script I use for the job, feel free to reuse it:

"""A little tool to remove empty cells from notebooks.
Since ``nbstripout`` doesn't have this feature yet, we do it ourselves.
See: https://github.com/kynan/nbstripout/issues/131
"""
import json
from pathlib import Path
from typing import List
from typing import Optional

SCRIPT_ROOT_PATH = Path(__file__).parent
NOTEBOOK_BASE_PATH = SCRIPT_ROOT_PATH / "source" / "notebooks"


def strip_empty_cells_from_notebooks(args: Optional[List[str]] = None) -> int:
    """Strips empty cells from notebooks in NOTEBOOK_BASE_PATH."""

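    # `sys.argv[1:]` is an empty list (not None) when no paths are passed,
    # so treat both cases as "process all notebooks".
    if not args: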
        notebook_paths = NOTEBOOK_BASE_PATH.rglob("*.ipynb")
    else:
        notebook_paths = [Path(arg) for arg in args]

    for notebook_path in notebook_paths:
        # read with an explicit encoding so the result does not depend on the platform default
        notebook = json.loads(notebook_path.read_text(encoding="utf8"))
        original_nr_of_cells = len(notebook["cells"])
        notebook["cells"] = [cell for cell in notebook["cells"] if cell.get("source", []) != []]
        if original_nr_of_cells != len(notebook["cells"]):
            print(f"Fixing: {notebook_path}")
            # to ensure an `lf` newline on windows we need to use `.open` instead of `write_text`
            with notebook_path.open(mode="w", encoding="utf8", newline="\n") as f:
                f.write(json.dumps(notebook, indent=1) + "\n")

    return 0


if __name__ == "__main__":
    import sys

    sys.exit(strip_empty_cells_from_notebooks(sys.argv[1:]))

Used as pre-commit hook:

  - repo: local
    hooks:
      - id: strip-empty-notebook-cells
        name: Strip empty notebook cells
        language: system
        entry: python docs/strip_empty_notebook_cells.py
        types: [jupyter]

To run it on all notebooks, use python docs/strip_empty_notebook_cells.py or pre-commit run -a strip-empty-notebook-cells.
To run it manually on the staged files, use pre-commit run strip-empty-notebook-cells; if the pre-commit hooks are installed, this happens on commit anyway.

I might make it a standalone hook since I don't want to copy-paste files across projects, but this is my hotfix for now.
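
Note that the script above only drops cells whose source is exactly an empty list, while the original request also mentions cells that contain only spaces, tabs, or newlines. A minimal sketch of a stricter check follows, assuming a cell's source can be either a single string or a list of strings; the helper names is_empty_cell and drop_empty_cells are hypothetical and not part of nbstripout.

```python
from typing import Any, Dict, List


def is_empty_cell(cell: Dict[str, Any]) -> bool:
    """Return True if the cell source is missing, empty, or whitespace only."""
    source = cell.get("source", "")
    # Assumption: source is either one string or a list of line strings.
    if isinstance(source, list):
        source = "".join(source)
    return source.strip() == ""


def drop_empty_cells(cells: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a new list that keeps only the cells with visible content."""
    return [cell for cell in cells if not is_empty_cell(cell)]
```

In the script above, this could stand in for the list comprehension that filters notebook["cells"].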

@kynan kynan closed this as completed in c1e9bea Apr 11, 2021
kynan added a commit that referenced this issue Apr 11, 2021
Owner

kynan commented Apr 25, 2021

This is now available in nbstripout 0.4.0


devmcp commented Jun 17, 2021

This is great. How do I make it so this option is applied as part of the git filter?

@s-weigand

@devmcp If you use pre-commit you can simply add --strip-empty-cells to the args

  - repo: https://github.com/kynan/nbstripout
    rev: 0.4.0
    hooks:
      - id: nbstripout
        args: [--strip-empty-cells]


devmcp commented Jun 18, 2021

Thanks @s-weigand. I much prefer to use it in a git filter rather than pre-commit to avoid modifying the working copy of the notebook. That said, I think if I use pre-commit to only strip empty cells (by also adding --keep-count and --keep-output) and do the rest with the git filter, that will do the trick. Thanks!

Owner

kynan commented Jun 20, 2021

@devmcp To use this option with the git filter, just edit your .git/config (or ~/.gitconfig if you installed globally) and add the flag to filter.nbstripout.clean and diff.ipynb.textconv
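
Instead of editing the file by hand, the same change could be scripted. The sketch below is only an illustration under stated assumptions: it assumes git is on the PATH and that the nbstripout filter is already installed, so that filter.nbstripout.clean and diff.ipynb.textconv are set. The helpers with_flag and add_flag_to_git_config are hypothetical names, not part of nbstripout.

```python
import subprocess

FLAG = "--strip-empty-cells"
# The two config keys named in the comment above.
KEYS = ("filter.nbstripout.clean", "diff.ipynb.textconv")


def with_flag(command: str, flag: str = FLAG) -> str:
    """Return the command with the flag appended, unless it is already there."""
    if flag in command.split():
        return command
    return f"{command} {flag}"


def add_flag_to_git_config(scope: str = "--local") -> None:
    """Append the flag to both keys; pass scope="--global" for ~/.gitconfig."""
    for key in KEYS:
        result = subprocess.run(
            ["git", "config", scope, "--get", key],
            capture_output=True,
            text=True,
        )
        current = result.stdout.strip()
        if result.returncode != 0 or not current:
            # Nothing to extend: the filter is probably not installed in this scope.
            print(f"{key} is not set in this scope, skipping")
            continue
        subprocess.run(["git", "config", scope, key, with_flag(current)], check=True)
```

Calling add_flag_to_git_config() from inside the repository would update .git/config. The function leaves a value untouched if the flag is already present, so it can be run more than once.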
