How to rewrite a project's history with Jupytext #263

Closed
3 of 4 tasks
mwouts opened this issue Jun 23, 2019 · 10 comments

Comments

@mwouts
Owner

mwouts commented Jun 23, 2019

What would the history of @jakevdp's PythonDataScienceHandbook have looked like with Jupytext?

Here is what I have done:

git filter-branch --tree-filter 'jupytext --to md */*.ipynb && rm -f */*.ipynb' HEAD

The results are available in this branch.

What I'd like to do next:

  • Activate Binder on the fork (i.e. just add jupytext in the requirements)
  • Fix @jakevdp's tools to work with the .md representation (i.e. replace nbformat.read/write with Jupytext's ones)
  • Fix the links from one notebook to another, if possible
  • See with @jakevdp if we can do something about Colab.
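One possible reading of "fix the links" is to point relative `.ipynb` links at the converted `.md` files. The sketch below is only an illustration under that assumption, not what was done in the branch; `rewrite_notebook_links` is a hypothetical helper, and absolute URLs (Colab badges, for instance) are deliberately left alone.

```python
import re

# Matches the target of a Markdown link that ends in .ipynb,
# optionally followed by a #fragment.
_NOTEBOOK_LINK = re.compile(r"\]\(([^)\s]+?)\.ipynb(#[^)\s]*)?\)")


def rewrite_notebook_links(markdown_text):
    """Point relative .ipynb links at the matching .md file (hypothetical helper)."""

    def _swap(match):
        target = match.group(1)
        if "://" in target:
            # Absolute URL: keep it unchanged.
            return match.group(0)
        fragment = match.group(2) or ""
        return "](" + target + ".md" + fragment + ")"

    return _NOTEBOOK_LINK.sub(_swap, markdown_text)
```

This could run as an extra step of the tree filter, after the conversion to Markdown.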
mwouts added a commit to mwouts/PythonDataScienceHandbook that referenced this issue Jun 23, 2019
@mwouts
Owner Author

mwouts commented Jun 23, 2019

This is a diff on a random commit:

image

@mwouts
Owner Author

mwouts commented Jun 23, 2019

Even the links work! At least with Jupyter Notebook. Check for yourself on Binder (cf. the updated README.md).

Getting the links to work in Jupyter Lab may be another challenge. If I open Index.md as a notebook (right-click - open with notebook), and I follow the link to another notebook, that one is opened as a Markdown document. @jasongrout, do you think we could reasonably expect that following a .md link in another .md file edited as a notebook opens the notebook editor?

image

@psychemedia

So as a git novice, does git filter-branch --tree-filter rewrite all the commits by applying the jupytext conversion to them?

(Clarifying for myself..) Which means I could:

  • create a branch of master;
  • run the conversion on that branch;
  • use that branch to read all the commits as changes to jupytext md rather than the original ipynb.

Wow!

Quickly trying on a large legacy course repo, I note it halts with an error trying to parse a presumably malformed notebook:

[jupytext] Error: Notebook does not appear to be JSON: '{\n "cells": [\n  {\n   "cell_type": "m...

tree filter failed: jupytext --to md */*.ipynb && rm -f */*.ipynb

It would probably be worth trapping that error and perhaps putting a boilerplate .md in place ("The original notebook file is corrupted"?). If I rerun the filter-branch command, it seems to start from the beginning again. (If I went into the associated commit to try to fix the broken file, that would presumably lead to all sorts of downstream inconsistencies in the committed files?)

@mwouts
Owner Author

mwouts commented Jun 23, 2019

So as a git novice, does git filter-branch --tree-filter rewrite all the commits by applying the jupytext conversion to them?

Well, I probably am just another git novice 😄 . I learnt of the filter-branch command this morning at https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History (search for the nuclear option). I think that rewriting the history is best done on a new branch indeed (but, if you wanted to rewrite the history on all branches, that's possible as well with an additional --all option).
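For reference, the branch-then-rewrite workflow discussed here could be scripted roughly as follows. This is a sketch only: the branch name `jupytext-history` and both function names are made up for illustration, and `-- --all` is the usual way to hand `--all` to `git filter-branch` as a revision option.

```python
import subprocess

# The tree filter used earlier in this thread.
TREE_FILTER = "jupytext --to md */*.ipynb && rm -f */*.ipynb"


def filter_branch_command(tree_filter=TREE_FILTER, all_branches=False):
    """Build the argument list for git filter-branch (hypothetical helper)."""
    command = ["git", "filter-branch", "--tree-filter", tree_filter]
    # 'HEAD' rewrites the current branch only; '-- --all' rewrites every ref.
    command += ["--", "--all"] if all_branches else ["HEAD"]
    return command


def rewrite_on_new_branch(repo_dir, branch="jupytext-history"):
    """Create a fresh branch, then rewrite its history in place."""
    subprocess.run(["git", "checkout", "-b", branch], cwd=repo_dir, check=True)
    subprocess.run(filter_branch_command(), cwd=repo_dir, check=True)
```

Working on a new branch leaves the original one untouched, so the two histories can be compared afterwards.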

Rewriting the 230 commits of the project took a while: between 10 minutes (keep .md files only) and 20 minutes (sync all .ipynb files to .md files). I see your point about the error that halts the process, and I'd be interested in providing a new option to avoid failing on errors. Would --warn-only be a good name for that? Until then, you could give it another try with something like

jupytext --to md */*.ipynb || true && rm -f */*.ipynb

but that will still partially fail for every commit for which an incorrect notebook exists.
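An alternative to the shell one-liner, sketched here under a few assumptions, is a small Python tree filter that converts notebooks one at a time and writes the placeholder suggested above whenever a notebook cannot be read. `convert_tree` and `_jupytext_to_md` are hypothetical names; the default converter assumes a Jupytext version in which `jupytext.read` and `jupytext.write` accept file paths.

```python
import sys
from pathlib import Path

PLACEHOLDER = "The original notebook file is corrupted\n"


def _jupytext_to_md(ipynb_path, md_path):
    # Imported lazily so the rest of the sketch has no hard dependency.
    import jupytext

    notebook = jupytext.read(str(ipynb_path))
    jupytext.write(notebook, str(md_path), fmt="md")


def convert_tree(root=".", convert=_jupytext_to_md):
    """Convert every */*.ipynb under root to .md, one file at a time."""
    failures = []
    for ipynb_path in sorted(Path(root).glob("*/*.ipynb")):
        md_path = ipynb_path.with_suffix(".md")
        try:
            convert(ipynb_path, md_path)
        except Exception as err:
            # One bad notebook should not abort the rewrite of the whole commit.
            md_path.write_text(PLACEHOLDER, encoding="utf-8")
            failures.append(ipynb_path)
            print(f"[warning] {ipynb_path}: {err}", file=sys.stderr)
        ipynb_path.unlink()
    return failures
```

Saved as a script that ends with a call to `convert_tree()`, and stored outside the tree being rewritten, this could be passed to `--tree-filter` in place of the `jupytext ... && rm ...` command, so that the valid notebooks in a commit are still converted when one of them is broken.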

Also, in case you want to keep the .ipynb files and turn them into paired notebooks, the command would be

jupytext --set-formats ipynb,md --sync */*.ipynb

@psychemedia

Trying to rewrite all the github entries for a legacy repo, I've just hit another crash out error, even when running with --warn-only:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/ipython_genutils/ipstruct.py", line 132, in __getattr__
    result = self[key]
KeyError: 'metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/jupytext", line 10, in <module>
    sys.exit(jupytext_cli())
  File "/usr/local/lib/python3.7/site-packages/jupytext/cli.py", line 165, in jupytext_cli
    jupytext(args)
  File "/usr/local/lib/python3.7/site-packages/jupytext/cli.py", line 263, in jupytext
    notebook = read(nb_file, fmt=fmt)
  File "/usr/local/lib/python3.7/site-packages/jupytext/jupytext.py", line 251, in read
    return read(stream, as_version=as_version, fmt=fmt, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jupytext/jupytext.py", line 256, in read
    notebook = nbformat.read(fp, as_version, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/nbformat/__init__.py", line 141, in read
    return reads(fp.read(), as_version, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/nbformat/__init__.py", line 74, in reads
    nb = reader.reads(s, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/nbformat/reader.py", line 61, in reads
    return versions[major].to_notebook_json(nb_dict, minor=minor)
  File "/usr/local/lib/python3.7/site-packages/nbformat/v4/nbjson.py", line 40, in to_notebook
    nb = strip_transient(nb)
  File "/usr/local/lib/python3.7/site-packages/nbformat/v4/rwbase.py", line 100, in strip_transient
    cell.metadata.pop('trusted', None)
  File "/usr/local/lib/python3.7/site-packages/ipython_genutils/ipstruct.py", line 134, in __getattr__
    raise AttributeError(key)
AttributeError: metadata
tree filter failed: jupytext --warn-only  --to md */*.ipynb && rm -f */*.ipynb

@mwouts
Owner Author

mwouts commented Jul 22, 2019

Hello @psychemedia , thanks for reporting this. Can you try adding the KeyError in

except (ValueError, TypeError, IOError) as err:

unless you think we could afford to intercept any Exception there?

@psychemedia

Will give it a go... it's a fair way into my history so it may take some time testing it, unless there's a shortcut through the commit tree to the one that's causing the error...?
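One possible shortcut, sketched here as an assumption rather than a tested recipe, is to scan the history for suspicious notebooks without rewriting anything. `notebook_problem` is a hypothetical, rough check (valid JSON, a top-level `metadata` key, and a `metadata` key on every cell); it is not nbformat's own validation, so it may miss other failure modes.

```python
import json
import subprocess


def notebook_problem(text):
    """Return a short description of what looks wrong, or None."""
    try:
        nb = json.loads(text)
    except ValueError as err:
        return f"not JSON: {err}"
    if not isinstance(nb, dict) or "metadata" not in nb:
        return "missing notebook metadata"
    for index, cell in enumerate(nb.get("cells", [])):
        if not isinstance(cell, dict) or "metadata" not in cell:
            return f"cell {index} has no metadata"
    return None


def _git(repo_dir, *args):
    result = subprocess.run(
        ["git", *args], cwd=repo_dir, check=True,
        capture_output=True, encoding="utf-8", errors="replace",
    )
    return result.stdout


def find_bad_notebooks(repo_dir=".", rev="HEAD"):
    """List (commit, path, reason) for every notebook that fails the check."""
    problems = []
    for commit in _git(repo_dir, "rev-list", "--reverse", rev).split():
        listing = _git(repo_dir, "ls-tree", "-r", "-z", "--name-only", commit)
        for path in filter(None, listing.split("\0")):
            if path.endswith(".ipynb"):
                reason = notebook_problem(_git(repo_dir, "show", f"{commit}:{path}"))
                if reason:
                    problems.append((commit, path, reason))
    return problems
```

This reads every notebook at every commit, so it is not instant on a large repository, but it only reads objects from git and never checks out a tree or runs a conversion.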

@psychemedia

No, still there. Is it the handler at File "/usr/local/lib/python3.7/site-packages/jupytext/cli.py", line 165, in jupytext_cli that is hardwired to fail on errors?

Only the except at 410 is letting things through with a warning?

@mwouts
Owner Author

mwouts commented Jul 22, 2019

Hello Tony,

I won't have time to publish a new release soon, so I suggest that

  • you clone Jupytext
  • replace line 410 with
    except Exception as err:
    to catch all possible errors,
  • build Jupytext with python setup.py sdist bdist_wheel in the root directory,
  • and install the new version with pip install dist/jupytext-1.2.1.tar.gz (setup.py writes the archives to the dist directory).

A sample notebook that produces a similar error is simply {} (test your new jupytext on such an empty notebook to confirm).

@psychemedia

Yeah, that worked fine; not sure what was wrong before? I thought I'd upgraded to a modded version of the package but perhaps I hadn't... (environments everywhere!)

mwouts added a commit that referenced this issue Sep 15, 2019
mwouts mentioned this issue Sep 16, 2019
mwouts added a commit that referenced this issue Sep 17, 2019