Added FAQ #274

Merged
merged 2 commits into from
Jul 2, 2019

Conversation

mwouts
Owner

@mwouts mwouts commented Jun 30, 2019

No description provided.

@codecov

codecov bot commented Jun 30, 2019

Codecov Report

Merging #274 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #274   +/-   ##
=======================================
  Coverage   99.19%   99.19%           
=======================================
  Files          68       68           
  Lines        6612     6612           
=======================================
  Hits         6559     6559           
  Misses         53       53

Powered by Codecov. Last update 872c658...c16f270. Read the comment docs.

@mwouts
Owner Author

mwouts commented Jun 30, 2019

@choldgraf, @psychemedia, may I ask your thoughts about this FAQ? Tony, does it answer some of your questions?

Contributor

@choldgraf choldgraf left a comment

A few thoughts from me: in general I think this is a nice addition! Most of my suggestions are about clarity and organization.

@@ -0,0 +1,89 @@
# Frequently Asked Questions

## What is Jupytext?
Contributor

I'd emphasize the two-way integration more prominently. The first sentence of the paragraph makes it sound redundant with nbconvert; I think the distinguishing focus of Jupytext is two-way conversion, rather than just one-way conversion. Something like "Jupytext is a Python package that provides two-way conversion between Jupyter Notebooks and several other text-based formats."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no to the two way conversion? I've started looking at how I could go 'ipynb free', authoring either in just markdown or py files (md only demo / works in MyBinder) perhaps with a next step later on of saving rendered notebooks as HTML or PDF (an output format) rather than ipynb.

Under this way of working, Jupytext is used to make Python/Markdown files editable and executable as notebooks, but not saveable as notebooks, which makes it a one-way process?

Owner Author

Well, I see your point - you don't convert to an ipynb file any more. When you render the notebook, it's only the text to notebook conversion that is involved. Still I think that putting the emphasis on the two-way conversion is good: people know that the way back to the notebook is the least common part.


The text representations have much cleaner diffs than the original notebook format. Merging multiple contributions to a notebook in any of these text formats is easier than with the JSON format. Last but not least, acting on a notebook represented as text (spell check, reformat, ...) is sometimes more comfortable than in Jupyter.

## How do I use Jupytext?
Contributor

IMO here you should just link to the "using jupytext" section of the documentation. If that documentation is too wordy and complex, then I'd add this short example as a quick "getting started" section at the top of the "using jupytext" section, rather than in the FAQ

docs/faq.md Outdated

## Which Jupytext format do you recommend?

I tend to use the Markdown format for notebooks that contain more text than code, as Markdown documents are conveniently edited in IDEs and also well rendered on GitHub.
Contributor

I'd avoid using the word "I" in package documentation. I usually use "we" in my packages even if I'm largely the one developing the package (the hope is always that one day the package developer community will be a "we" rather than an "I" :-) )

docs/faq.md Outdated

## Can I edit the paired text file?

Yes! And when you're done, refresh the notebook in Jupyter. Refreshing will bring the latest changes to your notebook.
Contributor

Will this work even if the text file is edited when a jupyter server isn't running?

e.g., if I have a notebook and paired markdown file, I synchronize them, and push them both to GitHub. Then somebody else updates just the markdown file and pushes the changes to GitHub. I pull in the changes, and turn on JupyterLab...what happens?

Owner Author

> Will this work even if the text file is edited when a jupyter server isn't running?

Indeed! You're free to close the server. There's no magic here: when the notebook is opened or refreshed, Jupytext reads the two files and merges inputs+outputs. The ipynb file is not modified when the notebook is read, but only the next time it is saved.
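
To make the merge concrete, here is a minimal Python sketch of the idea described above. It is an illustration only and not Jupytext's actual code: cells are plain dictionaries, the function name `merge_inputs_with_outputs` is hypothetical, and matching cells by identical source is an assumption made for brevity.

```python
def merge_inputs_with_outputs(text_cells, ipynb_cells):
    """Take inputs from the text file and outputs from the .ipynb file.

    Assumption: a code cell gets its saved outputs back only when its
    source is identical to a code cell stored in the .ipynb file.
    """
    remaining = [c for c in ipynb_cells if c["cell_type"] == "code"]
    merged = []
    for cell in text_cells:
        new_cell = dict(cell)
        if cell["cell_type"] == "code":
            new_cell["outputs"] = []
            for i, saved in enumerate(remaining):
                if saved["source"] == cell["source"]:
                    new_cell["outputs"] = saved.get("outputs", [])
                    del remaining[i]  # each saved cell is used at most once
                    break
        merged.append(new_cell)
    return merged


# Inputs as read from the (edited) text file
text_cells = [
    {"cell_type": "markdown", "source": "# Title (edited in the text file)"},
    {"cell_type": "code", "source": "1 + 1"},
    {"cell_type": "code", "source": "print('new cell')"},
]
# Cells as stored in the older .ipynb file
ipynb_cells = [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "1 + 1", "outputs": ["2"]},
]

merged = merge_inputs_with_outputs(text_cells, ipynb_cells)
```

In this sketch, a code cell that was edited or added in the text file has no outputs until it is run again.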

If you keep the notebook open, the extra bonus is that variables are preserved when you refresh the notebook. I'll add a question about that.

> e.g., if I have a notebook and paired markdown file, I synchronize them, and push them both to GitHub. Then somebody else updates just the markdown file and pushes the changes to GitHub. I pull in the changes, and turn on JupyterLab...what happens?

What happens is what you expect: you get the latest input cells from the markdown file, matched with outputs from the ipynb file. It will work 100% if you don't push the ipynb file to GitHub, and only 99% (*) if you do push the ipynb file.

Let me explain: Jupytext in Jupyter is very strict about the assumption that Jupyter always writes the ipynb file before the md file. If git happens to write the ipynb more than one second after the md file, Jupytext will complain and refuse to open the notebook.
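
The safeguard can be sketched in a few lines of Python. This is an illustration of the rule stated above (the `.ipynb` file must not be more than one second newer than the text file) under simplifying assumptions; `check_timestamps` and its `tolerance` parameter are hypothetical names, not Jupytext's API.

```python
import os
import tempfile


def check_timestamps(text_path, ipynb_path, tolerance=1.0):
    """Raise if the .ipynb file is more than `tolerance` seconds newer
    than the paired text file."""
    text_mtime = os.path.getmtime(text_path)
    ipynb_mtime = os.path.getmtime(ipynb_path)
    if ipynb_mtime > text_mtime + tolerance:
        raise RuntimeError(
            f"{ipynb_path} is newer than {text_path}: "
            "the text file may be outdated"
        )


with tempfile.TemporaryDirectory() as tmp:
    md = os.path.join(tmp, "notebook.md")
    ipynb = os.path.join(tmp, "notebook.ipynb")
    for path in (md, ipynb):
        open(path, "w").close()

    # Files written half a second apart: accepted.
    os.utime(md, (1000, 1000))
    os.utime(ipynb, (1000.5, 1000.5))
    check_timestamps(md, ipynb)
    accepted = True

    # .ipynb written five seconds after the text file: refused.
    os.utime(ipynb, (1005, 1005))
    try:
        check_timestamps(md, ipynb)
        refused = False
    except RuntimeError:
        refused = True
```

A `git pull` that happens to write the `.ipynb` file well after the `.md` file corresponds to the second case.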


The `.ipynb` file contains the full notebook. The paired text file only contains the input cells and selected metadata. When the notebook is loaded by Jupyter, input cells are loaded from the text file, while the output cells and the filtered metadata are restored using the `.ipynb` file.

## Can I create a notebook from a text file?
Contributor

Again for content like these, I'd rather make sure there is an explicit section in the "using jupytext" documentation and then link to that from the FAQ...

docs/faq.md Outdated

## When I refresh, Jupyter warns me that my notebook has unsaved changes

Oh - you have edited both the notebook and the paired text file at the same time? Backup the text file (`git stash`), save the notebook, and merge your changes on the text file (`git stash pop`). When you're done, refresh the notebook in Jupyter.
Contributor

IMO this is going to be quite an advanced technique for most users - is there a simpler way to resolve this?

Owner Author

> is there a simpler way to resolve this?

None other than using a good editor, I am afraid! With PyCharm (mentioned on the next line), you can compare the diffs between memory and disk changes, which is very convenient.

But this is just a corner case - I think people will notice that they are changing the notebook in two different editors at the same time?

docs/faq.md Outdated
```
jupytext --sync notebook.ipynb # Sync the two representations
```

## If only I had known of Jupytext before!
Contributor

I like the personality coming across in this one, but I think it'd still benefit from a clearer idea of what the section covers...e.g., "Can I re-write my git history to use text files instead of notebooks?"

@mwouts
Owner Author

mwouts commented Jul 1, 2019

Thank you so much @choldgraf ! These are all very useful remarks. I will update the text accordingly soon...

@mwouts
Owner Author

mwouts commented Jul 2, 2019

@choldgraf, I have updated the text following your comments, thanks! I do agree that some of the points discussed here could also be documented in the other sections of the documentation. I suggest that we look at that later on (as always, PRs are welcome!)

@mwouts mwouts merged commit 8675d07 into master Jul 2, 2019
@mwouts mwouts deleted the frequently_asked_questions branch July 2, 2019 04:10
@@ -2,41 +2,51 @@

## What is Jupytext?

-Jupytext is a Python package that can convert Jupyter notebooks to scripts or Markdown documents. It can also convert these text documents back to Jupyter notebooks.
+Jupytext is a Python package that provides _two-way_ conversion between Jupyter Notebooks and several other text-based formats like Markdown documents or scripts.

## Why would I want to convert my notebooks to text?

The text representations have much cleaner diffs than the original notebook format. Merging multiple contributions to a notebook in any of these text formats is easier than with the JSON format. Last but not least, acting on a notebook represented as text (spell check, reformat, ...) is sometimes more comfortable than in Jupyter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the use cases I have for an md only version (no ipynb save) is a notebook where I am using database calls that display sensitive information as outputs, even if the db queries and code manipulations I do later on returned data are not sensitive.

By making sure I don't save the notebook, the fact that the saved md document does not contain code outputs (sensitive data) is a win.

Owner Author

Interesting use case! We could add a mention that only the inputs are saved, that they match well what the user has effectively contributed to the notebook, and that as you say that's an effective way to drop the outputs which can be large or private.
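
As a rough illustration of why a text-only save drops the outputs, the Python sketch below keeps only the cell type and source of each cell of a notebook in the nbformat v4 JSON layout. The helper `inputs_only` is hypothetical and much simpler than what Jupytext does (it ignores metadata, for instance).

```python
import json


def inputs_only(notebook):
    """Return the cells of a notebook with everything but the inputs removed."""
    return [
        {"cell_type": cell["cell_type"], "source": cell["source"]}
        for cell in notebook["cells"]
    ]


# A tiny notebook whose output contains data we would not want to commit
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Customer report"},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "run_query('SELECT * FROM customers')",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "secret-value\n"}
            ],
        },
    ],
}

text_cells = inputs_only(notebook)
serialized = json.dumps(text_cells)
```

The query itself is still present in `serialized`, but the printed result is not.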


-Saving notebooks as scripts is a convenient choice when you want to refactor your notebook in an IDE (or import it in another notebook, etc). Use the `percent` format if you prefer to get explicit cell markers (compatible with VScode, PyCharm, Spyder, Hydrogen...). If you prefer to get the minimal amount of cell markers, go for the `light` format.
+Saving notebooks as scripts is an appropriate choice when you want to act on the code (refactor the code, import it in another script or notebook, etc). Use the `percent` format if you prefer to get explicit cell markers (compatible with VScode, PyCharm, Spyder, Hydrogen...). And if you prefer to get the minimal amount of cell markers, go for the `light` format.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an aside (sorry, not sure when I'll get a chance to comment again), but it may be pertinent to the sentiment of this section: I wonder if Jupytext support for editing py files as notebooks may actually help address the issue of notebooks being an inappropriate medium for editing module files in an ad hoc development process, with a judicious extension or two, e.g. to support code execution from NBFormat cells?

Owner Author

I've answered your question there, maybe that's a point we'd like to see in this FAQ?


## How do paired notebooks work?

-The `.ipynb` file contains the full notebook. The paired text file only contains the input cells and selected metadata. When the notebook is loaded by Jupyter, input cells are loaded from the text file, while the output cells and the filtered metadata are restored using the `.ipynb` file.
+The `.ipynb` file contains the full notebook. The paired text file only contains the input cells and selected metadata. When the notebook is loaded by Jupyter, input cells are loaded from the text file, while the output cells and the filtered metadata are restored using the `.ipynb` file. When the notebook is saved in Jupyter, the two files are updated to match the current content of the notebook.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can click on either file to open it into the notebook editor, edit it and run it there, and when you save it, both files will be updated using the appropriate file format.

What are the likely problems if you have both files open at the same time?

Owner Author

Exactly the same as if you were editing the same document in two editors. In short, as long as you modify just one of the two documents you're safe. You may find the autosave a bit annoying, but it won't hurt as Jupytext implements timestamp checks. Read more on this in the next Q/A.

@@ -45,25 +55,29 @@ jupytext --set-formats ipynb,md --execute *.md # convert all .md files to paire

## Which files should I version control?

-Unless you want to version control the output cells, you should version the text file only (and add `*.ipynb` to `.gitignore`). As discussed above, Jupyter will let you open the text representation as a notebook and will re-create the `.ipynb` file when you save the notebook.
+Unless you want to version the outputs, you should version *only the text representation*. The paired `.ipynb` file can safely be deleted. It will be recreated locally the next time you open the notebook (from the text file) and save it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of people also use notebooks as document previews, e.g. on GitHub, where code cells have been executed and the readable document is the complete one. This argues for a separation of concerns between a document for version control and a document for reading / display, which different directory paths for paired documents can help with.

-This happens if you have edited the `.ipynb` file outside of Jupyter. Manual action is requested as the paired text representation may be outdated. Please edit (`touch`) the paired `.md` or `.py` file if it is not outdated, or if it is, delete it, or update it with
+This happens if you have edited the `.ipynb` file outside of Jupyter. It is a safeguard to avoid overwriting the input cells of the notebook with an outdated text file.
+
+Manual action is requested as the paired text representation may be outdated. Please edit (`touch`) the paired `.md` or `.py` file if it is not outdated, or if it is, delete it, or update it with

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference to touch may be confusing. Is this something that an extension might help with, or a Jupytext menu item selection that can force the documents into alignment?
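
For readers unfamiliar with `touch`: it only refreshes the file's modification time and leaves the content alone, so that the text file is no longer older than the `.ipynb` file. A minimal Python equivalent, using the standard library's `Path.touch()` on a hypothetical `notebook.md`:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "notebook.md"  # hypothetical paired text file
    text_file.write_text("# A notebook\n")

    # Pretend the text file was last modified a long time ago
    os.utime(text_file, (1000, 1000))
    old_mtime = text_file.stat().st_mtime

    # Same effect as running `touch notebook.md` in a shell
    text_file.touch()
    new_mtime = text_file.stat().st_mtime

    content_unchanged = text_file.read_text() == "# A notebook\n"
```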


-Do you feel like rewriting the history of your repository and replacing every `.ipynb` file with its Jupytext Markdown representation? Technically that's just a matter of executing:
+Indeed! You can substitute every `.ipynb` file in the project history with its Jupytext Markdown representation using e.g.:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this something of a nuclear option, removing the ipynb files? Maybe also handy to provide a way to just move / stash the processed .ipynb files somewhere?

Owner Author

Sure! And also rewriting the history is something that one should not do too often... Maybe I'll just mention that as a fun exercise 😄

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rewriting history thing is absolutely brilliant though! :-)

mwouts added a commit that referenced this pull request Jul 6, 2019
3 participants