Use cell tags instead of cell meta-data to mark "frozen"/demo cells #226

Closed · hoangthienan95 opened this issue May 13, 2019 · 13 comments
@hoangthienan95 commented May 13, 2019

Hi there,

Thank you for the great tool. I'm new to it but loving it so far. I was wondering where I should go to change how Jupytext identifies the cells to comment out when I want to import a notebook as a module.

I have many cells that I want to keep in the .ipynb but commented out in the .py, as demos/working examples of how the module works. However, to my knowledge, there are no extensions for JupyterLab to edit the metadata of multiple cells at once (only cell tags), and it's hard to select all the cells that share the same metadata to get an overall view of what will be commented out. The suggested freeze extension doesn't have an equivalent in JupyterLab.

If that's the case, could you point me to what I'd need to change for Jupytext to detect a cell tag instead of the "active" keyword in the metadata? I already have to tag cells for papermill, and having all the metadata in one place (tags) would be extremely helpful. Are there any potential problems with doing this?

@mwouts (Owner) commented May 13, 2019

Hello @hoangthienan95 ,

Good to know that you enjoy Jupytext - thanks for your feedback!
I like your question. Thanks also for the link to the cell tags extension - I agree that it would make it more convenient to mark cells as non-active in JupyterLab.

Regarding the implementation, what you're asking for would be very easy to do. You would just need to insert a new condition in the is_active function. The condition could look like this:

if 'not-active-in-scripts' in metadata.get('tags', []) and ext != '.md':
    return False
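
In context, the whole function might look roughly like this (a sketch; the actual function in Jupytext's source differs in its details):

def is_active(ext, metadata):
    """Return True if a cell with this metadata should be active in files with extension ext."""
    # Proposed tag-based check (tag name still to be decided):
    if 'not-active-in-scripts' in metadata.get('tags', []) and ext != '.md':
        return False
    # Simplified stand-in for the existing "active" metadata convention,
    # where e.g. active="ipynb,py" lists the formats a cell is active in:
    if 'active' not in metadata:
        return True
    return ext.replace('.', '') in metadata['active'].split(',')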

Obviously we'd need a better tag name than 'not-active-in-scripts'! Do you have a preferred tag name for this?

@hoangthienan95 (Author)
Hi @mwouts, thanks so much for the swift reply! I'm mostly putting extra cells below my complicated functions to provide a demo of how the function works, to show what the returned data/dataframe schema looks like after calling the function, and to explain design decisions I've made (for example, to avoid an edge case with no other simple way around it). It's like a docstring, but executable if you have the .ipynb, so people can examine the outlier examples themselves.

I'm inclined to call the tag demo, freeze, or example. Let me know if you think of something better.

To piggy-back on that, how do I make what's commented out in these cells a docstring (""") instead of # comments, so users of the .py notebook don't mistake them for junk/dead code?

@hoangthienan95 (Author)
I just thought of something: maybe other people want the cells to be active in specific extensions and not others; then they could add their own "active-[extension]" or "inactive-[extension]" tags and write something quick to parse them themselves.

@mwouts (Owner) commented May 13, 2019

> I'm inclined to call the tag demo, freeze, or example. Let me know if you think of something better.

> I just thought of something: maybe other people want the cells to be active in specific extensions and not others; then they could add their own "active-[extension]" or "inactive-[extension]" tags

I like the "active-[extension]" proposal, as it resembles what we have with the "active" metadata. So "active-ipynb" would mean active only in the ipynb format, and "active-ipynb-md" would mean active only in the ipynb and md formats.

> and write something quick to parse them themselves

Well, if you want to collaborate with other users, I would recommend that they all use the same convention! So I'd rather try to define a good convention here...
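
For illustration, here is roughly how such a tag convention could be parsed (a hypothetical helper, not Jupytext's actual code):

def is_active_by_tag(ext, tags):
    """Interpret tags of the form 'active-ipynb' or 'active-ipynb-md':
    a cell with no 'active-*' tag is active everywhere; otherwise it is
    active only in the formats listed in the tag."""
    for tag in tags:
        if tag.startswith('active-'):
            return ext.lstrip('.') in tag[len('active-'):].split('-')
    return True

# e.g. is_active_by_tag('.py', ['active-ipynb-md']) returns False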

> To piggy-back on that, how do I make what's commented out in these cells a docstring (""") instead of # comments, so users of the .py notebook don't mistake them for junk/dead code?

Sorry, there's no way to do exactly that. Still, if you don't want dead code, you could put it under an if. For instance, you could define extension = "ipynb" in an ipynb-only cell and extension = "py" in a py-only cell, and run the examples only when extension == "ipynb". Or even simpler: use if __name__ == "__main__": before the examples that you want to run in the notebook, but not when importing...
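
A sketch of that last pattern (load_example_data is a hypothetical stand-in for your demo code):

# Demo cell: runs when the script/notebook is executed directly,
# but not when the .py file is imported as a module.
if __name__ == "__main__":
    df = load_example_data()  # hypothetical helper defined earlier in the module
    print(df.head())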

mwouts added a commit that referenced this issue May 15, 2019
@mwouts mentioned this issue May 15, 2019
@mwouts closed this as completed in f1a74f2 May 15, 2019
@mwouts (Owner) commented May 15, 2019

Hello @hoangthienan95, in the new release (version 1.1.2) you will be able to mark cells as active in ipynb only (active-ipynb) or py only (active-py) using cell tags. Please let me know if that works for you!
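
For reference, a cell tagged active-ipynb should come out commented in the py representation, roughly like this (the exact rendering of the cell metadata may differ):

# + {"tags": ["active-ipynb"]}
# # This cell runs in the .ipynb, but is commented out in the .py file
# df.head()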

@hoangthienan95 (Author)

Wow, thanks so much @mwouts, really appreciate your help and thanks for everything. Keep up the good work!

@mwouts (Owner) commented May 16, 2019

You're welcome! I'm glad this helps. By the way, you mention that you are also using papermill; do you have any suggestions on how we could improve the interaction between Jupytext and papermill? For instance, would you find it useful to be able to papermill a text notebook?

@hoangthienan95 (Author)

@mwouts If you mean that I could do papermill notebook.py [params] and it would use the current environment's Python interpreter to run the notebook like a script, that would be AMAZING.

Use case: after developing scripts interactively in Jupyter Notebook, I'd most likely want to run multiple instances of them on HPC in parallel. To do so, I would have to write a lot of argparse/CLI parser stuff. Papermill comes with YAML parameter parsing and command-line parameter overrides for free. So I really wish I could just document my arguments in a markdown cell right above the cell with the "parameters" tag, use Jupytext to treat the notebook as a script, and run papermill on that .py file without going through a Jupyter kernel (much slower). If Jupytext can help papermill find/replace the correct parameters, and can enable running the notebook like a script without using a kernel, that would make the workflow seamless.

I didn't think this was possible; I was just being wishful. Is this theoretically possible? I can imagine it's a lot of work and a bit unreasonable as a feature request for Jupytext.

@hoangthienan95 (Author)

Also, I wonder how the text notebook would deal with the data stored (if any) by scrapbook, a package usually used with papermill to store data in a notebook and later read it back out.

@hoangthienan95 (Author) commented May 16, 2019

On second thought, this might be easier than I imagined. Basically, it would be: run papermill with --prepare-only to inject the parameters, convert to .py with Jupytext, then run the script with no parameters.
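
Something like this, I imagine (file names and the alpha parameter are placeholders):

papermill notebook.ipynb injected.ipynb -p alpha 0.1 --prepare-only
jupytext injected.ipynb --to py -o injected.py
python injected.py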

@mwouts (Owner) commented May 17, 2019

Thanks @hoangthienan95 for sharing your use case! Very interesting. I am sure we can do something about this... I'll keep you posted!

@hoangthienan95 (Author) commented May 19, 2019

@mwouts I also opened an issue at the papermill repo, since the execution part seems more appropriate for papermill to do than Jupytext. There seem to be other people requesting this feature as well. However, it would be really nice to have some additional options in Jupytext:

  1. A consume/remove-source option for jupytext --to py to delete the notebook that we are converting from. Right now I'm deleting the source notebooks (papermill-injected notebook copies) manually after converting. Correct me if I'm wrong, but I can always re-generate the notebooks from the .py file using jupytext --sync, or just open the script as a notebook and save.

  2. An option to add a line like #!/usr/bin/python or #!/usr/bin/env python (or a specific Python interpreter path) at the top of the converted .py script. This way one can chmod +x *.py and make them executable without any additional steps.

Let me know what you think!

@mwouts (Owner) commented May 19, 2019

> I also opened an issue at the papermill repo, since the execution part seems more appropriate for papermill to do than Jupytext

Agreed! We won't implement anything big in Jupytext. At most we would use papermill and/or nbconvert internally to execute the notebook. And at minimum we would have some documentation on this, plus a few tests to make sure that what we recommend does work...

> A consume/remove-source option for jupytext --to py to delete the notebook that we are converting from.

Don't you think you could simply pipe the notebook? Jupytext, nbconvert, and I expect Python as well, can take notebooks on stdin/stdout. Maybe we could ask papermill to do that too? I like piping because it removes the requirement to name the notebook, especially when parameters vary.
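
For example, something along these lines (assuming papermill and Jupytext both accept '-' for stdin/stdout here, which would need to be verified):

papermill notebook.ipynb - --prepare-only -p alpha 0.1 | jupytext --from ipynb --to py -o - | python -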

> An option to add a line like #!/usr/bin/python or #!/usr/bin/env python

Oh, that should already be possible. Can you give this a try:

jupytext notebook.ipynb --to py -o - --update-metadata '{"jupytext":{"executable":"/usr/bin/env python"}}'
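
If that works as intended, the output should start with the executable line:

#!/usr/bin/env python
# (followed by the usual Jupytext header and the notebook content)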
