Add option to Ignore certain cells metadata #106

Closed
erdnaavlis opened this issue Oct 17, 2018 · 4 comments

@erdnaavlis
Contributor

erdnaavlis commented Oct 17, 2018

I like to use the Execute Time extension from NBExtensions.

When using jupytext, a cell will be converted to .py as:

# + {"ExecuteTime": {"start_time": "2018-10-17T10:31:56.157308Z", "end_time": "2018-10-17T10:31:56.160823Z"}}
print("I'm a cell!")

Note that the cell header carries an ExecuteTime metadata entry.

Would it be possible for the jupytext user to specify which metadata to keep track of and which to ignore?
In my case I don't need this info in the .py and, in fact, I don't want it because it adds noise to git diff...
But maybe other users would like to keep it in the scripts.

Please let me know your thoughts.

@mwouts's comment from the discussion started in #101:

As for your comment on metadata, I like very much the idea of filtering cell metadata. As a start we could skip the 'ExecuteTime' cell metadata for everyone (there are already a few cell metadata that jupytext does not include, cf. https://github.com/mwouts/jupytext/blob/master/jupytext/cell_metadata.py#L32). Unless you think it is important that for every notebook, one can tell which cell metadata to include/exclude? Please let me know your preference.

Regarding your comment, @mwouts, I don't see a need to specify which metadata to ignore on a per-notebook basis.
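
For illustration, the "skip ExecuteTime for everyone" idea could look roughly like the sketch below. This is not jupytext's actual code: `IGNORED_KEYS` and `cell_header` are hypothetical names, and the header layout simply mirrors the `# + {...}` line shown above.

```python
import json

# Hypothetical ignore list for this sketch; jupytext keeps its own list
# in cell_metadata.py, and its contents may differ.
IGNORED_KEYS = {"ExecuteTime"}


def cell_header(metadata):
    """Build the '# +' header line for a code cell, omitting ignored keys."""
    kept = {key: value for key, value in metadata.items() if key not in IGNORED_KEYS}
    # Simplification: emit a bare marker when no metadata is left to show.
    return "# + " + json.dumps(kept) if kept else "# +"


if __name__ == "__main__":
    meta = {
        "ExecuteTime": {
            "start_time": "2018-10-17T10:31:56.157308Z",
            "end_time": "2018-10-17T10:31:56.160823Z",
        }
    }
    print(cell_header(meta))
    print("print(\"I'm a cell!\")")
```

With a fixed ignore list like this, the timestamps never reach the .py file, so they no longer show up in git diff.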

@mwouts
Owner

mwouts commented Oct 17, 2018

Thanks @andrethrill. I like both the high-effort and the low-effort approach! In the present case I also think that ExecuteTime could always be filtered (note that, from v0.8.2 on, the cell metadata that are absent from the text representation are still preserved in the .ipynb file).

Maybe we should even ask the opposite question: which cell metadata are relevant in the text representation? For people using R Markdown, I think the relevant metadata are the ones they enter manually: figure size, whether code or outputs should be included, etc. From your previous example in #101, run_control is another input that should be preserved. However, I also see the editable and deletable flags. Do you know where they come from? Do you personally use them in the text file?

Finally, let's think about how users could configure their metadata preferences (see also #105)...

  1. The jupytext section in the header could have an optional entry such as cell_metadata. No entry would mean the default (to be defined, see above). The value true would mean keep all metadata, and false would mean keep no metadata. A comma-separated list could be used to preserve a few metadata keys in addition to the default. If the list contains a minus sign, then the keys after the minus sign are not represented in the text notebook (is that too complex?)
  2. I think it is important that this information belongs to the notebook itself, because when you share the notebook, the person who receives it may not have the same configuration.
  3. Still, a default configuration is convenient! Say that the contents manager has a cell_metadata configuration, and the notebook has no explicit config. Then the CM's configuration could be copied into the notebook metadata.
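
The cell_metadata values proposed in point 1 could be interpreted along these lines. This is only a sketch of the proposal, not the syntax that was eventually released: `DEFAULT_EXCLUDED` and `filter_cell_metadata` are hypothetical names, the default is assumed to be "keep everything except ExecuteTime", and a minus sign is assumed to switch every following key to "exclude", whether it stands alone or is attached to a key.

```python
# Assumed default for this sketch: every key is kept except these.
DEFAULT_EXCLUDED = {"ExecuteTime"}


def filter_cell_metadata(metadata, spec=None):
    """Return the cell metadata that would appear in the text notebook.

    spec is None (default), True (keep all), False (keep none), or a
    comma-separated string such as "ExecuteTime,-editable".
    """
    if spec is True:
        return dict(metadata)
    if spec is False:
        return {}

    include, exclude = set(), set()
    target = include
    for token in (spec or "").split(","):
        token = token.strip()
        if token.startswith("-"):
            # From the minus sign on, keys are excluded rather than kept.
            target = exclude
            token = token[1:].strip()
        if token:
            target.add(token)

    return {
        key: value
        for key, value in metadata.items()
        if key not in exclude and (key in include or key not in DEFAULT_EXCLUDED)
    }


if __name__ == "__main__":
    meta = {"ExecuteTime": {"start_time": "t0"}, "run_control": {"frozen": True}, "editable": False}
    print(filter_cell_metadata(meta))
    print(filter_cell_metadata(meta, "ExecuteTime,-editable"))
```

Treating the list as a set of additions and removals on top of the default keeps short specs readable, at the cost of having to know what the default is.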

mwouts added a commit that referenced this issue Oct 18, 2018
@mwouts mwouts mentioned this issue Oct 18, 2018
mwouts added a commit that referenced this issue Oct 18, 2018
@erdnaavlis
Contributor Author

Thanks @mwouts !

Maybe we should even ask the opposite question: which cell metadata are relevant in the text representation?

This is a good point. Personally, as I get more familiar with Jupytext, the way I see it being most useful in my workflow would be to stop version-controlling the .ipynb files and only track the corresponding .py files in git (notebooks are a mess for version control...). Of course, this would require the notebooks to be fully recoverable from their scripts.

Here, 'fully' means up to the metadata that the user chooses to keep. Hence my point about giving the user such a config option.

From your previous example at #101 , the run_control is another input that should be preserved. However I see the editable, deletable flags, are you aware where they are coming from? Do you personally use them in the text file?

The editable and deletable flags are used by the same Frozen Cells extension I referred to. It has an intermediate state between frozen and unfrozen where you can simply lock the cell: it is not editable and not deletable, but it still executes. I think it's important metadata to keep.

  1. jupytext section in the header could have an optional entry like cell_metadata. No entry would mean default (to be defined, see above). Value true would mean keep all metadata. Value false would mean keep no metadata. And a comma separated list could be used to preserve a few metadata keys, in addition to the default. If the list contains a minus sign, then keys after the minus sign are not represented in the text notebook (is that too complex?)

That seems reasonable to me. But see my comment below:

  1. I think it is important that this information belongs to the notebook itself, because when you share the notebook, the person that receives it may not have the same configuration.

Good point! I didn't think about that. Still, I find it a bit cumbersome to have to specify these details for every new notebook the user creates. I believe that each user may have different configs, but each user will very likely adopt the same config across most of his/her notebooks. What do you think about this instead:

  • Jupytext has a global config option.
  • The header entry you mentioned in 1. gets automatically populated by jupytext using the info in the global config option.

This way the user has the option to personalize it on a per-notebook basis (adding or removing specific metadata, without the - or + signs). But, by default, all of the user's notebooks take the config the user previously defined globally.
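
As a rough sketch of that fallback (hypothetical names throughout: `resolve_cell_metadata`, the `global_config` dict and the `cell_metadata` key are assumptions for illustration, not jupytext's API), the notebook header would only be populated from the global option when it has no explicit entry:

```python
def resolve_cell_metadata(notebook_metadata, global_config):
    """Return notebook metadata with jupytext.cell_metadata filled in.

    An explicit entry in the notebook always wins; the global option is
    copied into the notebook only when the entry is missing.
    """
    resolved = dict(notebook_metadata)
    jupytext_section = dict(resolved.get("jupytext", {}))
    if "cell_metadata" not in jupytext_section and "cell_metadata" in global_config:
        jupytext_section["cell_metadata"] = global_config["cell_metadata"]
    if jupytext_section:
        resolved["jupytext"] = jupytext_section
    return resolved


if __name__ == "__main__":
    user_config = {"cell_metadata": "run_control,editable,deletable"}
    print(resolve_cell_metadata({}, user_config))
    print(resolve_cell_metadata({"jupytext": {"cell_metadata": False}}, user_config))
```

Because the resolved value is written into the notebook metadata, someone who receives the notebook sees the same setting regardless of their own global config.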

@mwouts
Owner

mwouts commented Oct 29, 2018

Available in v0.8.4 - see the corresponding entry in the README.

@mwouts mwouts closed this as completed Oct 29, 2018
@erdnaavlis
Contributor Author

@mwouts I had the chance to give it a try now. I think the documentation is clear, and usability-wise it seems to behave as expected. Thank you!
