-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ordered cell ids #184
Ordered cell ids #184
Conversation
…d cell ids. Otherwise cell ids are assigned incrementally (after the removal of cells), which should keep them consistent across runs in version control
…e the new strip_output signature
…-keep-id for keeping the existing ids
…tension expected_id was added for expected output with ordered ids
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly LGTM.
I'm not familiar with the Zeppelin notebook format (this was contributed by @ankitrokdeonsns ). However their documentation leads me to believe the IDs might be significant, so I suggest we don't change them.
My only remaining concern is that this is a behaviour change that might surprise some users. I'll release this as a new version.
tests/test_keep_output_tags.py
Outdated
@@ -24,7 +24,7 @@ def nb_with_exception(): | |||
|
|||
def test_cells(orig_nb): | |||
nb_stripped = deepcopy(orig_nb) | |||
nb_stripped = strip_output(nb_stripped, None, None) | |||
nb_stripped = strip_output(nb_stripped, None, None, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really your change, however can you please use kwargs for the 3 None
arguments?
tests/test_keep_output_tags.py
Outdated
@@ -41,4 +41,4 @@ def test_cells(orig_nb): | |||
|
|||
def test_exception(nb_with_exception): | |||
with pytest.raises(MetadataError): | |||
strip_output(nb_with_exception, None, None) | |||
strip_output(nb_with_exception, None, None, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really your change, however can you please use kwargs for the 3 None
arguments?
tests/test_end_to_end.py
Outdated
@@ -15,7 +15,8 @@ | |||
("test_drop_tagged_cells.ipynb", "test_drop_tagged_cells_dontdrop.ipynb.expected", []), | |||
("test_drop_tagged_cells.ipynb", "test_drop_tagged_cells.ipynb.expected", ['--drop-tagged-cells=test']), | |||
("test_execution_timing.ipynb", "test_execution_timing.ipynb.expected", []), | |||
("test_max_size.ipynb", "test_max_size.ipynb.expected", ["--max-size", "50"]), | |||
("test_max_size.ipynb", "test_max_size.ipynb.expected", ["--max-size", "50", "--keep-id"]), | |||
("test_max_size.ipynb", "test_max_size.ipynb.expected_id", ["--max-size", "50"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to test_max_size.ipynb.expected_sequential_id
to make the intent clearer?
tests/test_end_to_end.py
Outdated
@@ -26,6 +27,8 @@ | |||
("test_metadata_period.ipynb", "test_metadata_period.ipynb.expected", ["--extra-keys", "cell.metadata.application/vnd.databricks.v1+cell metadata.application/vnd.databricks.v1+notebook"]), | |||
("test_strip_init_cells.ipynb", "test_strip_init_cells.ipynb.expected", ["--strip-init-cells"]), | |||
("test_nbformat2.ipynb", "test_nbformat2.ipynb.expected", []), | |||
("test_nbformat45.ipynb", "test_nbformat45.ipynb.expected", ["--keep-id"]), | |||
("test_nbformat45.ipynb", "test_nbformat45.ipynb.expected_id", []), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to test_nbformat45.ipynb.expected_sequential_id
to make the intent clearer?
"text": [ | ||
"test_drop_empty_cells_dontdrop.ipynb.expected\r\n", | ||
"test_drop_empty_cells.ipynb\r\n", | ||
"test_drop_empty_cells.ipynb.expected\r\n", | ||
"test_drop_tagged_cells_dontdrop.ipynb.expected\r\n", | ||
"test_drop_tagged_cells.ipynb\r\n", | ||
"test_drop_tagged_cells.ipynb.expected\r\n", | ||
"test_execution_timing.ipynb\r\n", | ||
"test_execution_timing.ipynb.expected\r\n", | ||
"test_keep_metadata_keys.ipynb\r\n", | ||
"test_keep_metadata_keys.ipynb.expected\r\n", | ||
"test_max_size.ipynb\r\n", | ||
"test_max_size.ipynb.expected\r\n", | ||
"test_max_size.ipynb.expected_id\r\n", | ||
"test_metadata_exception.ipynb\r\n", | ||
"test_metadata_extra_keys.ipynb.expected\r\n", | ||
"test_metadata.ipynb\r\n", | ||
"test_metadata.ipynb.expected\r\n", | ||
"test_metadata_keep_count.ipynb.expected\r\n", | ||
"test_metadata_keep_output.ipynb.expected\r\n", | ||
"test_metadata_keep_output_keep_count.ipynb.expected\r\n", | ||
"test_metadata_notebook.ipynb\r\n", | ||
"test_metadata_notebook.ipynb.expected\r\n", | ||
"test_metadata_period.ipynb\r\n", | ||
"test_metadata_period.ipynb.expected\r\n", | ||
"test_nbformat2.ipynb.expected\r\n", | ||
"test_nbformat5.ipynb\r\n", | ||
"test_strip_init_cells.ipynb\r\n", | ||
"test_strip_init_cells.ipynb.expected\r\n", | ||
"test_unicode.ipynb\r\n", | ||
"test_unicode.ipynb.expected\r\n", | ||
"test_widgets.ipynb\r\n", | ||
"test_widgets.ipynb.expected\r\n", | ||
"test_zeppelin.zpln\r\n", | ||
"test_zeppelin.zpln.expected\r\n" | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that this really matters as it's just static output, however it might still be confusing as the actual ls
output will change as new files are added etc. Maybe just make this a markdown cell with some simple text?
Co-authored-by: Florian Rathgeber <florian.rathgeber@gmail.com>
…pout into ordered_cell_ids
Should all be addressed right now. The merge in the middle is a little messy but I'm not sure its worth the wrangling. The zeppelin strip is in its own function so wasn't affected anyway. |
Thanks for your contribution! |
My pleasure 🙂
Jason Jooste
Machine Learning Operations Engineer
Data61 | CSIRO
***@***.******@***.***>
…________________________________
From: Florian Rathgeber ***@***.***>
Sent: 06 May 2023 05:24
To: kynan/nbstripout ***@***.***>
Cc: Jooste, Jason (Data61, Pullenvale) ***@***.***>; Author ***@***.***>
Subject: Re: [kynan/nbstripout] Ordered cell ids (PR #184)
Thanks for your contribution!
—
Reply to this email directly, view it on GitHub<#184 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/A4YFHRNXUNTY54JAGKFV2ZDXEVHVFANCNFSM6AAAAAAXMDVUMU>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
* Add command line argument keep-id, which maintiains randomly generated cell ids. Otherwise cell ids are assigned incrementally (after the removal of cells), which should keep them consistent across runs in version control * Modify test_cell and test_exception in test_keep_output_tags.py to use the new strip_output signature * Fix failed test_end_to_end_nbstripout with test_max_size by passing --keep-id for keeping the existing ids * Add tests for notebooks with and without the --keep-id flag. A new extension expected_id was added for expected output with ordered ids * Modify the readme to include the --include-id flag * Add keyword arguments for None inputs in test_keep_output_tags.py * Rename expected output files to make desired sequential ids more explicit Co-authored-by: Florian Rathgeber <florian.rathgeber@gmail.com>
This PR fixes a problem that I've been having with cell IDs not being consistent, even if the cell contents remain the same. It seems to have something to do with the new nbformat. Simply removing cell.id gives a warning which in future versions will be a hard exception.
The solution that I have implemented that I think is most appropriate is to just add incremental cell id's (which is consistent with the intended use of the ids. Since IDs are usually randomly generated and most users aren't exposed to them I think this is a good solution.
I've added a new command line flag --keep-id to turn off this behaviour. My guess being on by default would match the expectations of most people.
I've also added a couple of tests for this new format.
Questions:
This is my first time contributing to an open source project like this and I'm not entirely sure about the etiquette in these situations. Let me know if I could do anything more!
Cheers,
Jason