Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordered cell ids #184

Merged
merged 10 commits into from
May 5, 2023
Merged

Ordered cell ids #184

merged 10 commits into from
May 5, 2023

Conversation

ghost
Copy link

@ghost ghost commented Apr 26, 2023

This PR fixes a problem that I've been having with cell IDs not being consistent, even if the cell contents remain the same. It seems to have something to do with the new nbformat. Simply removing cell.id gives a warning which in future versions will be a hard exception.

The solution that I have implemented that I think is most appropriate is to just add incremental cell id's (which is consistent with the intended use of the ids. Since IDs are usually randomly generated and most users aren't exposed to them I think this is a good solution.

I've added a new command line flag --keep-id to turn off this behaviour. My guess being on by default would match the expectations of most people.

I've also added a couple of tests for this new format.

Questions:

  • I don't know anything about the zeppelin format and the IDs there seemed to be more meaningful. Right now it would also affect the zeppelin format, should I add a check and ignore zeppelin files?
  • I haven't touched anything with bump2version or pypi because I'm not sure if this would justify a version change. Let me know what you'd like.

This is my first time contributing to an open source project like this and I'm not entirely sure about the etiquette in these situations. Let me know if I could do anything more!

Cheers,
Jason

Jason Jooste added 5 commits April 26, 2023 17:16
…d cell ids. Otherwise cell ids are assigned incrementally (after the removal of cells), which should keep them consistent across runs in version control
…tension expected_id was added for expected output with ordered ids
Copy link
Owner

@kynan kynan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly LGTM.

I'm not familiar with the Zeppelin notebook format (this was contributed by @ankitrokdeonsns ). However their documentation leads me to believe the IDs might be significant, so I suggest we don't change them.

My only remaining concern is that this is a behaviour change that might surprise some users. I'll release this as a new version.

nbstripout/_nbstripout.py Outdated Show resolved Hide resolved
nbstripout/_utils.py Outdated Show resolved Hide resolved
@@ -24,7 +24,7 @@ def nb_with_exception():

def test_cells(orig_nb):
nb_stripped = deepcopy(orig_nb)
nb_stripped = strip_output(nb_stripped, None, None)
nb_stripped = strip_output(nb_stripped, None, None, None)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really your change, however can you please use kwargs for the 3 None arguments?

@@ -41,4 +41,4 @@ def test_cells(orig_nb):

def test_exception(nb_with_exception):
with pytest.raises(MetadataError):
strip_output(nb_with_exception, None, None)
strip_output(nb_with_exception, None, None, None)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really your change, however can you please use kwargs for the 3 None arguments?

@@ -15,7 +15,8 @@
("test_drop_tagged_cells.ipynb", "test_drop_tagged_cells_dontdrop.ipynb.expected", []),
("test_drop_tagged_cells.ipynb", "test_drop_tagged_cells.ipynb.expected", ['--drop-tagged-cells=test']),
("test_execution_timing.ipynb", "test_execution_timing.ipynb.expected", []),
("test_max_size.ipynb", "test_max_size.ipynb.expected", ["--max-size", "50"]),
("test_max_size.ipynb", "test_max_size.ipynb.expected", ["--max-size", "50", "--keep-id"]),
("test_max_size.ipynb", "test_max_size.ipynb.expected_id", ["--max-size", "50"]),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to test_max_size.ipynb.expected_sequential_id to make the intent clearer?

@@ -26,6 +27,8 @@
("test_metadata_period.ipynb", "test_metadata_period.ipynb.expected", ["--extra-keys", "cell.metadata.application/vnd.databricks.v1+cell metadata.application/vnd.databricks.v1+notebook"]),
("test_strip_init_cells.ipynb", "test_strip_init_cells.ipynb.expected", ["--strip-init-cells"]),
("test_nbformat2.ipynb", "test_nbformat2.ipynb.expected", []),
("test_nbformat45.ipynb", "test_nbformat45.ipynb.expected", ["--keep-id"]),
("test_nbformat45.ipynb", "test_nbformat45.ipynb.expected_id", []),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to test_nbformat45.ipynb.expected_sequential_id to make the intent clearer?

Comment on lines 35 to 70
"text": [
"test_drop_empty_cells_dontdrop.ipynb.expected\r\n",
"test_drop_empty_cells.ipynb\r\n",
"test_drop_empty_cells.ipynb.expected\r\n",
"test_drop_tagged_cells_dontdrop.ipynb.expected\r\n",
"test_drop_tagged_cells.ipynb\r\n",
"test_drop_tagged_cells.ipynb.expected\r\n",
"test_execution_timing.ipynb\r\n",
"test_execution_timing.ipynb.expected\r\n",
"test_keep_metadata_keys.ipynb\r\n",
"test_keep_metadata_keys.ipynb.expected\r\n",
"test_max_size.ipynb\r\n",
"test_max_size.ipynb.expected\r\n",
"test_max_size.ipynb.expected_id\r\n",
"test_metadata_exception.ipynb\r\n",
"test_metadata_extra_keys.ipynb.expected\r\n",
"test_metadata.ipynb\r\n",
"test_metadata.ipynb.expected\r\n",
"test_metadata_keep_count.ipynb.expected\r\n",
"test_metadata_keep_output.ipynb.expected\r\n",
"test_metadata_keep_output_keep_count.ipynb.expected\r\n",
"test_metadata_notebook.ipynb\r\n",
"test_metadata_notebook.ipynb.expected\r\n",
"test_metadata_period.ipynb\r\n",
"test_metadata_period.ipynb.expected\r\n",
"test_nbformat2.ipynb.expected\r\n",
"test_nbformat5.ipynb\r\n",
"test_strip_init_cells.ipynb\r\n",
"test_strip_init_cells.ipynb.expected\r\n",
"test_unicode.ipynb\r\n",
"test_unicode.ipynb.expected\r\n",
"test_widgets.ipynb\r\n",
"test_widgets.ipynb.expected\r\n",
"test_zeppelin.zpln\r\n",
"test_zeppelin.zpln.expected\r\n"
]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that this really matters as it's just static output, however it might still be confusing as the actual ls output will change as new files are added etc. Maybe just make this a markdown cell with some simple text?

@kynan kynan self-assigned this Apr 30, 2023
@kynan kynan added type:enhancement state:waiting Waiting for response for reporter labels Apr 30, 2023
@kynan kynan added this to the 0.7.0 milestone Apr 30, 2023
@ghost
Copy link
Author

ghost commented May 5, 2023

Should all be addressed right now. The merge in the middle is a little messy but I'm not sure its worth the wrangling. The zeppelin strip is in its own function so wasn't affected anyway.

@kynan kynan merged commit 08bfc67 into kynan:master May 5, 2023
@kynan
Copy link
Owner

kynan commented May 5, 2023

Thanks for your contribution!

@ghost
Copy link
Author

ghost commented May 7, 2023 via email

kynan added a commit that referenced this pull request Feb 3, 2024
* Add command line argument keep-id, which maintiains randomly generated cell ids. Otherwise cell ids are assigned incrementally (after the removal of cells), which should keep them consistent across runs in version control

* Modify test_cell and test_exception in test_keep_output_tags.py to use the new strip_output signature

* Fix failed test_end_to_end_nbstripout with test_max_size by passing --keep-id for keeping the existing ids

* Add tests for notebooks with and without the --keep-id flag. A new extension expected_id was added for expected output with ordered ids

* Modify the readme to include the --include-id flag

* Add keyword arguments for None inputs in test_keep_output_tags.py

* Rename expected output files to make desired sequential ids more explicit

Co-authored-by: Florian Rathgeber <florian.rathgeber@gmail.com>
@kynan kynan mentioned this pull request Feb 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state:waiting Waiting for response for reporter type:enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant