Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove .ipython dir and introduce kedro Jupyter kernel #1355

Merged
merged 32 commits into from
Mar 30, 2022

Conversation

antonymilne
Copy link
Contributor

@antonymilne antonymilne commented Mar 18, 2022

Note for reviewers

The diff here isn't as bad as the number of files changed makes it look! Much of it is just deleting old files. The interesting files to review are just:

  • kedro/framework/cli/jupyter.py
  • kedro/framework/cli/project.py

The docs have changed substantially though - best way to review that is not to look at the .md file but instead to check the rendered version.

Description

Closes #1338. In brief, this changes kedro jupyter notebook/lab and kedro ipython commands so that instead of using the .ipython folder, they now use the kedro.extras.extensions.ipython extension instead. This ensures that there is one consistent way of using kedro interactively.

kedro ipython is basically equivalent to ipython --ext=kedro.extras.extensions.ipython. kedro jupyter notebook/lab achieve the same thing (i.e. automatically loading the extension) by creating a custom Jupyter kernel which points to an ipython launcher that includes --ext=kedro.extras.extensions.ipython.

There's no kedro command to remove the kernel; you just do jupyter kernelspec remove kernel_name. Note that all kernels created by kedro have the kedro_ prefix so you can immediately see where they come from if you do jupyter kernelspec list. This stuff will go in the docs when I write them.

One advantage of a custom kernel is we now have a snazzy icon on the UI (update! The icon is now the black version as in the docs but I didn't change these screenshots)

Jupyter notebook
image

Jupyter lab
image

Development notes

I've needed to make lots of small decisions here exactly what to change and how big a breaking change this should be. Overall I've tried to balance up the opportunity to do a breaking release + wanting to achieve one consistent workflow + ensure that stuff still works + not being overly zealous in removing things that we're not yet sure should be removed. I'll make a separate ticket for items which could be revisited in future. Please let me know if you disagree with any of these decisions since they're very much open for discussion.

Things I've changed here

  • Removed ipython_message message informing users which variables are in scope. This seemed to be very out of date and no longer needed given the very similar message printed by load_ipython_extension
  • Removed ip argument for kedro jupyter notebook/lab. This actually won't affect anything because you can still do kedro jupyter notebook/lab --ip=xxx and it will get passed through *args
  • Removed idle_timeout argument for kedro jupyter notebook/lab. This was a tricky one to decide on because it was originally introduced for a good reason: to fix a bug in which the %run_viz line magic creates a process that blocks port 4141 even after Jupyter is killed. However, (a) the fix does not seem to work for me [i.e. %run_viz still blocks port 4141 even with the fix], (b) this fix can't ever work on managed jupyter instances so is incompatible with our goal of "one consistent workflow", (c) does anyone use %run_viz anyway? So overall I decided to remove the fix and then later let's work out whether it's really an issue and if so come up with a different fix (probably on the viz side, a %kill_viz command or something).
  • Removed the SingleKernelSpecManager that ensured the user only had one kernel to choose from. This is incompatible with the one consistent workflow goal and seemed unnecessary now that we are clearly labelling the Kedro kernel and making it the default. If we do want to reintroduce this behaviour then it might now be easier to do through KernelSpecManager.allowed_kernelspecs rather than making a whole custom kernel spec manager
  • Removed all_kernels option which gave the user the possibility of overriding the SingleKernelSpecManager to show all available kernels (since this is now how it always behaves)

Things I've not changed here

  • KEDRO_ENV environment variable remains. This provides a mechanism for changing env and is picked up when the session is created. As far as I can see, the interactive workflow doesn't strictly need this since a user can do %reload_kedro env=xxx to change environment. However, since removing this would in theory affect the creation of sessions outside the interactive workflow also I left it as is. I think we should check with users whether and how it's used and then hopefully remove it in future.
  • Calls to validate_settings remain in kedro jupyter notebook/labs. I don't know if these are still needed but I didn't want to risk taking them out and breaking stuff. If they are still needed then maybe these calls should move to kedro.extensions.extras.ipython in the future

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
@datajoely
Copy link
Contributor

Nice! Would just add I think the KEDRO_ENV env var has a lot of value outside of the interactive workflow

@@ -1,6 +1,5 @@
black==21.5b1
flake8>=3.7.9, <4.0
ipython~=7.10
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just left over from a bad merge - the specification on the next line is the right one.

@@ -6,7 +6,6 @@ dynaconf>=3.1.2,<4.0.0
fsspec>=2021.4, <=2022.1
gitpython~=3.0
jmespath>=0.9.5, <1.0
jupyter_client>=5.1, <8.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already in the project template requirements, which is where it really belongs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted elsewhere I've actually removed jupyter_client altogether:

This is a core dependence of jupyter so I've decided to not list it explicitly so we don't have to keep bumping the version requirements.

Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
)


def _create_kernel(kernel_name: str, display_name: str) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bit that might eventually become its own kedro jupyter-init or something like that.

I spent a loooong time reading through the ipython and jupyter source to try and figure out which were the right high-level API commands to use here to make this as robust as possible so we're not relying on too much stuff that might change. So🤞 this carries on working in the future, but I decided to the wrap the whole call in try/except just in case...

Copy link
Member

@idanov idanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty nice to see this change happening finally 🎉 I think we need to get rid of all references to the .ipython/ directory in our codebase and docs, as well as in all our official starters.

edit: I saw you already plan to do that, so my comment here is unneeded. Btw could we have the dark Kedro icon instead? It stands out better in the white-background theme in Jupyter.

@@ -1,6 +1,5 @@
black==21.5b1
flake8>=3.7.9, <4.0
ipython~=7.10
ipython>=7.31.1, <8.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we relax this one further, e.g. <9.0? The most recent ipython version atm is 8.1.1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but ipython 8 requires Python >= 3.8 and we still support 3.7. Happy to bump it though if you like and let pip work out which version to install for the user's given version of Python. Or maybe even just remove it from the requirements given it's a core dependency of jupyter and jupyterlab anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we can support both (i.e. no breaking changes that make us have different code for different python versions), we can bump the requirement. I'm slightly more in favour of that over removing it completely, just for users that want to use ipython over jupyter and only need to remove the relevant lines. I don't have any strong opinions on this though.

kedro/framework/cli/jupyter.py Show resolved Hide resolved
kedro/framework/cli/jupyter.py Outdated Show resolved Hide resolved
kedro/framework/cli/jupyter.py Outdated Show resolved Hide resolved
kedro/framework/cli/jupyter.py Show resolved Hide resolved
kernel_spec = json.loads(kernel_json.read_text())
kernel_spec["argv"].extend(["--ext", "kedro.extras.extensions.ipython"])
# indent=1 is to match the default ipykernel style (see ipykernel.write_kernel_spec).
kernel_json.write_text(json.dumps(kernel_spec, indent=1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use json.dump here instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not while using pathlib (which is what we generally use now), because json.dump needs a file pointer rather than a path object.

tests/framework/cli/test_jupyter.py Outdated Show resolved Hide resolved
tests/framework/cli/test_jupyter.py Outdated Show resolved Hide resolved
tests/framework/cli/test_project.py Outdated Show resolved Hide resolved
Copy link
Contributor

@lorenabalan lorenabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but a little worried about the line magic not working.

kedro/framework/cli/jupyter.py Outdated Show resolved Hide resolved
kedro/framework/cli/jupyter.py Outdated Show resolved Hide resolved
kedro/framework/cli/jupyter.py Outdated Show resolved Hide resolved
kedro/framework/cli/jupyter.py Show resolved Hide resolved
kedro/framework/cli/jupyter.py Show resolved Hide resolved
kedro/framework/cli/jupyter.py Show resolved Hide resolved
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
@antonymilne
Copy link
Contributor Author

antonymilne commented Mar 25, 2022

OK, we're good to go here! Updating the docs turned into quite a substantial rewrite because a lot of the page was out of date. If you want to review the new docs the easiest way is to look at the rendered version here: https://antonymilneqb.github.io/tools_integration/ipython.html

Still to do:

@@ -3,7 +3,6 @@ flake8>=3.7.9, <4.0
ipython>=7.31.1, <8.0
isort~=5.0
jupyter~=1.0
jupyter_client>=5.1, <8.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a core dependence of jupyter so I've decided to not list it explicitly so we don't have to keep bumping the version requirements.

@@ -119,7 +121,6 @@ def _collect_requirements(requires):
"ipykernel>=5.3, <7.0",
],
"geopandas": _collect_requirements(geopandas_require),
"ipython": ["ipython>=7.31.1, <8.0"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this just to simplify the number of places we specify dependencies, and because we have never advertised pip install kedro[ipython] as a thing. There's already helpful messages when you do kedro jupyter/ipython if you don't have the right things installed.


@pytest.mark.usefixtures("cleanup_kernel")
class TestCreateKernel:
def test_create_new_kernel(self):
Copy link
Contributor Author

@antonymilne antonymilne Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to do this as a sort of e2e test that checks what the actual result of calling _create_kernel is rather than mocking out all the calls inside that function. This means test is not tied to a particular implementation and will be easy to modify if and when _create_kernel becomes its own kedro jupyter-init command.

Copy link
Contributor

@yetudada yetudada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I left minor comments 🚀

docs/source/tools_integration/ipython.md Outdated Show resolved Hide resolved
docs/source/tools_integration/ipython.md Show resolved Hide resolved
docs/source/tools_integration/ipython.md Show resolved Hide resolved
docs/source/tools_integration/ipython.md Outdated Show resolved Hide resolved
docs/source/tools_integration/ipython.md Show resolved Hide resolved
"""Open Jupyter Notebook with project specific variables loaded."""
_check_module_importable("jupyter_core")

_check_module_importable("notebook")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes a lot more sense to me that this is now notebook, but I'm just wondering why was it jupyter_core before?

Copy link
Contributor

@lorenabalan lorenabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small observations but LGTM!

setup.py Outdated Show resolved Hide resolved
@@ -1,6 +1,5 @@
black==21.5b1
flake8>=3.7.9, <4.0
ipython~=7.10
ipython>=7.31.1, <8.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we can support both (i.e. no breaking changes that make us have different code for different python versions), we can bump the requirement. I'm slightly more in favour of that over removing it completely, just for users that want to use ipython over jupyter and only need to remove the relevant lines. I don't have any strong opinions on this though.

docs/source/tools_integration/ipython.md Outdated Show resolved Hide resolved
docs/source/tools_integration/ipython.md Outdated Show resolved Hide resolved
docs/source/tools_integration/ipython.md Show resolved Hide resolved
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to approve earlier, my question shouldn't be a blocker. Great work 👍

lorenabalan added 4 commits March 30, 2022 10:45
Signed-off-by: lorenabalan <lorena.balan@quantumblack.com>
Signed-off-by: lorenabalan <lorena.balan@quantumblack.com>
Signed-off-by: lorenabalan <lorena.balan@quantumblack.com>
Signed-off-by: lorenabalan <lorena.balan@quantumblack.com>
lorenabalan added 3 commits March 30, 2022 12:57
Signed-off-by: lorenabalan <lorena.balan@quantumblack.com>
Signed-off-by: lorenabalan <lorena.balan@quantumblack.com>
Signed-off-by: lorenabalan <lorena.balan@quantumblack.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make kedro ipython/jupyter commands not use the project's .ipython/ dir
6 participants