New platform-dependent %launch_viz line magic #910

Closed
5 tasks
antonymilne opened this issue Jun 15, 2022 · 7 comments

Comments

@antonymilne
Contributor

antonymilne commented Jun 15, 2022

Ideal outcome:

  • make new %launch_viz line magic in kedro-viz, so that in any notebook with kedro IPython extension loaded, %launch_viz will be available
  • %launch_viz starts a kedro-viz server and supplies a URL which the user can click on to open kedro-viz in another browser window

Required steps:

  • Make sure the current code is still the best way to launch the server from inside the line magic
  • Potentially difficult. Work out the correct URL to access the kedro-viz instance on various platforms (databricks, sagemaker, etc.).
  • Potentially difficult. Work out how to programmatically obtain this URL
  • Work out how to automatically figure out which platform the notebook is running on
  • Output the correct URL or some useful message which might help a user find their kedro-viz instance if we can't figure out the URL ourselves
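As a rough illustration of the last two steps, here is a minimal sketch of environment-based platform detection with a fallback message. It assumes that `DATABRICKS_RUNTIME_VERSION` is set on Databricks clusters and `JUPYTERHUB_SERVICE_PREFIX` on JupyterHub single-user servers. The function names `detect_platform` and `describe_launch` are hypothetical, and SageMaker is left out because nothing in this issue says how to detect it.

```python
import os
from typing import Mapping, Optional


def detect_platform(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess the notebook platform from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: Databricks runtimes export DATABRICKS_RUNTIME_VERSION.
    if "DATABRICKS_RUNTIME_VERSION" in env:
        return "databricks"
    # Assumption: JupyterHub exports JUPYTERHUB_SERVICE_PREFIX to user servers.
    if "JUPYTERHUB_SERVICE_PREFIX" in env:
        return "jupyterhub"
    return "unknown"


def describe_launch(platform: str, port: int) -> str:
    """Return a message that is still useful when no URL can be worked out."""
    if platform == "unknown":
        return (
            f"Kedro-Viz started on port {port}. The platform could not be "
            "detected, so no link is available."
        )
    return f"Kedro-Viz started on port {port} (detected platform: {platform})."
```

Passing the environment in as an argument keeps the detection logic testable outside a real notebook.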

To consider:

@tynandebold
Member

Which platforms do we prioritize at the beginning? Databricks, Sagemaker, and Platform McK seem like the three obvious choices.

@antonymilne
Contributor Author

That sounds like the right choice to me also, though Platform McKinsey uses databricks so I would hope that if it works on there it would work on databricks in general.

I'm currently quite optimistic that if we can get the jupyter-server-proxy working then this will just work straight away on all these platforms 🤞

@antonymilne
Contributor Author

antonymilne commented Jul 14, 2022

A few rough notes...

In general the jupyter-server-proxy route looks like a good one, but it won't work on Databricks.

On Azure databricks:

e.g.

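def get(thing):
    # Read one attribute (e.g. 'clusterId') from the Databricks notebook context.
    return getattr(dbutils.notebook.entry_point.getDbutils().notebook().getContext(), thing)().get()

# 4141 is the port the Kedro-Viz server is listening on.
url = f"{get('browserHostName')}/driver-proxy/o/{get('workspaceId')}/{get('clusterId')}/4141/"
displayHTML(f"<a href='https://{url}'>Launch Kedro-Viz</a>")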

See https://github.com/AntonyMilneQB/kedro-launch-viz/tree/main/kedro_launch_viz.
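Since `dbutils` and `displayHTML` only exist inside a Databricks notebook, it may help to keep the URL construction in a pure function that can be tested anywhere. A minimal sketch, assuming the same driver-proxy URL layout as the snippet above; the names `databricks_viz_url` and `launch_link_html` are hypothetical and not taken from kedro-viz.

```python
import html


def databricks_viz_url(
    browser_host_name: str, workspace_id: str, cluster_id: str, port: int = 4141
) -> str:
    """Build the driver-proxy URL from values read off the notebook context."""
    return (
        f"https://{browser_host_name}/driver-proxy/o/"
        f"{workspace_id}/{cluster_id}/{port}/"
    )


def launch_link_html(url: str, text: str = "Launch Kedro-Viz") -> str:
    """Return a complete anchor element; both url and text are HTML-escaped."""
    return f'<a href="{html.escape(url, quote=True)}">{html.escape(text)}</a>'
```

In a Databricks notebook the three values would come from `get('browserHostName')`, `get('workspaceId')` and `get('clusterId')`, and the resulting HTML would be passed to `displayHTML`.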

@yetudada yetudada moved this from Todo to In Progress in Kedro-Viz Jul 15, 2022
@antonymilne
Contributor Author

antonymilne commented Jul 18, 2022

Let's assume there will be two different ways that %launch_viz would work:

  1. Databricks: use the above. So far tested on Azure; still need to test on AWS and GCP.
  2. Jupyter servers: use jupyter-server-proxy. So far tested locally without JupyterHub; still need to test on Sagemaker and JupyterHub. Also what about Binder?

Next steps:

  1. finish the remaining testing noted above for the databricks method (AWS and GCP)
  2. look at jupyter dash. They might have figured out all the jupyter proxy stuff already...
  3. make similar %launch_viz that starts process and links to it (find out if there's a programmatic way to get the URL through jupyter-server-proxy)
  4. something that detects your platform and switches between the above two methods; starts the process and outputs something useful ("Kedro-Viz started on port X") even if it can't work out what the platform is
  5. get %launch_viz to take arguments for --pipeline etc.
  6. make the launcher button - only works for jupyter-server-proxy (probably can't take arguments; would only be kedro viz --autoreload; need to think about how to get project path there)
  7. work out whether we need a way to kill the kedro-viz process
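For the Jupyter route in step 3, one possible starting point: jupyter-server-proxy exposes a local port under `proxy/<port>/` relative to the server's base URL. The sketch below assumes the base URL is `/` on a plain local server and equals `JUPYTERHUB_SERVICE_PREFIX` under JupyterHub; a server configured with a custom base URL would need that value instead. The name `proxy_path` is hypothetical.

```python
import os
from typing import Mapping, Optional


def proxy_path(port: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the path at which jupyter-server-proxy would expose `port`."""
    env = os.environ if env is None else env
    # Assumption: without JupyterHub the server uses the default base URL "/".
    prefix = env.get("JUPYTERHUB_SERVICE_PREFIX", "/")
    if not prefix.endswith("/"):
        prefix += "/"
    return f"{prefix}proxy/{port}/"
```

The result is a path rather than a full URL, so a link rendered in the notebook output would resolve against whatever host the user's browser is already on.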

@tynandebold tynandebold moved this from In Progress to Todo in Kedro-Viz Jul 22, 2022
@rashidakanchwala rashidakanchwala moved this from Todo to In Progress in Kedro-Viz Aug 8, 2022
@rashidakanchwala rashidakanchwala self-assigned this Aug 8, 2022
@rashidakanchwala rashidakanchwala moved this from In Progress to Backlog in Kedro-Viz Aug 11, 2022
@antonymilne
Contributor Author

Some more notes following work on #1012.

  • this way of handling project path by importing default_project_name seems to work well. The old user_ns method did not as it relied on pushing the variable to the user namespace, which %reload_kedro no longer does
  • we should add arguments (pipeline etc.) to %run_viz, as was done for %reload_kedro in kedro#1748 ("Add parameters to %reload_kedro line magic")
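A minimal sketch of how such arguments might be parsed from the text that follows the magic name, using only the standard library. The flag names mirror `kedro viz` options, but their use on the line magic, the default values, and the function name `parse_viz_args` are all assumptions for illustration.

```python
import argparse
import shlex


def parse_viz_args(line: str) -> argparse.Namespace:
    """Parse the argument string passed to a line magic (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="%run_viz", add_help=False)
    parser.add_argument("--host", default="127.0.0.1")  # assumed default
    parser.add_argument("--port", type=int, default=4141)  # assumed default
    parser.add_argument("--pipeline", default=None)
    # shlex.split handles quoting the same way a shell would.
    return parser.parse_args(shlex.split(line))
```

For example, `parse_viz_args("--pipeline ds --port 4142")` yields a namespace with `pipeline == "ds"` and `port == 4142`. Note that argparse exits on unknown flags, so a real magic would probably want to catch `SystemExit` and print a friendlier message.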

@antonymilne
Contributor Author

antonymilne commented Aug 12, 2022

How to efficiently develop with Kedro-Viz on Databricks

After much trial and error, I have come up with a much more streamlined way to iterate on code being developed for Databricks. This should help to make the development loop much faster since there's no need to restart the cluster or manually handle repos this way 🎉

  1. Make a branch for your work
  2. Run make build, git add -f package/kedro_viz/html and push to GitHub. This is temporarily needed while developing on your branch so that you can pip install from GitHub but should not remain there when you merge to main
  3. On Databricks, make sure that kedro-viz and kedro are not installed as cluster libraries.
  4. In your Databricks notebook, run (fill out NAME-OF-BRANCH):
%pip uninstall -y kedro-viz
%pip install git+https://github.com/kedro-org/kedro-viz.git@NAME-OF-BRANCH#subdirectory=package

Warning. Remember there's quite a bit of confusion around differently-scoped pip installed packages. See #831. In short, use %pip (not %sh pip) if you want to install notebook-scoped and ensure that kedro and kedro-viz are installed with the same scope (cluster or notebook).

  5. To make a test project if one doesn't already exist:
%sh test -d iris || yes "" | kedro new -s pandas-iris
  6. Then load up the Kedro IPython extension, make sure you're pointing to the right project path and do as you please:
%load_ext kedro.extras.extensions.ipython 
%reload_kedro iris
%run_viz
  7. Whenever you make changes to your branch, all you need to do is push to GitHub and then re-run your notebook. This will pip install the latest changes to the branch directly from GitHub. No need to restart the cluster or clone repos any more.
  8. Make sure you remove the package/kedro_viz/html folder before merging to main.

Note. It seems like using the Databricks repos feature would be a smoother development process, but it's not. Every time you make a change to your branch you would need to pull the repo and reinstall on cluster, which means restarting the cluster every time (=slow). So don't try doing it that way...
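If the branch name changes often, the install target used in the workflow above can be generated rather than typed by hand. A small sketch; the helper name `branch_install_spec` is hypothetical, and the repository URL and subdirectory are the ones given in the %pip install command.

```python
def branch_install_spec(
    branch: str,
    repo: str = "https://github.com/kedro-org/kedro-viz.git",
    subdirectory: str = "package",
) -> str:
    """Build the pip requirement string for installing a branch from GitHub."""
    return f"git+{repo}@{branch}#subdirectory={subdirectory}"
```

The returned string is what would follow `%pip install` in the notebook cell.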

@tynandebold
Member

tynandebold commented Jan 16, 2023

We'll need to explore if %run_viz works on Sagemaker notebooks. For now though, we can close this.

@github-project-automation github-project-automation bot moved this from Backlog to Done in Kedro-Viz Jan 16, 2023