
create line magic to debug a node in notebook workflow #3510

Merged

noklam
Contributor

@noklam noklam commented Jan 15, 2024

Description

Partly address #2009

You can test it manually by creating a new spaceflight project and running %load_node split_data_node.

There are some skipped tests that we aim to improve in a separate PR. The scope of the current PR is already huge.

To reviewer

  • Try running it and see if there are bugs. There are some skipped tests for cases that we know are not supported yet (nested functions/lambdas).
  • Does the format make sense to you? The return statement isn't handled properly yet because I am not sure how we should treat it. Should we simply comment it out? That would be a naive solution, because it wouldn't work well if a function has multiple returns.
  • Try it on different platforms (Databricks, SageMaker, etc.). I haven't tried any of these yet, but I will test on Databricks when I have time.

IMO the code to load the catalog is the most needed part. The import statements can be handled with from xxx import *, which guarantees the code can be run (good enough for debugging). The return statement is also not too important, because in a debugging use case the error should happen before it. It's also not much work for the user to comment it out or handle edge cases themselves. Ideally, the user could copy the entire block back into the codebase (or this could even be automated).

I didn't go with sophisticated solutions such as parsing the abstract syntax tree. There are many small things that we can improve, but they should be prioritised properly.

I also think some refactoring is needed to make the tests simpler. These tests are troublesome to write, and it's easy to miss a newline character somewhere. I will do this, but maybe a bit later.

Demo

[Animated demo: debug]

Note

  • Add support for kedro ipython?
  • Support VS Notebook (?)
  • Support write back to script?

Development notes

One of the challenges of this feature is that there is no solution that works across platforms. It sounds like a simple task, but there is very limited support for creating multiple cells in a notebook. The standard %load method only supports creating ONE cell. I ended up choosing ipylab since it is the only option that supports creating multiple cells, which offers a nicer user experience. It only works with Notebook >7.0 and JupyterLab.

If we want to support VSCode notebooks, we may have to settle for a single-cell solution, which is still an improvement but not ideal. For the initial MVP, I think it's fine to support Notebook / Lab. If we end up supporting more, we will likely need to maintain platform-specific code until there are nice solutions that work well everywhere.

  • Databricks Notebook
  • VSCode Notebook
  • Notebook / lab

Edge Case improvement:

  • Handle imports that are not at the top of the script, i.e. functions defined in the same file as the node function, which live in locals().
  • Intermediate MemoryDataset

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
@noklam noklam linked an issue Jan 15, 2024 that may be closed by this pull request
noklam and others added 12 commits January 16, 2024 13:46
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
…flow' of github.com:kedro-org/kedro into 2009-create-line-magic-to-debug-a-node-in-notebook-workflow
AhdraMeraliQB and others added 5 commits January 18, 2024 22:48
@AhdraMeraliQB
Contributor

AhdraMeraliQB commented Jan 19, 2024

Left to do:

  • Fix naive import fetching
  • Fix weird node function edge cases
  • Test with parameters as node input
  • Test if node names and func.__name__ always match
  • Handle if node function makes call to helper function (unsure if in scope/consequences)
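As an illustration of the "naive import fetching" item above, here is a minimal sketch of one alternative: collecting a module's top-level import statements with the standard-library ast module. This is not what the PR implements (the description says AST parsing was deliberately avoided); `top_level_imports` and `MODULE_SOURCE` are hypothetical names used only for this example.

```python
import ast

# Hypothetical module source, standing in for the file that defines a node function.
MODULE_SOURCE = '''\
import logging
from typing import Dict, Tuple

import pandas as pd


def split_data(data, parameters):
    from sklearn.model_selection import train_test_split
    return train_test_split(data)
'''


def top_level_imports(source):
    """Return the source text of every import statement at module level."""
    tree = ast.parse(source)  # parsing only; nothing is imported or executed
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]


imports = top_level_imports(MODULE_SOURCE)
```

Only statements directly in the module body are collected, so the import inside split_data is skipped. That is harmless when the function definition is copied verbatim, since the nested import travels with it.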

Ahdra Merali added 2 commits January 19, 2024 13:15
@AhdraMeraliQB
Contributor

AhdraMeraliQB commented Jan 31, 2024

Tech Design outcome - Actions to get this PR completed:

  • Warn users when preparing node inputs that inputs need to be explicitly declared in the catalog
  • Handle return statements by commenting out and substituting display()
  • Add a line when node not found about node name vs node function
  • Add a platform-agnostic warning that this is an experimental feature that only supports Jupyter and IPython, linking the parent issue %load_node line magic improvements #3580 for feature discussion

Other blocking actions:

  • Handle multi-line function definitions and add test case
  • Adjust implementation to include function definition and add a cell making the call to the function with the node inputs

Including the function definition will mean:

  • No special handling for one line functions
  • No handling needed for async def or multi-line definitions
  • No modification of the return statement creating unexpected behavior - see @astrojuanlu's comment below
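To make the return-statement concern concrete, here is a minimal sketch that counts the return statements belonging to a function while ignoring nested functions. A count above one is the case where commenting out or substituting returns would change control flow. This uses the standard-library ast module purely for illustration and is not part of the PR; `count_returns`, `_ReturnCounter`, and the sample `preprocess` function are hypothetical.

```python
import ast

# Hypothetical node function with an early return and a nested helper.
SOURCE = '''\
def preprocess(df, strict):
    if df is None:
        return None
    def helper(x):
        return x
    if strict:
        return helper(df)
    return df
'''


class _ReturnCounter(ast.NodeVisitor):
    """Count return statements without descending into nested functions."""

    def __init__(self):
        self.count = 0

    def visit_Return(self, node):
        self.count += 1

    def visit_FunctionDef(self, node):
        # Returns inside a nested function belong to that function.
        pass

    visit_AsyncFunctionDef = visit_FunctionDef


def count_returns(source):
    """Return the number of return statements of the first function in source."""
    func = ast.parse(source).body[0]
    counter = _ReturnCounter()
    for stmt in func.body:
        counter.visit(stmt)
    return counter.count


n_returns = count_returns(SOURCE)
```

Copying the whole function definition, as decided above, sidesteps this problem because the return statements are never rewritten.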

@astrojuanlu
Member

I flagged an issue about manipulating return statements #3535 (comment)

* Simplify mocking

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Check node func names

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Naive fix for return statements

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Handle nested case

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change pipelines fixture type

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove unnecessary TODO

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Revert "Check node func names"

This reverts commit 63ee194.

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Replace commented return statements with a display() statement

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add warning about node name when node not found

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add line about debugging inputs in catalog

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change wording

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Revert "Replace commented return statements with a display() statement"

This reverts commit ad63afc.

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Revert "Naive fix for return statements"

This reverts commit 04c022e.

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update tests

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

---------

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
@AhdraMeraliQB AhdraMeraliQB force-pushed the 2009-create-line-magic-to-debug-a-node-in-notebook-workflow branch from 5d3b898 to 78ff496 Compare February 1, 2024 08:37
AhdraMeraliQB and others added 7 commits February 1, 2024 08:39
@noklam
Contributor Author

noklam commented Feb 1, 2024

I made a few changes. The original approach would have had a nicer UX that allowed users to run the code line by line. However, there are many edge cases that we would need to deal with, and this is not the best time to do it.

I've taken @marrrcin's advice to copy the function definition and make a separate call instead, so it now creates 4 cells in a Notebook.

  1. Data load from catalog
  2. Imports
  3. Function definition
  4. A call to the function (so it also prints the result if there is no error)
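The four cells listed above can be sketched roughly as follows. This is a simplified illustration and not the code merged in this PR: `build_cells`, the shape of the `inputs` mapping, the dataset names, and the sample function source are all assumptions.

```python
# Hypothetical source of a node function, as it would appear in the project.
FUNC_SOURCE = '''\
def split_data(data, parameters):
    cutoff = int(len(data) * parameters["ratio"])
    return data[:cutoff], data[cutoff:]
'''


def build_cells(func_name, func_source, inputs, imports):
    """Return the text of four notebook cells for debugging one node.

    ``inputs`` maps each function argument to a dataset name
    (an assumed shape, not Kedro's actual Node API).
    """
    # Cell 1: load every node input from the catalog into a variable.
    loads = "\n".join(
        f'{arg} = catalog.load("{dataset}")' for arg, dataset in inputs.items()
    )
    # Cell 2: import statements, already collected by the caller.
    import_block = "\n".join(imports)
    # Cell 3: the function definition, copied without modification.
    definition = func_source.rstrip()
    # Cell 4: a call to the function, so the notebook displays the result.
    call = f"{func_name}({', '.join(inputs)})"
    return [loads, import_block, definition, call]


cells = build_cells(
    "split_data",
    FUNC_SOURCE,
    inputs={"data": "model_input_table", "parameters": "params:model_options"},
    imports=["import pandas as pd"],
)
```

Keeping the definition untouched and putting the call in its own cell means one-line functions, async def, and multi-line signatures need no special handling.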

In addition, I noticed that things like lambda functions are not working, and it may also run into edge cases with *args, **kwargs. I have decided to tackle this later. Let's release the feature and find out. Most edge cases are easy for the user to fix, so I am not too worried about it.

On the other hand, the current test setup is quite painful to work with. I created #3585, which I plan to tackle after this is merged. I will freeze any new changes to this PR; any additional comments will be addressed in a new PR.

@noklam noklam marked this pull request as ready for review February 1, 2024 16:30
Member

@merelcht merelcht left a comment

I've tried it again and it works great! I think @marrrcin 's suggestion is a real improvement 👍 I'm very excited to see this go in 😄

RELEASE.md Outdated
@@ -1,6 +1,7 @@
# Upcoming Release 0.19.3

## Major features and improvements
* Create the debugging line magic `%load_node` for Jupyter Notebook/IPython.
Member

This works for ipython now as well?

Contributor Author

Not yet, I've revised

)
def magic_load_node(node: str) -> None:
    """The line magic %load_node <node_name>
    Currently it only support Jupyter Notebook (>7.0) and Jupyter Lab. This line magic
Member

Suggested change
Currently it only support Jupyter Notebook (>7.0) and Jupyter Lab. This line magic
Currently it only supports Jupyter Notebook (>7.0) and Jupyter Lab. This line magic

kedro/ipython/__init__.py Outdated
Contributor

@ankatiyar ankatiyar left a comment

Excited to see this feature released soon! 💯

kedro/ipython/__init__.py Outdated
kedro/ipython/__init__.py Outdated
Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
…flow' of github.com:kedro-org/kedro into 2009-create-line-magic-to-debug-a-node-in-notebook-workflow

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
@noklam noklam enabled auto-merge (squash) February 2, 2024 16:22
Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
…flow' of github.com:kedro-org/kedro into 2009-create-line-magic-to-debug-a-node-in-notebook-workflow

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
@noklam noklam merged commit 99348e6 into main Feb 2, 2024
34 checks passed
@noklam noklam deleted the 2009-create-line-magic-to-debug-a-node-in-notebook-workflow branch February 2, 2024 16:45
    except ValueError:
        continue
# If reached the node was not found in the project
raise ValueError(
Contributor

I wonder if we want to support func_name at some point too? 🤔

Contributor Author

@noklam noklam Feb 2, 2024

This was briefly discussed in the last TD. @AhdraMeraliQB originally had an implementation.

It runs into issues if a function is used twice: node names are unique, but functions do not have to be. Fundamentally, we want to fix this with the default node name, but it was decided to fix that later.

@AhdraMeraliQB I can't find your original PR/commit, feel free to supplement
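A small sketch of the ambiguity described in this comment, using a simplified stand-in for a node rather than Kedro's own Node class. The `Node` dataclass, `find_by_name`, `find_by_func_name`, and the node names are hypothetical and only illustrate why lookup by function name can be ambiguous while lookup by node name is not.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Node:
    """Simplified stand-in for a pipeline node (not Kedro's Node class)."""

    name: str
    func: Callable


def clean(df):
    return df


# The same function is reused by two nodes with distinct names.
nodes = [Node("clean_companies_node", clean), Node("clean_shuttles_node", clean)]


def find_by_name(nodes, name):
    """Node names are unique, so at most one node can match."""
    for node in nodes:
        if node.name == name:
            return node
    raise ValueError(f"Node with name='{name}' not found.")


def find_by_func_name(nodes, func_name):
    """Function names may repeat, so a match can be ambiguous."""
    matches = [node for node in nodes if node.func.__name__ == func_name]
    if not matches:
        raise ValueError(f"No node uses a function named '{func_name}'.")
    if len(matches) > 1:
        names = ", ".join(node.name for node in matches)
        raise ValueError(f"Function '{func_name}' is used by several nodes: {names}")
    return matches[0]


found = find_by_name(nodes, "clean_shuttles_node")
```

Calling find_by_func_name(nodes, "clean") raises ValueError here because both nodes share the function.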

Contributor

Really good reasoning. Perhaps we could support the full classpath.

Development

Successfully merging this pull request may close these issues.

Create line/cell magic to debug a node in notebook workflow
7 participants