-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make default project entrypoint interactive (Databricks) friendly #3191
Make default project entrypoint interactive (Databricks) friendly #3191
Conversation
...lates/project/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/__main__.py
Outdated
Show resolved
Hide resolved
If this is merged, we need to make sure it get propagate correctly.
|
This hasn't been discussed on Tech Design yet #2682 (comment) |
72ffb2f
to
7fd34f0
Compare
…okiecutter.python_package }}/__main__.py Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
716806e
to
ff7511e
Compare
Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
ff7511e
to
d0fb16a
Compare
@idanov Will you be updating the docs as well as part of this PR? As discussed before, I don't think this change benefits anyone if we don't update our documentation. I wasn't able to get this running on Databricks last time I tried, so I'm pretty sure our users will struggle as well. |
@merelcht Could we update the docs in a separate PR? This is such a small fully-backwards compatible change and it's a pity to be delayed because the docs won't be shortened (especially given that the guide in the docs will still work). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments on #1807 (comment), I'm confused about the current status of this issue and where does it originate from
I'm not a fan of this solution but looks like it's a quick patch for a real problem so I abstain
I've tried running a project with this entrypoint on Databricks. The
And the full logs show:
So I'm not quite sure this actually solves the interactivity issues? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm that this solution doesn't work. Tested it with https://github.com/astrojuanlu/simple-click-entrypoint/tree/e247d1a
Job source:
resources:
jobs:
test_cli:
name: test-cli
tasks:
- task_key: test-cli
python_wheel_task:
package_name: simple_click_entrypoint
entry_point: wicked-entrypoint
job_cluster_key: Job_cluster
libraries:
- whl: dbfs:/FileStore/jars/14c1107d_d3cd_4c8c_a92d_ca9679d1b3e5/simple_click_entrypoint-0.1.0-py2.py3-none-any.whl
...
Result:
...
File /databricks/python/lib/python3.10/site-packages/click/core.py:681, in Context.exit(self, code)
680 """Exits the application with a given exit code."""
--> 681 raise Exit(code)
Exit: 0
During handling of the above exception, another exception occurred:
SystemExit Traceback (most recent call last)
[... skipping hidden 1 frame]
File ~/.ipykernel/1910/command--1-1229528502:18
16 if entry:
17 # Load and execute the entrypoint, assumes no parameters
---> 18 entry[0].load()()
19 else:
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/simple_click_entrypoint/__main__.py:10, in main(*args, **kwargs)
8 kwargs["standalone_mode"] = not interactive
---> 10 run(*args, **kwargs)
...
It's not 100 % clear how Databricks is launching the entry point here. It's clear there's some Python code that does:
from importlib.metadata import entry_points # or maybe setuptools?
entry = entry_points(name="wicked-entrypoint")
if entry:
entry[0].load()()
but this part of the traceback makes it unclear how exactly this is being executed:
File ~/.ipykernel/1910/command--1-1229528502:18
I asked for clarification to Microsoft but the case is already an internal one MicrosoftDocs/azure-docs#116964 (comment) and also in private email to Databricks proper.
Regardless, the sys.ps1
trick is clearly not working in this case and we need to go back to square one.
Closing this PR as it doesn't fix the problem |
Looking at this again now, I can see from @merelcht 's traceback, it seems that this is due to the fact that So it seems that to this PR, we also need to add a |
Description
Currently the recommended Databricks workflow suggests creating a separate entrypoint if you would like a schedule your Kedro project as a Databricks job. The reason for that is because of the way
click
works (more on the topic can be found in #1807 and #2682). This problem exists not only in Databricks, but in all IPython or Python interactive sessions (apparently Databricks runs everything through an IPython session).This goes against the idea of a uniform entrypoint for all Kedro situations and makes Databricks deployment more difficult. The change suggested here will make the entrypoint accommodate to interactive and non-interactive workflows automatically. After this change, we can simplify further our guides for Databricks, more specifically dropping this part: https://docs.kedro.org/en/stable/deployment/databricks/databricks_deployment_workflow.html#create-an-entry-point-for-databricks and even dig deeper into how to avoid creating custom scripts or notebooks in the other two Databricks sections.
Development notes
Previous attempt to fix this was tried in #1423 by @antonymilne, but that PR had way too many goals, including allowing to import
main()
and run it from a notebook, which delayed its merge and it was postponed. The current solution only addresses the ability to run a packaged project directly in an interactive session, which is already a big improvement. I have changed only the default starter template. It's been tested manually on Databricks on my own research and also in a Databricks environment by a user.Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file