Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make default project entrypoint interactive (Databricks) friendly #3191

Closed

Conversation

idanov
Copy link
Member

@idanov idanov commented Oct 18, 2023

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Currently the recommended Databricks workflow suggests creating a separate entrypoint if you would like a schedule your Kedro project as a Databricks job. The reason for that is because of the way click works (more on the topic can be found in #1807 and #2682). This problem exists not only in Databricks, but in all IPython or Python interactive sessions (apparently Databricks runs everything through an IPython session).

This goes against the idea of a uniform entrypoint for all Kedro situations and makes Databricks deployment more difficult. The change suggested here will make the entrypoint accommodate to interactive and non-interactive workflows automatically. After this change, we can simplify further our guides for Databricks, more specifically dropping this part: https://docs.kedro.org/en/stable/deployment/databricks/databricks_deployment_workflow.html#create-an-entry-point-for-databricks and even dig deeper into how to avoid creating custom scripts or notebooks in the other two Databricks sections.

Development notes

Previous attempt to fix this was tried in #1423 by @antonymilne, but that PR had way too many goals, including allowing to import main() and run it from a notebook, which delayed its merge and it was postponed. The current solution only addresses the ability to run a packaged project directly in an interactive session, which is already a big improvement. I have changed only the default starter template. It's been tested manually on Databricks on my own research and also in a Databricks environment by a user.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@idanov idanov requested a review from merelcht as a code owner October 18, 2023 09:57
@idanov idanov requested a review from noklam October 18, 2023 10:13
@noklam
Copy link
Contributor

noklam commented Oct 25, 2023

If this is merged, we need to make sure it get propagate correctly.

  1. This PR goes into kedro
  2. kedro-starters (new ticket)
  3. documentation change (kedro, new ticket or combined 1 or 2)

@astrojuanlu
Copy link
Member

This hasn't been discussed on Tech Design yet #2682 (comment)

@idanov idanov changed the base branch from main to develop October 26, 2023 22:18
@idanov idanov force-pushed the feature/make-project-entrypoints-interactive-friendly branch from 72ffb2f to 7fd34f0 Compare November 6, 2023 11:25
idanov and others added 3 commits November 6, 2023 11:32
…okiecutter.python_package }}/__main__.py

Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
@idanov idanov force-pushed the feature/make-project-entrypoints-interactive-friendly branch from 716806e to ff7511e Compare November 6, 2023 11:32
Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
@idanov idanov force-pushed the feature/make-project-entrypoints-interactive-friendly branch from ff7511e to d0fb16a Compare November 6, 2023 11:35
@merelcht
Copy link
Member

merelcht commented Nov 6, 2023

@idanov Will you be updating the docs as well as part of this PR? As discussed before, I don't think this change benefits anyone if we don't update our documentation. I wasn't able to get this running on Databricks last time I tried, so I'm pretty sure our users will struggle as well.

@idanov
Copy link
Member Author

idanov commented Nov 7, 2023

@merelcht Could we update the docs in a separate PR? This is such a small fully-backwards compatible change and it's a pity to be delayed because the docs won't be shortened (especially given that the guide in the docs will still work).

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments on #1807 (comment), I'm confused about the current status of this issue and where does it originate from

@astrojuanlu astrojuanlu dismissed their stale review November 8, 2023 13:17

I'm not a fan of this solution but looks like it's a quick patch for a real problem so I abstain

@merelcht
Copy link
Member

merelcht commented Nov 9, 2023

I've tried running a project with this entrypoint on Databricks. The kedro run passes, but the job is still marked as "failed" and I get the following

2023-11-09 13:03:21,203 - kedro_telemetry.plugin - WARNING - Failed to confirm consent. No data was sent to Heap. Exception: raw_input was called, but this frontend does not support input requests.

And the full logs show:

---------------------------------------------------------------------------
Exit                                      Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/click/core.py:1069, in BaseCommand.main(self, args, prog_name, complete_var, standalone_mode, **extra)
   1062         # it's not safe to `ctx.exit(rv)` here!
   1063         # note that `rv` may actually contain data like "1" which
   1064         # has obvious effects
   (...)
   1067         # even always obvious that `rv` indicates success/failure
   1068         # by its truthiness/falsiness
-> 1069         ctx.exit()
   1070 except (EOFError, KeyboardInterrupt):

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/click/core.py:696, in Context.exit(self, code)
    695 """Exits the application with a given exit code."""
--> 696 raise Exit(code)

Exit: 0

During handling of the above exception, another exception occurred:

SystemExit                                Traceback (most recent call last)
    [... skipping hidden 1 frame]

File ~/.ipykernel/6078/command--1-447824188:18
     16 if entry:
     17   # Load and execute the entrypoint, assumes no parameters
---> 18   entry[0].load()()
     19 else:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/iris_databricks/__main__.py:45, in main(*args, **kwargs)
     43 kwargs["standalone_mode"] = not interactive
---> 45 run(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/click/core.py:1134, in BaseCommand.__call__(self, *args, **kwargs)
   1133 """Alias for :meth:`main`."""
-> 1134 return self.main(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/click/core.py:1087, in BaseCommand.main(self, args, prog_name, complete_var, standalone_mode, **extra)
   1086 if standalone_mode:
-> 1087     sys.exit(e.exit_code)
   1088 else:
   1089     # in non-standalone mode, return the exit code
   1090     # note that this is only reached if `self.invoke` above raises
   (...)
   1095     # `ctx.exit(1)` and to `return 1`, the caller won't be able to
   1096     # tell the difference between the two

SystemExit: 0

During handling of the above exception, another exception occurred:

AssertionError                            Traceback (most recent call last)
    [... skipping hidden 1 frame]

File /databricks/python/lib/python3.10/site-packages/IPython/core/interactiveshell.py:2047, in InteractiveShell.showtraceback(self, exc_tuple, filename, tb_offset, exception_only, running_compiled_code)
   2044 if exception_only:
   2045     stb = ['An exception has occurred, use %tb to see '
   2046            'the full traceback.\n']
-> 2047     stb.extend(self.InteractiveTB.get_exception_only(etype,
   2048                                                      value))
   2049 else:
   2050     try:
   2051         # Exception classes can customise their traceback - we
   2052         # use this in IPython.parallel for exceptions occurring
   2053         # in the engines. This should return a list of strings.

File /databricks/python/lib/python3.10/site-packages/IPython/core/ultratb.py:585, in ListTB.get_exception_only(self, etype, value)
    577 def get_exception_only(self, etype, value):
    578     """Only print the exception type and message, without a traceback.
    579 
    580     Parameters
   (...)
    583     value : exception value
    584     """
--> 585     return ListTB.structured_traceback(self, etype, value)

File /databricks/python/lib/python3.10/site-packages/IPython/core/ultratb.py:452, in ListTB.structured_traceback(self, etype, evalue, etb, tb_offset, context)
    449     chained_exc_ids.add(id(exception[1]))
    450     chained_exceptions_tb_offset = 0
    451     out_list = (
--> 452         self.structured_traceback(
    453             etype, evalue, (etb, chained_exc_ids),
    454             chained_exceptions_tb_offset, context)
    455         + chained_exception_message
    456         + out_list)
    458 return out_list

File /databricks/python/lib/python3.10/site-packages/IPython/core/ultratb.py:1118, in AutoFormattedTB.structured_traceback(self, etype, value, tb, tb_offset, number_of_lines_of_context)
   1116 else:
   1117     self.tb = tb
-> 1118 return FormattedTB.structured_traceback(
   1119     self, etype, value, tb, tb_offset, number_of_lines_of_context)

File /databricks/python/lib/python3.10/site-packages/IPython/core/ultratb.py:1012, in FormattedTB.structured_traceback(self, etype, value, tb, tb_offset, number_of_lines_of_context)
   1009 mode = self.mode
   1010 if mode in self.verbose_modes:
   1011     # Verbose modes need a full traceback
-> 1012     return VerboseTB.structured_traceback(
   1013         self, etype, value, tb, tb_offset, number_of_lines_of_context
   1014     )
   1015 elif mode == 'Minimal':
   1016     return ListTB.get_exception_only(self, etype, value)

File /databricks/python/lib/python3.10/site-packages/IPython/core/ultratb.py:865, in VerboseTB.structured_traceback(self, etype, evalue, etb, tb_offset, number_of_lines_of_context)
    856 def structured_traceback(
    857     self,
    858     etype: type,
   (...)
    862     number_of_lines_of_context: int = 5,
    863 ):
    864     """Return a nice text document describing the traceback."""
--> 865     formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
    866                                                            tb_offset)
    868     colors = self.Colors  # just a shorthand + quicker name lookup
    869     colorsnormal = colors.Normal  # used a lot

File /databricks/python/lib/python3.10/site-packages/IPython/core/ultratb.py:799, in VerboseTB.format_exception_as_a_whole(self, etype, evalue, etb, number_of_lines_of_context, tb_offset)
    796 assert isinstance(tb_offset, int)
    797 head = self.prepare_header(etype, self.long_header)
    798 records = (
--> 799     self.get_records(etb, number_of_lines_of_context, tb_offset) if etb else []
    800 )
    802 frames = []
    803 skipped = 0

File /databricks/python/lib/python3.10/site-packages/IPython/core/ultratb.py:854, in VerboseTB.get_records(self, etb, number_of_lines_of_context, tb_offset)
    848     formatter = None
    849 options = stack_data.Options(
    850     before=before,
    851     after=after,
    852     pygments_formatter=formatter,
    853 )
--> 854 return list(stack_data.FrameInfo.stack_data(etb, options=options))[tb_offset:]

File /databricks/python/lib/python3.10/site-packages/stack_data/core.py:578, in FrameInfo.stack_data(cls, frame_or_tb, options, collapse_repeated_frames)
    562 @classmethod
    563 def stack_data(
    564         cls,
   (...)
    568         collapse_repeated_frames: bool = True
    569 ) -> Iterator[Union['FrameInfo', RepeatedFrames]]:
    570     """
    571     An iterator of FrameInfo and RepeatedFrames objects representing
    572     a full traceback or stack. Similar consecutive frames are collapsed into RepeatedFrames
   (...)
    576     and optionally an Options object to configure.
    577     """
--> 578     stack = list(iter_stack(frame_or_tb))
    580     # Reverse the stack from a frame so that it's in the same order
    581     # as the order from a traceback, which is the order of a printed
    582     # traceback when read top to bottom (most recent call last)
    583     if is_frame(frame_or_tb):

File /databricks/python/lib/python3.10/site-packages/stack_data/utils.py:97, in iter_stack(frame_or_tb)
     95 while frame_or_tb:
     96     yield frame_or_tb
---> 97     if is_frame(frame_or_tb):
     98         frame_or_tb = frame_or_tb.f_back
     99     else:

File /databricks/python/lib/python3.10/site-packages/stack_data/utils.py:90, in is_frame(frame_or_tb)
     89 def is_frame(frame_or_tb: Union[FrameType, TracebackType]) -> bool:
---> 90     assert_(isinstance(frame_or_tb, (types.FrameType, types.TracebackType)))
     91     return isinstance(frame_or_tb, (types.FrameType,))

File /databricks/python/lib/python3.10/site-packages/stack_data/utils.py:176, in assert_(condition, error)
    174 if isinstance(error, str):
    175     error = AssertionError(error)
--> 176 raise error

AssertionError: 

So I'm not quite sure this actually solves the interactivity issues?

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm that this solution doesn't work. Tested it with https://github.com/astrojuanlu/simple-click-entrypoint/tree/e247d1a

Job source:

resources:
  jobs:
    test_cli:
      name: test-cli
      tasks:
        - task_key: test-cli
          python_wheel_task:
            package_name: simple_click_entrypoint
            entry_point: wicked-entrypoint
          job_cluster_key: Job_cluster
          libraries:
            - whl: dbfs:/FileStore/jars/14c1107d_d3cd_4c8c_a92d_ca9679d1b3e5/simple_click_entrypoint-0.1.0-py2.py3-none-any.whl
...

Result:

...
File /databricks/python/lib/python3.10/site-packages/click/core.py:681, in Context.exit(self, code)
    680 """Exits the application with a given exit code."""
--> 681 raise Exit(code)

Exit: 0

During handling of the above exception, another exception occurred:

SystemExit                                Traceback (most recent call last)
    [... skipping hidden 1 frame]

File ~/.ipykernel/1910/command--1-1229528502:18
     16 if entry:
     17   # Load and execute the entrypoint, assumes no parameters
---> 18   entry[0].load()()
     19 else:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/simple_click_entrypoint/__main__.py:10, in main(*args, **kwargs)
      8 kwargs["standalone_mode"] = not interactive
---> 10 run(*args, **kwargs)
...

It's not 100 % clear how Databricks is launching the entry point here. It's clear there's some Python code that does:

from importlib.metadata import entry_points  # or maybe setuptools?

entry = entry_points(name="wicked-entrypoint")

if entry:
    entry[0].load()()

but this part of the traceback makes it unclear how exactly this is being executed:

File ~/.ipykernel/1910/command--1-1229528502:18

I asked for clarification to Microsoft but the case is already an internal one MicrosoftDocs/azure-docs#116964 (comment) and also in private email to Databricks proper.

Regardless, the sys.ps1 trick is clearly not working in this case and we need to go back to square one.

@astrojuanlu
Copy link
Member

Closing this PR as it doesn't fix the problem

@idanov
Copy link
Member Author

idanov commented Jul 12, 2024

Looking at this again now, I can see from @merelcht 's traceback, it seems that this is due to the fact that click raises exception for a successful completion of the command as per this issue here opened by @antonymilne: pallets/click#2249

So it seems that to this PR, we also need to add a try: except block around run(...) or something of the likes in order to fix the issues both @merelcht and @astrojuanlu experience.

@merelcht merelcht deleted the feature/make-project-entrypoints-interactive-friendly branch October 3, 2024 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants