
[KED-1581] Add on_pipeline_error and on_node_error methods to hooks #329

Closed
Galileo-Galilei opened this issue Apr 18, 2020 · 3 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@Galileo-Galilei
Member

Description

I have tried the hooks implementation described in #219. It works very smoothly, but I feel it would be nice to have a method to deal with errors that may occur during execution between two hook calls.

Context

Currently, hooks work by calling two methods (``before_WHATEVER`` and ``after_WHATEVER``) which frame the line of code they wrap. If this line fails, the user has no proper way to end a behaviour that was triggered by the ``before_WHATEVER`` function.

A typical example is in #113: if an mlflow run is started before a pipeline run, you want to be able to close it even if the pipeline fails, to avoid unintended side effects (further code execution would otherwise be logged in the current mlflow run without any user warning).
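To make the failure mode concrete, here is a minimal, self-contained sketch of why before/after hooks alone are not enough. ``FakeMlflow``, ``MlflowHooks`` and ``run_pipeline`` are hypothetical stand-ins for illustration only, not kedro or mlflow APIs:

```python
class FakeMlflow:
    """Stand-in for the mlflow client: tracks whether a run is open."""
    def __init__(self):
        self.run_active = False

    def start_run(self):
        self.run_active = True

    def end_run(self):
        self.run_active = False


class MlflowHooks:
    """Hooks in the current before/after model: no error callback."""
    def __init__(self, mlflow):
        self.mlflow = mlflow

    def before_pipeline_run(self):
        self.mlflow.start_run()

    def after_pipeline_run(self):
        self.mlflow.end_run()


def run_pipeline(hooks, pipeline_fn):
    hooks.before_pipeline_run()
    result = pipeline_fn()       # may raise
    hooks.after_pipeline_run()   # never reached when the pipeline fails
    return result


mlflow = FakeMlflow()
try:
    run_pipeline(MlflowHooks(mlflow), lambda: 1 / 0)  # pipeline fails
except ZeroDivisionError:
    pass
print(mlflow.run_active)  # True: the run is left open
```

Because ``after_pipeline_run`` is skipped when the pipeline raises, the run stays open and any later code is silently logged into it.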

Possible Implementation

Add a hook_spec where it fits (perhaps before the following line?)
https://github.com/quantumblacklabs/kedro/blob/e377bd5114b68caf236fa9f4c924bf58291efa9c/kedro/hooks/specs.py#L157-L158

@hook_spec
def on_pipeline_error(self, error):  # plus whatever other arguments fit
    pass

and adjust the following call from this :

https://github.com/quantumblacklabs/kedro/blob/8bfa0a8ce0e3bef194ba8d4ca682becf8accdc24/kedro/context/context.py#L637-L646

to this:

self._hook_manager.hook.before_pipeline_run(  # pylint: disable=no-member
    run_params=record_data, pipeline=filtered_pipeline, catalog=catalog
)
try:
    run_result = runner.run(filtered_pipeline, catalog, run_id)
except Exception as err:
    self._hook_manager.hook.on_pipeline_error(err)
    raise  # re-raise without losing the original traceback
self._hook_manager.hook.after_pipeline_run(  # pylint: disable=no-member
    run_params=record_data,
    run_result=run_result,
    pipeline=filtered_pipeline,
    catalog=catalog,
)
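With the try/except pattern above in place, a plugin hook could clean up on failure. The following is a minimal sketch of that idea; all names (``FakeMlflow``, ``CleanupHooks``, ``run_pipeline``) are illustrative stand-ins, not kedro APIs:

```python
class FakeMlflow:
    """Stand-in for the mlflow client: tracks whether a run is open."""
    def __init__(self):
        self.run_active = False

    def start_run(self):
        self.run_active = True

    def end_run(self):
        self.run_active = False


class CleanupHooks:
    """Before/after hooks plus the proposed error callback."""
    def __init__(self, mlflow):
        self.mlflow = mlflow

    def before_pipeline_run(self):
        self.mlflow.start_run()

    def on_pipeline_error(self, error):
        self.mlflow.end_run()  # close the run even when the pipeline fails

    def after_pipeline_run(self):
        self.mlflow.end_run()


def run_pipeline(hooks, pipeline_fn):
    # Mirrors the adjusted runner call: errors are routed to the hook,
    # then re-raised so the failure is still visible to the caller.
    hooks.before_pipeline_run()
    try:
        result = pipeline_fn()
    except Exception as err:
        hooks.on_pipeline_error(err)
        raise
    hooks.after_pipeline_run()
    return result


mlflow = FakeMlflow()
try:
    run_pipeline(CleanupHooks(mlflow), lambda: 1 / 0)  # pipeline fails
except ZeroDivisionError:
    pass
print(mlflow.run_active)  # False: the run was closed by on_pipeline_error
```

The key design point is that the hook only observes the error; the exception is still re-raised, so the run fails exactly as before but without leaving dangling state behind.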
@Galileo-Galilei Galileo-Galilei added the Issue: Feature Request New feature or improvement to existing feature label Apr 18, 2020
@limdauto
Contributor

Hi @Galileo-Galilei, thank you for raising this and thank you for the initial work on the plugin. It looks very promising. So happy to see traction on this.

Regarding your feature request, I agree that all the existing hooks only take the happy path into account. I'm inclined to agree that an on_pipeline_error hook would be nice for your use case. Could you give me your use cases for on_node_error? The reason I'm asking is that our design principle for hooks is to add only the absolutely necessary ones. It's very easy to add new hooks, but very hard to remove them once users start relying on them.

Once we confirm the use cases, we will discuss internally to see whether to make this hook available or whether there is a better way to solve your use case that we haven't thought of yet.

@yetudada yetudada changed the title Add on_pipeline_error and on_node_error methods to hooks [KED-1581] Add on_pipeline_error and on_node_error methods to hooks Apr 20, 2020
@Galileo-Galilei
Member Author

Galileo-Galilei commented Apr 25, 2020

Hello @limdauto,

I actually do not have real use cases for on_node_error, but here are some use cases we could imagine:

  • implement a custom "retry" strategy when a node fails (for instance, if mlflow fails to log the dataset but the node executed correctly, I may want to skip mlflow logging and continue the pipeline with a warning)
  • force saving the inputs of nodes that are ``MemoryDataSet``s to disk (for instance, as pickle or another generic format) to be able to re-run the pipeline from these nodes (which is currently possible only if the datasets are persisted)

I am not sure these use cases should be supported; they seem rather hacky. My original idea in introducing on_node_error was rather consistency between pipeline hooks and node hooks.
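As a rough sketch of the second idea, a hook could pickle a failed node's in-memory inputs so the run can be resumed from there. The hook signature ``(error, node_name, inputs)`` and the class name are hypothetical; the real arguments would be decided in the spec:

```python
import os
import pickle
import tempfile


class PersistInputsOnError:
    """On node failure, dump the node's in-memory inputs to disk."""
    def __init__(self, dump_dir):
        self.dump_dir = dump_dir

    def on_node_error(self, error, node_name, inputs):
        # Pickle each input under a name that records which node it fed,
        # so a later run could reload it in place of the MemoryDataSet.
        for name, data in inputs.items():
            path = os.path.join(self.dump_dir, f"{node_name}__{name}.pkl")
            with open(path, "wb") as f:
                pickle.dump(data, f)  # generic fallback format


# Usage: simulate a node failure and check the inputs were dumped
dump_dir = tempfile.mkdtemp()
hook = PersistInputsOnError(dump_dir)
hook.on_node_error(ValueError("boom"), "train_model", {"features": [1, 2, 3]})
print(sorted(os.listdir(dump_dir)))  # ['train_model__features.pkl']
```

Whether the runner should then be able to restart from these dumped files is a separate question, which is part of why this use case feels hacky.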

@limdauto
Contributor

@Galileo-Galilei We have just merged these two hooks into develop. Thank you very much for the suggestion. It should go out with 0.16.

pull bot pushed a commit to FoundryAI/kedro that referenced this issue Jul 17, 2020