
[KED-1581] Add on_pipeline_error and on_node_error methods to hooks #329

Closed
Galileo-Galilei opened this issue Apr 18, 2020 · 3 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@Galileo-Galilei
Member

Description

I have tried the hooks implementation described in #219. It works very smoothly, but I feel it would be nice to have a method to deal with errors that may occur during execution between two hook calls.

Context

Currently, hooks work by calling two methods (``before_WHATEVER`` and ``after_WHATEVER``) which frame the line of code they wrap. If this line fails, the user has no proper way to end a behaviour that was triggered by the ``before_WHATEVER`` function.

A typical example is in #113: if an mlflow run is started before a pipeline run, you want to be able to close it even if the pipeline fails, to avoid unintended side effects (further code execution would otherwise be logged in the current mlflow run without any user warning).
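To make the failure mode concrete, here is a minimal, self-contained sketch of why before/after hooks alone are not enough. ``FakeMlflow``, ``MlflowHooks`` and ``run_pipeline`` are hypothetical stand-ins for illustration only, not kedro or mlflow APIs:

```python
class FakeMlflow:
    """Stand-in for the mlflow client: tracks whether a run is open."""
    def __init__(self):
        self.run_active = False

    def start_run(self):
        self.run_active = True

    def end_run(self):
        self.run_active = False


class MlflowHooks:
    """Hooks in the current before/after model: no error callback."""
    def __init__(self, mlflow):
        self.mlflow = mlflow

    def before_pipeline_run(self):
        self.mlflow.start_run()

    def after_pipeline_run(self):
        self.mlflow.end_run()


def run_pipeline(hooks, pipeline_fn):
    hooks.before_pipeline_run()
    result = pipeline_fn()       # may raise
    hooks.after_pipeline_run()   # never reached when the pipeline fails
    return result


mlflow = FakeMlflow()
try:
    run_pipeline(MlflowHooks(mlflow), lambda: 1 / 0)  # pipeline fails
except ZeroDivisionError:
    pass
print(mlflow.run_active)  # True: the run is left open
```

Because ``after_pipeline_run`` is skipped when the pipeline raises, the run stays open and any later code is silently logged into it.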

Possible Implementation

Add a hook_spec where it fits (perhaps before the following line?)
https://github.com/quantumblacklabs/kedro/blob/e377bd5114b68caf236fa9f4c924bf58291efa9c/kedro/hooks/specs.py#L157-L158

@hook_spec
def on_pipeline_error(self, error):  # plus whatever other arguments fit
    pass

and adjust the following call from this :

https://github.com/quantumblacklabs/kedro/blob/8bfa0a8ce0e3bef194ba8d4ca682becf8accdc24/kedro/context/context.py#L637-L646

to this:

self._hook_manager.hook.before_pipeline_run(  # pylint: disable=no-member
    run_params=record_data, pipeline=filtered_pipeline, catalog=catalog
)
try:
    run_result = runner.run(filtered_pipeline, catalog, run_id)
except Exception as err:
    self._hook_manager.hook.on_pipeline_error(err)
    raise  # re-raise without losing the original traceback
self._hook_manager.hook.after_pipeline_run(  # pylint: disable=no-member
    run_params=record_data,
    run_result=run_result,
    pipeline=filtered_pipeline,
    catalog=catalog,
)
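With the try/except pattern above in place, a plugin hook could clean up on failure. The following is a minimal sketch of that idea; all names (``FakeMlflow``, ``CleanupHooks``, ``run_pipeline``) are illustrative stand-ins, not kedro APIs:

```python
class FakeMlflow:
    """Stand-in for the mlflow client: tracks whether a run is open."""
    def __init__(self):
        self.run_active = False

    def start_run(self):
        self.run_active = True

    def end_run(self):
        self.run_active = False


class CleanupHooks:
    """Before/after hooks plus the proposed error callback."""
    def __init__(self, mlflow):
        self.mlflow = mlflow

    def before_pipeline_run(self):
        self.mlflow.start_run()

    def on_pipeline_error(self, error):
        self.mlflow.end_run()  # close the run even when the pipeline fails

    def after_pipeline_run(self):
        self.mlflow.end_run()


def run_pipeline(hooks, pipeline_fn):
    # Mirrors the adjusted runner call: errors are routed to the hook,
    # then re-raised so the failure is still visible to the caller.
    hooks.before_pipeline_run()
    try:
        result = pipeline_fn()
    except Exception as err:
        hooks.on_pipeline_error(err)
        raise
    hooks.after_pipeline_run()
    return result


mlflow = FakeMlflow()
try:
    run_pipeline(CleanupHooks(mlflow), lambda: 1 / 0)  # pipeline fails
except ZeroDivisionError:
    pass
print(mlflow.run_active)  # False: the run was closed by on_pipeline_error
```

The key design point is that the hook only observes the error; the exception is still re-raised, so the run fails exactly as before but without leaving dangling state behind.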
@Galileo-Galilei Galileo-Galilei added the Issue: Feature Request New feature or improvement to existing feature label Apr 18, 2020
@limdauto
Contributor

Hi @Galileo-Galilei, thank you for raising this and thank you for the initial work on the plugin. It looks very promising. So happy to see traction on this.

Regarding your feature request, I agree that all the existing hooks only take the happy path into account. I'm inclined to agree that an on_pipeline_error hook would be nice for your use case. Could you give me your use cases for on_node_error? The reason I'm asking is that our design principle for hooks is to add only the absolutely necessary ones. It's very easy to add new hooks, but very hard to remove them once users start relying on them.

Once we confirm the use cases, we will discuss internally to see whether to make this hook available or whether there is a better way to solve your use case that we haven't thought of yet.

@yetudada yetudada changed the title Add on_pipeline_error and on_node_error methods to hooks [KED-1581] Add on_pipeline_error and on_node_error methods to hooks Apr 20, 2020
@Galileo-Galilei
Member Author

Galileo-Galilei commented Apr 25, 2020

Hello @limdauto,

I actually do not have real use cases for on_node_error, but here are some use cases we could imagine:

  • implement a custom "retry" strategy when a node fails (for instance, if mlflow fails to log the dataset but the node executed correctly, I may want to skip mlflow logging and continue the pipeline with a warning)
  • force saving the inputs of nodes that are ``MemoryDataSet``s to disk (for instance, as pickle or another generic format) to be able to re-run the pipeline from these nodes (which is currently possible only if the datasets are persisted)

I am not sure these use cases should be supported; they seem rather hacky. My original idea in introducing on_node_error was rather consistency between pipeline hooks and node hooks.
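As a rough sketch of the second idea, a hook could pickle a failed node's in-memory inputs so the run can be resumed from there. The hook signature ``(error, node_name, inputs)`` and the class name are hypothetical; the real arguments would be decided in the spec:

```python
import os
import pickle
import tempfile


class PersistInputsOnError:
    """On node failure, dump the node's in-memory inputs to disk."""
    def __init__(self, dump_dir):
        self.dump_dir = dump_dir

    def on_node_error(self, error, node_name, inputs):
        # Pickle each input under a name that records which node it fed,
        # so a later run could reload it in place of the MemoryDataSet.
        for name, data in inputs.items():
            path = os.path.join(self.dump_dir, f"{node_name}__{name}.pkl")
            with open(path, "wb") as f:
                pickle.dump(data, f)  # generic fallback format


# Usage: simulate a node failure and check the inputs were dumped
dump_dir = tempfile.mkdtemp()
hook = PersistInputsOnError(dump_dir)
hook.on_node_error(ValueError("boom"), "train_model", {"features": [1, 2, 3]})
print(sorted(os.listdir(dump_dir)))  # ['train_model__features.pkl']
```

Whether the runner should then be able to restart from these dumped files is a separate question, which is part of why this use case feels hacky.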

@limdauto
Contributor

@Galileo-Galilei We have just merged these two hooks into develop. Thank you very much for the suggestion. It should go out with 0.16.

pull bot pushed a commit to FoundryAI/kedro that referenced this issue Jul 17, 2020