support of on_dataset_load_error hook just like before_dataset_loaded or after_dataset_loaded #2934

Open
thedevd opened this issue Aug 16, 2023 · 2 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments


thedevd commented Aug 16, 2023

I have a Kedro pipeline that sometimes fails to load a dataset, and I get the error "kedro.io.core.DatasetError: Failed while loading data". Is there any plan to provide an on_dataset_load_error hook, similar to on_node_error or on_pipeline_error, so that I can run custom logic when a particular dataset fails to load?

thedevd added the "Issue: Feature Request" label (New feature or improvement to existing feature) on Aug 16, 2023
thedevd changed the title from "on dataset load error hook just like before_dataset_loaded or after_dataset_loaded" to "support of on_dataset_load_error hook just like before_dataset_loaded or after_dataset_loaded" on Aug 16, 2023
Contributor

noklam commented Aug 16, 2023

@thedevd What are you trying to do here? It would be great if you could give an example of why you need this.

Author

thedevd commented Aug 17, 2023

In my pipeline, a few nodes fail before execution while loading some datasets, and I want to handle any DataSetError via a hook. Specifically, I want to keep track of which dataset failed to load and for which node, for example to build a report of dataset failures. However, the Kedro documentation has no hook such as on_dataset_load_error for this scenario.

For example, I have this node with two datasets:

node(
    check_pms,
    ['table1_pm', 'table2_pm'],
    None,
    name='check_pms'
)

So when one of the datasets fails to load, my entire pipeline fails.
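
Until something like on_dataset_load_error exists, one possible workaround is to combine the existing hooks: remember each dataset in before_dataset_loaded, forget it again in after_dataset_loaded, and treat whatever is still pending when on_pipeline_error fires as the load that failed. The following is only a rough sketch under several assumptions: the class name DatasetLoadFailureTracker and the helper simulate_failed_run are hypothetical, the installed Kedro version is assumed to pass a node argument to the dataset hooks, and a sequential runner is assumed (with a parallel runner the hook state would not be shared across processes). The fallback decorator exists only so the sketch runs without Kedro installed.

```python
from types import SimpleNamespace

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can be run without Kedro installed.
    def hook_impl(func):
        return func


class DatasetLoadFailureTracker:
    """Hypothetical hook class: reports loads that started but never finished."""

    def __init__(self):
        self._pending = []   # (dataset_name, node_name) pairs currently loading
        self.failures = []   # collected failure records

    @hook_impl
    def before_dataset_loaded(self, dataset_name, node):
        # A load is about to start; remember it until it completes.
        self._pending.append((dataset_name, node.name))

    @hook_impl
    def after_dataset_loaded(self, dataset_name, data, node):
        # The load succeeded, so it is no longer pending.
        key = (dataset_name, node.name)
        if key in self._pending:
            self._pending.remove(key)

    @hook_impl
    def on_pipeline_error(self, error, run_params, pipeline, catalog):
        # Anything still pending never reached after_dataset_loaded.
        for dataset_name, node_name in self._pending:
            self.failures.append(
                {"dataset": dataset_name, "node": node_name, "error": str(error)}
            )
        self._pending.clear()


def simulate_failed_run():
    """Replay the hook calls for the check_pms node, with table2_pm failing."""
    tracker = DatasetLoadFailureTracker()
    node = SimpleNamespace(name="check_pms")  # stand-in for a Kedro node
    tracker.before_dataset_loaded("table1_pm", node)
    tracker.after_dataset_loaded("table1_pm", object(), node)
    tracker.before_dataset_loaded("table2_pm", node)  # this load never completes
    tracker.on_pipeline_error(
        RuntimeError("Failed while loading data"), {}, None, None
    )
    return tracker.failures
```

In a real project the tracker would be registered through the HOOKS tuple in settings.py, for example HOOKS = (DatasetLoadFailureTracker(),). Note that the pipeline still fails with this approach; the sketch only records which dataset and node were involved so that a report can be produced.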
