-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change how kedro import datasets and raise explicit error when dependencies is missing #3272
Conversation
face399
to
93dc1a9
Compare
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Github Action is not triggered, not sure why. |
* Replace a `StopIteration` exception with for-else Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> * fix the logic Signed-off-by: Nok <nok.lam.chan@quantumblack.com> * bug fix Signed-off-by: Nok <nok.lam.chan@quantumblack.com> --------- Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Could exception groups help here? https://github.com/agronholm/exceptiongroup |
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
if not hasattr(module_obj, obj_name): | ||
raise AttributeError(f"Object '{obj_name}' cannot be loaded from '{obj_path}'.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can now create a ModuleNotFoundError
due to lazy-loading. I decide to remove it because it adds more complexity but the error message itself isn't helpful at all.
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
When dependencies is missing.
When we have a typo.
@astrojuanlu I'd like to get some review for the current approach. We can change the error message (or in a separate PR if we are changing all of them). ps. All tests passed except 1 because I updated what |
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
I confirm this now disambiguates both cases 👍🏽 The error messages could be:
|
I can also confirm this 🙂
For this one, should we add anything about checking if
👍 |
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
…o-org/kedro into noklam/make-import-failures-in-2943 Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Cc @datajoely for some inspiration |
dataset_type = config.pop("type") | ||
if isinstance(dataset_type, str): | ||
if len(dataset_type.strip(".")) != len(dataset_type): | ||
raise DatasetError( | ||
"'type' class path does not support relative " | ||
"paths or paths ending with a dot." | ||
) | ||
class_paths = (prefix + class_obj for prefix in _DEFAULT_PACKAGES) | ||
class_paths = (prefix + dataset_type for prefix in _DEFAULT_PACKAGES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactor the name because class_obj
is overloaded and confusing. Originally it is a str
but load as a class later, it's better to make it explicit.
f"{exc}. Please see the documentation on how to " | ||
f"install relevant dependencies for {class_path}:\n" | ||
f"https://kedro.readthedocs.io/en/stable/" | ||
f"kedro_project_setup/dependencies.html" | ||
f"https://docs.kedro.org/en/stable/kedro_project_setup/" | ||
f"dependencies.html#install-dependencies-related-to-the-data-catalog" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the link according to review suggestion
def test_dataset_name_typo(self, mocker): | ||
# If the module doesn't exist, it return None instead ModuleNotFoundError | ||
mocker.patch("kedro.io.core.load_obj", return_value=None) | ||
dataset_name = "lAmbDaDaTAsET" | ||
|
||
with pytest.raises( | ||
DatasetError, match=f"Class '{dataset_name}' not found, is this a typo?" | ||
): | ||
parse_dataset_definition({"type": dataset_name}) | ||
|
||
def test_dataset_missing_dependencies(self, mocker): | ||
# If the module is found but import the dataset trigger ModuleNotFoundError | ||
dataset_name = "LambdaDataset" | ||
|
||
def side_effect_function(value): | ||
if "__all__" in value: | ||
return [dataset_name] | ||
else: | ||
raise ModuleNotFoundError | ||
|
||
mocker.patch("kedro.io.core.load_obj", side_effect=side_effect_function) | ||
|
||
pattern = "Please see the documentation on how to install relevant dependencies" | ||
with pytest.raises(DatasetError, match=pattern): | ||
parse_dataset_definition({"type": dataset_name}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some tests for testing the two common scenarios.
So two thought -
One of the common debugging patterns I find myself supporting users is as follows:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried breaking this in different ways and I'm satisfied with the outcome 👍🏽 Thanks @noklam!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @noklam, this is a huge improvement! 👍
Description
Fix #2943
How are datasets can be imported?
kedro
's_load_ob
-kedro/kedro/io/core.py
Lines 381 to 389 in 38dfe6f
In particular,
kedro
try to import datasets from a_DEFAULT_PACKAGES
iterativelykedro/kedro/io/core.py
Line 350 in 38dfe6f
For example, when user do
catalog.load("companies")
, the following happenskedro.io.pandas
-ModuleNotFoundError
kedro_datasets.pandas
is found and exit the loopThe actual import mechanism is hidden in
kedro.io.core
'sparse_dataset_definition
which has extratry/except
with_load_obj
.To test this, you can do this.
or you can directly trigger this import mechanism via the private method.
Development notes
The main change is to differentiate two different scenarios.
Noted that since
kedro_datasets
moved to lazy loading,hasattr
can trigger aModuleNotFoundError
.Questions for reviewer
AttributeError
inload_obj
, honestly I don't think that add any more information and make the code obscure.Previously the error was hidden because
ModulenotFoundError
can be two cases:Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file