[DataCatalog2.0]: Extend KedroDataCatalog with dict interface #4175
Comments
I'm actually less excited about this than a lot of the other proposed ideas for this redesign. What about putting it inside a property or something like a |
This is an attempt to align on the interface changes before introducing new features, so it doesn't cancel the rest of the redesign ideas. But since this is a proposal for a possible interface, it would be helpful at this stage if you could expand on why it doesn't feel right. |
Related idea discussed in the past: #3914 (comment) |
We discussed this on backlog grooming. To clarify, the user impact of this one is that rather than doing
the user would do
While I'm not opposed to having the dict-like interface, I think it's worth discussing whether we should keep the method-based interface. We're not forced to keep it, because it's a new class and we allow ourselves to make breaking changes between minor releases. But if we do include it, I expect migration between 0.19.* and 0.20.* will definitely be easier. |
With this update, we aimed to rework the existing interface to improve naming, remove repeating methods, clarify the differences between methods working with datasets and data, and suggest functionality to iterate through the catalog and access datasets.
Suggested changes to the interface
Implementation
We intentionally do not inherit from
Examples
Initialization
    catalog = KedroDataCatalog(datasets={"test": dataset})
    catalog = KedroDataCatalog.from_config(**correct_config)
Properties
    config_resolver = catalog.config_resolver
    ds_config = catalog.config_resolver.resolve_pattern(ds_name)  # resolve dataset patterns
    patterns = catalog.config_resolver.list_patterns()  # list all patterns available in the catalog
    logger = catalog._logger
Interface to work with datasets
    print(catalog)  # __repr__
    ds_count = len(catalog)  # __len__
    if ds_name in catalog:  # __contains__
        pass
    if catalog_first == catalog_second:  # __eq__
        pass
    for ds_name in catalog:  # __iter__
        pass
    for ds_name in catalog.keys():  # keys()
        pass
    for ds in catalog.values():  # values()
        pass
    for ds_name, ds in catalog.items():  # items()
        pass
    ds = catalog["reviews"]  # __getitem__
    ds = catalog.get("reviews", default=default_ds)  # get()
    catalog["reviews"] = ds  # __setitem__
    catalog["reviews"] = 123
    ds_names = catalog.list(regex_search, regex_flags)  # filter datasets' names
    # release(), confirm(), exists() remained for now
Interface to work with data
    df = catalog.load("cars")  # load_data
    catalog.save("cars", df)  # save_data
|
Is it keys, values or both that are subject to the regex filter? |
Only keys so far, but subject to change: #3917 |
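To make the "only keys" behaviour concrete, here is a minimal sketch of a name-only regex filter. The helper `filter_names` and the sample dictionary are hypothetical and exist only for illustration; this is not the code behind `catalog.list()`, just one way such a filter could work under the assumption that matching uses `re.search` on dataset names.

```python
import re
from typing import Iterable, List, Optional


def filter_names(
    names: Iterable[str],
    regex_search: Optional[str] = None,
    regex_flags: int = 0,
) -> List[str]:
    """Return the names matching ``regex_search``; values are never inspected."""
    if regex_search is None:
        # No pattern given: return every name unchanged.
        return list(names)
    pattern = re.compile(regex_search, flags=regex_flags)
    return [name for name in names if pattern.search(name)]


# Hypothetical catalog content: only the keys take part in the filtering.
datasets = {"raw_cars": object(), "raw_reviews": object(), "model_input": object()}

raw_names = filter_names(datasets, r"^raw_")
insensitive = filter_names(datasets, r"MODEL", re.IGNORECASE)
```

Filtering on values as well would need a decision on which dataset attributes to match against, which is presumably part of what the linked ticket discusses.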
Thanks, @ElenaKhaustova. This works for us on Kedro-Viz and will help us access the catalog information without needing to resort to private methods |
Thanks @ElenaKhaustova for the summary. In general, the interface looks nice.
|
Thanks @marrrcin,
|
What will be an alternative to |
Line 115 in 82c1223
We added a dedicated method to add runtime patterns to catalog and it will be used instead when introducing breaking changes: kedro/kedro/io/kedro_data_catalog.py Line 336 in 82c1223
|
Description
As a continuation of the work on the DataCatalog redesign (#3995 (comment)), we suggest simplifying the interface for KedroDataCatalog and making it similar to UserDict.
The reasoning: this way we want to make the catalog function as a collection of datasets with an interface familiar to users.
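As a rough illustration of what "a collection of datasets with a familiar interface" could look like, the sketch below builds a toy catalog on top of `collections.abc.MutableMapping`. The class `ToyCatalog` and the string placeholders used as datasets are assumptions made for this example; it says nothing about how KedroDataCatalog itself is implemented.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator, Optional


class ToyCatalog(MutableMapping):
    """Hypothetical stand-in for a dict-like catalog; not the Kedro class."""

    def __init__(self, datasets: Optional[Dict[str, Any]] = None) -> None:
        self._datasets: Dict[str, Any] = dict(datasets or {})

    def __getitem__(self, name: str) -> Any:
        return self._datasets[name]

    def __setitem__(self, name: str, dataset: Any) -> None:
        self._datasets[name] = dataset

    def __delitem__(self, name: str) -> None:
        del self._datasets[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._datasets)

    def __len__(self) -> int:
        return len(self._datasets)


catalog = ToyCatalog({"cars": "cars_dataset"})
catalog["reviews"] = "reviews_dataset"  # __setitem__

names = list(catalog.keys())  # keys() comes from the ABC mixin
pairs = dict(catalog.items())  # so does items()
```

Only five methods are written by hand here; `keys()`, `values()`, `items()`, `get()` and `__contains__` are mixin methods supplied by the abstract base class.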
Related tickets that will be addressed:
DataCatalog #3931
Context
Possible Implementation
Change the KedroDataCatalog API to support __iter__, __getitem__, __setitem__, keys(), items() similar to the dict.
Replace old API such as save(), load(), list(), etc.
The first iteration can be done without breaking changes if we keep the old interface and reuse the newly added methods.
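One way to read "keep the old interface and reuse newly added methods" is sketched below: the old-style methods stay as thin wrappers that delegate to the dict-like ones. The class `CompatCatalog`, the choice of `add()` and `list()` as the legacy methods, and the deprecation warning are all assumptions made for illustration, not the actual Kedro implementation.

```python
import warnings
from typing import Any, Dict, Iterator, List


class CompatCatalog:
    """Hypothetical catalog keeping old-style methods on top of a dict-like API."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Any] = {}

    # New dict-like interface.
    def __setitem__(self, name: str, dataset: Any) -> None:
        self._datasets[name] = dataset

    def __getitem__(self, name: str) -> Any:
        return self._datasets[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._datasets)

    def keys(self) -> List[str]:
        return [name for name in self._datasets]

    # Old method-based interface, kept as wrappers so existing code still runs.
    def add(self, name: str, dataset: Any) -> None:
        warnings.warn(
            "add() is deprecated; use catalog[name] = dataset",
            DeprecationWarning,
            stacklevel=2,
        )
        self[name] = dataset  # delegate to __setitem__

    def list(self) -> List[str]:
        return self.keys()  # delegate to keys()


compat = CompatCatalog()
compat["cars"] = "cars_dataset"  # new style
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    compat.add("reviews", "reviews_dataset")  # old style, same effect
```

Because the wrappers hold no logic of their own, removing them in a later breaking release would not touch the dict-like code path.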