Make it easier to use the Config Loader #2819

Closed
astrojuanlu opened this issue Jul 18, 2023 · 35 comments
Labels: Issue: Feature Request (New feature or improvement to existing feature)

@astrojuanlu
Member

astrojuanlu commented Jul 18, 2023

We have evidence from users that want to use specific parts of Kedro without having to use the rest of the Framework.

For example, #2741

@WaylonWalker: Would it make sense to make mini-kedro installable? My use case for projects like that are users doing EDA and just want easy access to the data with no fuss.

And #2409

@bp I have a library with several implementations of AbstractDataSet that I use to access proprietary data connectors at my employer, I would like to share this library with coworkers that are not using Kedro for use through the code API without the overhead of a full Kedro install.

@Galileo-Galilei As the OP, I have a bunch of use cases where I introduced Kedro to data analysts who are not familiar with Python. I introduced Kedro to them through the catalog and a notebook (with extra facilities to do SQL inside). They really enjoy the abstraction and the ability to use data from very different sources (old Access databases, files from S3 storage, Parquet tables from an internal data lake, SAS for the old data warehouse...) and save their results in Excel. This avoids copying data and therefore improves security, data management, and development speed. Kedro is a bit scary to them and I'd like to introduce the abstraction separately.

On the other hand, @yetudada tried to use the Kedro Config Loader on an existing project with a flat structure and had a frustrating experience because of the amount of boilerplate needed (with #2593 being only the last straw). This is something I've experienced myself in my trainings:

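import polars as pl  # assumption: "pl" in the resolver below refers to Polars
from omegaconf import OmegaConf
from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog

# Register the resolver only once, so re-running the cell does not raise
if not OmegaConf.has_resolver("pl"):
    OmegaConf.register_new_resolver("pl", lambda attr: getattr(pl, attr))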

# See https://github.com/kedro-org/kedro/issues/2583
conf_loader = OmegaConfigLoader("conf", config_patterns={"catalog": ["catalog.yml", "**/catalog.yml"]})
conf_catalog = conf_loader.get("catalog")
catalog = DataCatalog.from_config(conf_catalog)

It's a lot to type, and I sense that most users are probably impressed with my live demo skills but desperate at the number of concepts they have to learn to just use a Kedro component.

This is closely related to #2818, which is concerned with documentation. This issue is about exploring the necessary code changes for this to happen. The key frame of reference is:

How to use the Kedro library components without assuming that I'm using the Kedro framework. And perhaps it's also about how do we make sure users use some parts of Kedro (and therefore some best practice) and not none.

In particular:

  • Should I not have an easy experience of loading configuration without framework?
  • Should I not have an easy experience of using the Data Catalog and datasets without framework?
@yetudada
Contributor

This is the workflow I went through that @astrojuanlu referenced.

What’s happening?

Suppose I want to use Kedro as a library, and specifically our ConfigLoader and DataCatalog, the two features that users want the most. With the following project layout, this does not work:

my-project
├── my-notebook.ipynb
├── Customer-Churn-Records.csv
├── parameters.yml
├── catalog.yml
└── requirements.txt

What do you need to do?

The OmegaConfigLoader has some assumptions built in. It assumes that you have:

  • A conf directory
  • A conf/base directory
  • A conf/local directory

All of this assumes that you are familiar with Kedro’s project template - how else would you know about this folder structure?

Your project has to look like this:

my-project
├── conf/
│   ├── base/
│   │   ├── catalog.yml
│   │   └── parameters.yml
│   └── local/
├── my-notebook.ipynb
├── Customer-Churn-Records.csv
└── requirements.txt

And then you need to have the following:

requirements.txt

kedro==0.18.11
kedro-datasets[pandas.CSVDataSet]~=1.1

my-notebook.ipynb

from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog

# Load configuration for the catalog and parameters
# OmegaConfigLoader assumes you have directories for conf, conf/base and conf/local
# Users need to know that they need to put the catalog.yml in conf/base

conf_loader = OmegaConfigLoader(conf_source="conf")
conf_catalog = conf_loader.get("catalog")
catalog = DataCatalog.from_config(conf_catalog)

customer_data = catalog.load("customers")

What errors did I run into before I got this working?

I first tried to put catalog.yml into the OmegaConfigLoader

Screenshot 2023-07-18 at 11 23 50

Then I created a conf directory and put catalog.yml in there

Screenshot 2023-07-18 at 11 26 09

Then I created a base directory in conf and put catalog.yml in there

Screenshot 2023-07-18 at 11 27 00

💢 Then I said - WTF

Then I created a local directory and it finally worked

Luckily I know conf/local is supposed to be empty. If I didn't read the documentation, how would I know this?

Screenshot 2023-07-18 at 11 29 21

@MatthiasRoels

The issue described here concerns the config loader, not the data catalog itself. You can create a catalog object directly if you have a catalog dict object (mapping a dataset name to its dataset class config) and a credentials config.

This means that you don’t need a config loader and its complex file structure to work with a data catalog. What still bothers me though is the fact that you need that credentials config file. Ideally we can do something with that to abstract away the credentials part…
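
A minimal sketch of that idea, assuming a catalog with a single customers entry that points at the CSV file from the example project above. The dictionary contents and the build_catalog helper are illustrative names, not part of Kedro, and Kedro is imported lazily inside the helper so that the configuration itself stays a plain Python dict:

```python
# Hypothetical catalog configuration: dataset name -> dataset class config.
catalog_config = {
    "customers": {
        "type": "pandas.CSVDataSet",
        "filepath": "Customer-Churn-Records.csv",
    }
}


def build_catalog(config, credentials=None):
    """Build a DataCatalog from plain dicts, without any config loader."""
    # Imported here so the dict above can be defined without Kedro installed.
    from kedro.io import DataCatalog

    return DataCatalog.from_config(config, credentials)
```

With kedro and kedro-datasets[pandas.CSVDataSet] installed, build_catalog(catalog_config).load("customers") should read the CSV without any conf directory being involved.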

@astrojuanlu
Member Author

You can create a catalog object directly if you have a catalog dict object (mapping a dataset name to its dataset class config) and a credentials config.

Good point @MatthiasRoels

What still bothers me though is the fact that you need that credentials config file. Ideally we can do something with that to abstract away the credentials part…

You can instantiate a DataCatalog with an empty credentials dictionary. Or you mean the ConfigLoader in this case?

@MatthiasRoels

I know you can use an empty credentials file, but it would be nicer to inject credentials in a different way though…

@noklam
Contributor

noklam commented Jul 30, 2023

I know you can use an empty credentials file, but it would be nicer to inject credentials in a different way though…

Are you referring to this specific standalone case, or in general? There are already ways to inject credentials without credentials.yml in a Kedro project.

@noklam
Contributor

noklam commented Jul 30, 2023

I did the same exercise as @yetudada recently. In a notebook, using ConfigLoader doesn't make much sense until you have multiple configuration files. If all you have is one file, you may as well just use yaml.safe_load. The added value of ConfigLoader is templating, patterns, etc., and most of these features are not useful in the context of a notebook.

I try to argue that #2861 is one way of making these components more useful in a standalone notebook.

@MatthiasRoels

Are you referring to this specific standalone case, or in general? There are already ways to inject credentials without credentials.yml in a Kedro project.

I was referring to this specific use case for a standalone catalog. I know there are other ways to inject credentials in a regular kedro project. But if we can make the UX nicer for a standalone, we might be able to leverage the same functionality in a kedro project too, making that part easier. I don’t have an immediate solution in mind for this, but more than happy to brainstorm about it!

@yetudada changed the title from "Make it easier to use the Config Loader and the Data Catalog separately" to "Make it easier to use the Config Loader" on Aug 6, 2023
@noklam
Contributor

noklam commented Aug 8, 2023

Currently, if you try to create a DataCatalog without using context.catalog, you will be surprised by path issues. There is some logic tightly coupled to the project path that converts relative paths into absolute paths.

def _get_catalog(
    self,
    save_version: str = None,
    load_versions: dict[str, str] = None,
) -> DataCatalog:
    """A hook for changing the creation of a DataCatalog instance.
    Returns:
        DataCatalog defined in `catalog.yml`.
    Raises:
        KedroContextError: Incorrect ``DataCatalog`` registered for the project.
    """
    # '**/catalog*' reads modular pipeline configs
    conf_catalog = self.config_loader["catalog"]
    # turn relative paths in conf_catalog into absolute paths
    # before initializing the catalog
    conf_catalog = _convert_paths_to_absolute_posix(
        project_path=self.project_path, conf_dictionary=conf_catalog
    )

@noklam
Contributor

noklam commented Aug 8, 2023

(comments for myself to refer in the future)
The easiest way to instantiate catalog in notebook is this (assuming no credentials needed)

from kedro.config import ConfigLoader
from kedro.io import DataCatalog

config_loader = ConfigLoader("../conf")
conf_catalog = config_loader["catalog"]
catalog = DataCatalog.from_config(conf_catalog)
catalog.load("companies")  # fails - No such file or directory: 'data/01_raw/companies.csv'

This will fail - because you need to use context.catalog

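from pathlib import Path

from kedro.config import ConfigLoader
from kedro.framework.context.context import _convert_paths_to_absolute_posix
from kedro.io import DataCatalog

config_loader = ConfigLoader("../conf")
conf_catalog = config_loader["catalog"]
# Resolve relative dataset paths against the project root before building the catalog
conf_catalog = _convert_paths_to_absolute_posix(
    project_path=Path("../").resolve(), conf_dictionary=conf_catalog
)
catalog = DataCatalog.from_config(conf_catalog)
catalog.load("companies")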

This will pass
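
To illustrate what that conversion step does, here is a simplified sketch that uses only the standard library. It is not Kedro's implementation of _convert_paths_to_absolute_posix: the function name, the set of path keys, and the rule for skipping URLs are all assumptions made for this example.

```python
from pathlib import Path, PurePosixPath
from typing import Any, Dict

# Assumed set of keys that hold filesystem paths (illustrative only).
PATH_KEYS = {"filepath", "path"}


def _is_relative_local(value: str) -> bool:
    """True for values like 'data/x.csv', False for URLs and absolute paths."""
    return "://" not in value and not PurePosixPath(value).is_absolute()


def make_paths_absolute(project_path: Path, conf: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``conf`` with relative local paths made absolute."""
    converted: Dict[str, Any] = {}
    for key, value in conf.items():
        if isinstance(value, dict):
            # Recurse into nested dataset definitions.
            converted[key] = make_paths_absolute(project_path, value)
        elif key in PATH_KEYS and isinstance(value, str) and _is_relative_local(value):
            converted[key] = (project_path / value).as_posix()
        else:
            converted[key] = value
    return converted


conf_catalog = {
    "companies": {"type": "pandas.CSVDataSet", "filepath": "data/01_raw/companies.csv"},
    "remote": {"type": "pandas.CSVDataSet", "filepath": "s3://bucket/companies.csv"},
}
absolute_conf = make_paths_absolute(Path("/srv/project"), conf_catalog)
```

The relative path is anchored to the project root, while the remote URL is left untouched. That is why the catalog built in the second snippet above can find the file regardless of the notebook's working directory.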

@MatthiasRoels

I see what you mean @noklam, but shouldn’t we address this to make the catalog more self-contained? This should be an easy change (moving that _convert_paths_to_absolute_posix call to the from_config method), no?

@noklam
Contributor

noklam commented Aug 8, 2023

@MatthiasRoels Hey! I wrote these comments mainly for future reference. I am not working on this issue at the moment. Rest assured, we are not settling for this ugly workaround; I just don't know yet what the final solution will be.

@MatthiasRoels

With all the info from this issue, do we feel we are already in a position to shape it further in another issue or even a PR?

@merelcht added this to the "Make it easier to use Kedro as a library" milestone on Aug 10, 2023
@astrojuanlu
Member Author

Tangentially related: a hello_kedro.py example that we used to have in our docs https://docs.kedro.org/en/0.17.0/02_get_started/03_hello_kedro.html

@noklam
Contributor

noklam commented Aug 23, 2023

Potential Options:

  1. Change defaults to "." and change settings.py default to "base" and "local" (and starters)
  2. Change defaults to "." and change KedroSession.create

# Maybe update this in `session.py`?
config_loader_class = settings.CONFIG_LOADER_CLASS
config_loader_args = {"base_env": "base", "default_run_env": "local"}
config_loader_args.update(settings.CONFIG_LOADER_ARGS)  # Overwrite if any
return config_loader_class(
    conf_source=self._conf_source,
    env=env,
    runtime_params=extra_params,
    **config_loader_args,
)

# Original
config_loader_class = settings.CONFIG_LOADER_CLASS
return config_loader_class(
    conf_source=self._conf_source,
    env=env,
    runtime_params=extra_params,
    **settings.CONFIG_LOADER_ARGS,
)

I prefer 2, and we can do it now instead of waiting for 0.19. cc @merelcht

@MatthiasRoels

I agree with what @MatthiasRoels said, but I also want to stress that it should be purely reading config. We shouldn't try to add any resolving ability, so templating will not work.

I fully agree with you @noklam! No resolution but what about stuff related to _convert_paths_to_absolute_posix?

@noklam
Contributor

noklam commented Aug 24, 2023

@MatthiasRoels see #2965, we will move that to DataCatalog. Optionally, it may include arguments to define the keywords for "path". Let's keep this thread about ConfigLoader only, as I notice we are discussing DataCatalog here. I created a separate ticket for DataCatalog.from_file.

@yetudada changed the milestone from "Make it easier to use Kedro as a library" to "Using Kedro with existing projects" on Sep 4, 2023
@yetudada
Contributor

yetudada commented Sep 4, 2023

Reassigned this issue to the "Using Kedro with existing projects" milestone, since this is a solution to that.

@astrojuanlu
Member Author

Tangentially related: #2512

@astrojuanlu
Member Author

In our internal discussions it was noted by @yetudada that, despite the changes proposed in #2971, it is still cumbersome to instantiate the OmegaConfigLoader with a single file:

catalog_conf = OmegaConfigLoader(conf_source=".")["catalog"]

Compared to

catalog_conf = OmegaConf.load("catalog.yml")

These two examples are not 100 % equivalent, but that's the point - we still need a simpler way to onboard users to the assumptions we make in our config loader.

It was proposed to add a convenience static method:

catalog_conf = OmegaConfigLoader.load_file("catalog.yml")

Which then nicely complements the use case explored in #2967:

catalog = DataCatalog.from_config(OmegaConfigLoader.load_file("catalog.yml"))

Thoughts @noklam @merelcht @idanov?

@idanov
Member

idanov commented Oct 2, 2023

Still not very convinced by this. I think adding arbitrary helper methods to the API of classes is definitely a direction which will make things inconsistent and confusing over the long run. I highly doubt that changing the API to provide an odd load_file helper significantly improves the user experience to such an extent that we have to pay the price for inconsistency.

catalog = DataCatalog.from_config(OmegaConfigLoader.load_file("catalog.yml"))

vs

catalog = DataCatalog.from_config(OmegaConfigLoader(conf_source=".").get("catalog"))

Let's not forget that in the current Kedro design, we aim to make no distinction between file-based configurations and configuration taken from a remote system, e.g. a database or an API. What the load_file alternative for those situations would be is unclear from the suggested design here.

Could we take a step back here and explore alternative roads for solving this problem? One way to go forward is to provide a micro-framework approach as taken by https://github.com/ellwise/kedro-light. There, a full micro-framework approach is taken, where each of the workflow steps is considered and eventually redesigned to fit a more notebook-friendly workflow, instead of relying on the session and the context. This requires no API changes to the individual components, only a micro-framework module with helper methods.

@astrojuanlu
Member Author

Quick comment: see "OmegaConfigLoader does not follow ConfigLoader get() spec" (#3092). I'll address the rest later if I can (others feel free to chime in).

@idanov
Member

idanov commented Oct 2, 2023

@astrojuanlu OmegaConfigLoader extends AbstractConfigLoader which extends UserDict, which has .get(). And #3092 is on purpose, so we don't shadow the UserDict method .get().

@yetudada
Contributor

yetudada commented Oct 3, 2023

I'm going to mention my thoughts here and why I raised more questions. So during our recent showcase, we showed that if I had the following structure:

my-project
├── my-notebook.ipynb
├── Customer-Churn-Records.csv
├── catalog.yml
└── requirements.txt

Then I'd be able to load my local dataset, customers in catalog.yml, using:

from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog

conf_catalog = OmegaConfigLoader(conf_source=".")["catalog"]
catalog = DataCatalog.from_config(conf_catalog, source="../data")

catalog.load("customers").head()

And this prompted a few questions:

  • Should we assume that users need a directory for configuration? This concept is introduced with conf_source.
  • What is the most intuitive way to load relative paths in catalog.yml? source="../data" is really confusing and I'm not sure intuitive naming will help this. Additionally, we would have to teach this concept to new users of Kedro because we only teach Kedro using local data. And, this concept only appears for local data; our experience is not consistent when compared to how we would use s3 or Azure Blob Storage.

So then I looked at OmegaConf and Intake as sources of inspiration (I'm sure they're not the only places we can look) and then had more questions:

  • For the directory assumption, what benefits do users get from using our OmegaConfigLoader over OmegaConf if they're loading a single configuration file? The API for OmegaConf is far more intuitive than ours. Should we not just recommend this pattern? What are the pros and cons?
  • How do other libraries handle relative paths in configuration?

@astrojuanlu
Member Author

tl;dr: No I don't think a load_file as proposed here is inconsistent (prove me wrong). Adding this one method makes the Config Loader objectively easier to use (which is a small, yet noticeable obstacle that has been spotted in conference talks, internal workshops, and demos to clients) for very little cost (maintaining 1 method vs a whole new mini-framework or launching an exploration of the solution space). I'm pushing back against a quest for purity at the cost of everything else.

Let's not forget that in the current Kedro design, we aim to make no distinction between file-based configurations and configuration taken from a remote system, e.g. a database or an API.

That's only half correct as far as I understand? Yes, OmegaConfigLoader takes configurations from remote systems thanks to fsspec:

file_mimetype, _ = mimetypes.guess_type(conf_source)
if file_mimetype == "application/x-tar":
    self._protocol = "tar"
elif file_mimetype in (
    "application/zip",
    "application/x-zip-compressed",
    "application/zip-compressed",
):
    self._protocol = "zip"
else:
    self._protocol = "file"
self._fs = fsspec.filesystem(protocol=self._protocol, fo=conf_source)

But it always relies on directory abstractions, whether local or remote, and definitely not "database or an API". Or at least I don't see how one is supposed to extract config in Kedro from a database or an API, and we have zero examples on our docs on how to do such a thing. I'd be happy to be proven wrong.

What the load_file alternative for those situations would be is unclear from the suggested design here.

For continuity, allowing from_file to use fsspec remote paths is consistent with the rest, because of what I said above. Again, happy to be proven wrong, but looking at the code, I don't see anything that would be possible with __init__(..., conf_source: str) that wouldn't be possible with load_file(..., path: str) [1].

I highly doubt that changing the API to provide an odd load_file helper significantly improves the user experience to such an extent that we have to pay the price for inconsistency.

First, "odd" is subjective (let's agree to disagree). Second, I think you are underestimating the cognitive load needed to process

OmegaConfigLoader(
    conf_source="."  # *sigh* okay
).get("catalog")  # what?

for beginners that just learned about Kedro and just want to load catalog.yml. When we explain it in trainings it feels like magic, but we cannot, and should not, depend on guided learning.

The price to pay is maintaining this:

diff --git a/kedro/config/omegaconf_config.py b/kedro/config/omegaconf_config.py
index 5d07cb64..11f8a4b0 100644
--- a/kedro/config/omegaconf_config.py
+++ b/kedro/config/omegaconf_config.py
@@ -73,6 +73,17 @@ class OmegaConfigLoader(AbstractConfigLoader):
 
     """
 
+    @classmethod
+    def load_file(cls, path: str):
+        path_obj = Path(path)
+        config_loader = cls(
+            conf_source=path_obj.parent,
+            config_patterns={path_obj.stem: [path_obj.name]},
+            base_env="",
+            default_run_env="",
+        )
+        return config_loader[path_obj.stem]
+
     def __init__(  # noqa: too-many-arguments
         self,
         conf_source: str,

Sample code:

>>> OmegaConfigLoader.load_file("../talk-kedro-huggingface/catalog.yml")
{...}
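
To make the diff above easier to follow, this sketch (standard library only) shows which constructor arguments the proposed load_file would derive from a path. The helper name loader_args_for_file is hypothetical. It mirrors the logic of the diff without importing Kedro, and it converts the parent directory to a string, whereas the diff passes the Path object directly.

```python
from pathlib import Path


def loader_args_for_file(path: str) -> dict:
    """Mirror how the proposed ``load_file`` splits a path into loader arguments."""
    path_obj = Path(path)
    return {
        "conf_source": str(path_obj.parent),  # directory that contains the file
        "config_patterns": {path_obj.stem: [path_obj.name]},  # key -> file pattern
        "base_env": "",  # no base/ subdirectory expected
        "default_run_env": "",  # no local/ subdirectory expected
        "key": path_obj.stem,  # key later used to read the loaded config
    }


args = loader_args_for_file("conf/catalog.yml")
```

For a bare file name such as catalog.yml, the derived conf_source is ".", which matches the OmegaConfigLoader(conf_source=".")["catalog"] spelling discussed earlier in the thread.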

One way to go forward is to provide a micro-framework approach as taken by https://github.com/ellwise/kedro-light.

This sounds to me like saying "Kedro is too complicated, so I'm going to show you something else (that you will have to install separately) so that you don't have to deal with those complications". I think this is the wrong message to send, and seems to go against Kedro Principle 2 "2. Grow beginners into experts 🌱".

If anything, we should make Kedro easier to use without resorting to extra layers that add more indirection.

Just from a practical standpoint, if we decide to go down this path, we'd have to design it (months), build it (weeks), change our documentation (weeks), let alone maintain it (forever), and fix bugs for it (forever). I don't think we can afford the luxury of spending time and resources on this.

This is a big discussion for a small issue that deserves a small solution.

Footnotes

  1. Except, of course, merging different configuration files, but the whole point of this thread is to "Make it easier to use the Config Loader" for people who only have one config file.

@astrojuanlu
Member Author

@yetudada copied your questions on DataCatalog over to #2965 to keep this thread about OmegaConfigLoader

@astrojuanlu
Member Author

The parent issue is #3029

@Galileo-Galilei
Contributor

Galileo-Galilei commented Oct 8, 2023

I totally agree with both of @idanov's assertions here:

I highly doubt that changing the API to provide an odd load_file helper significantly improves the user experience to such an extent that we have to pay the price for inconsistency.

catalog = DataCatalog.from_config(OmegaConfigLoader.load_file("catalog.yml"))

vs

catalog = DataCatalog.from_config(OmegaConfigLoader(conf_source=".").get("catalog"))

I don't feel the first one is self-explanatory enough for users to write it straight out of the box without looking at the documentation, and I feel it is quite bad to introduce inconsistency.

That said, it is clear that the whole problem boils down to the user experience: it is not intuitive to know how to load a file from the config loader.

First, "odd" is subjective (let's agree to disagree). Second, I think you are underestimating the cognitive load needed to process

OmegaConfigLoader(
    conf_source="."  # *sigh* okay
).get("catalog")  # what?

I think we can break this in several steps:

  1. specifying the conf_source is a bit confusing if you don't know that we expect configuration to be in a folder. I think we'll increase clarity if users do not need to specify this argument.

I remember someone made the suggestion (but I can't find the relevant issue) to:

  • change the default of the config loader to "." for conf_source, default_base_env, local_env
  • change the default settings.py to add CONF_SOURCE="conf", DEFAULT_BASE_ENV="base", LOCAL_ENV="local_env"

This is backward compatible, and enables users to use the simplified API:

catalog = DataCatalog.from_config(OmegaConfigLoader().get("catalog"))

  2. I think the main problem is that it is very confusing that an OmegaConfigLoader instance behaves like a dictionary, hence I should use the get method or the even more confusing [] built-in method. Even if I know that it is a dictionary, I don't know (as a user) what its keys are. I think there are 2 things that can decrease ambiguity:
  • communicate on error: if someone does catalog = DataCatalog.from_config(OmegaConfigLoader().get("catalog.yml")), they should get a clear error like 'did you mean "catalog" instead of "catalog.yml"?'
  • alias get with a more explicit name: maybe we can add an alias "load" method which only calls get under the hood? And maybe we can make it accept the ".yml" extension?
# very dirty example
class OmegaConfigLoader:
    ...

    def load(self, filename):
        # optional, we can also just alias get without enabling the ``.yml`` extension to reduce confusion
        conf_key = filename.split(".yml")[0]
        return self.get(conf_key)

hence this will work : catalog = DataCatalog.from_config(OmegaConfigLoader().load("catalog.yml")).

I don't really like this solution however, because it goes against the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.
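
The "communicate on error" idea from the list above could be sketched with the standard library alone. SuggestingConfig is a hypothetical dict-like container used only for illustration, not Kedro's OmegaConfigLoader. It relies on difflib to propose the closest existing key when a lookup with [] fails.

```python
import difflib
from collections import UserDict


class SuggestingConfig(UserDict):
    """Hypothetical dict-like config container that hints at the intended key."""

    def __missing__(self, key):
        # Look for the closest known key, e.g. "catalog" for "catalog.yml".
        matches = difflib.get_close_matches(str(key), list(self.data), n=1)
        hint = f" Did you mean {matches[0]!r}?" if matches else ""
        raise KeyError(f"No configuration found for {key!r}.{hint}")


conf = SuggestingConfig({"catalog": {}, "parameters": {}})
try:
    conf["catalog.yml"]
except KeyError as err:
    message = err.args[0]
```

One design consequence is that the hint only appears for [] access. An inherited .get("catalog.yml") still returns None silently, which is part of why the dictionary interface is hard to discover.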

@astrojuanlu
Member Author

Here's evidence of a user who just tried to do conf_loader = ConfigLoader(config_file="catalog/usr/src/app/conf/base/catalog/Registration.yml") https://linen-slack.kedro.org/t/15970029/hello-all-anyone-knows-how-i-can-feed-my-project-catalogue-t#7f1a2a44-da74-4274-8410-6c3ecb33dbf8 cc @ankatiyar @rosanajurdi

I remember someone made the suggestion (but I can't find the relevant issue) to : [...]

Yes, that's being tracked in #2971 and there's overall agreement to move forward with the proposal, although only for base_env and default_run_env. As far as I understand, conf_source would still be required.

I think the main problem is that it is very confusing that an OmegaConfigLoader instance behaves like a dictionary, hence I should use the get method or the even more confusing [] built-in method. Even if I know that it is a dictionary, I don't know (as a user) what its keys are.

Thanks for sharing, good insight. I agree the .load method that you described wouldn't be an ideal solution.

I'm going back to square one: what other ideas do we have that (1) address the well documented fact that users want to load individual config files in an easy way (2) by making a small incremental addition to what we already have in a way that we can fit it for 0.19?

@merelcht
Member

Maybe I'm missing something, but OmegaConfigLoader(conf_source=...)[...] already allows you to load just one file. If you have only one catalog or parameters or credentials file the above will load that one file. IMO solving this for when users have more than e.g. 1 catalog and want to load a specific catalog file, isn't a "getting started" use case anymore. I'd prefer to not rush decisions along those lines for 0.19.0 and do a proper API design workstream post 0.19.0.

In the external user session we briefly touched on differentiating between loading and resolving configuration, which I think is the path we should pursue.

@idanov
Member

idanov commented Oct 20, 2023

@astrojuanlu

No I don't think a load_file as proposed here is inconsistent (prove me wrong).

Easy, the interface has zero assumptions about folders, files or anything like that https://github.com/kedro-org/kedro/blob/main/kedro/config/abstract_config.py

QED 😂

Jokes aside, implementing a load_file will make it inconsistent with the base class interface-wise, so when someone encounters another subclass of AbstractConfigLoader which actually uses a database to keep the config (this is what I meant about not assuming files), they will not understand why that method exists only in OmegaConfigLoader. Hope that makes my point clearer.

@astrojuanlu

This sounds to me like saying "Kedro is too complicated, so I'm going to show you something else (that you will have to install separately) so that you don't have to deal with those complications".

I didn't mean to install it separately, but rather include a module in Kedro which will give you a set of lightweight functions and use that plugin as an inspiration. It's just a wild idea, which is worth exploring imho.

@Galileo-Galilei

I remember someone made the suggestion (but I can't find the relevant issue) ...

I think this is coming in for 0.19, but it's not a "non-breaking" one, because any scripts relying on different defaults will break. But that's not a problem, because 0.19 is just around the corner.

As for your point 2, there's pretty extensive documentation on the class and it should pop up easily for you. Maybe we need to move it to the constructor (if it's not easy to see)? https://github.com/kedro-org/kedro/blob/main/kedro/config/omegaconf_config.py#L26-L77

The goal of this move was to make the rest of Kedro unattached to a particular implementation, because essentially we need a dictionary with different keys. This makes it super easy for plugin developers, because your only interface is a key in a dictionary, regardless of where the content for that key comes from.

If we go back to the original point for the long discussion here, when people only want to use one file, why don't they just use the following then?

from kedro.io import DataCatalog
from omegaconf import OmegaConf

catalog = DataCatalog.from_config(OmegaConf.load('catalog.yaml'))

The only point of config loaders in Kedro is to:

  • Load the catalog configuration (initially spread across different files, now just associated with a key name)
  • Load parameters (initially spread across different files, now just associated with a key name)
  • Provide an out-of-the-box solution for environment-separated configuration
  • Provide an out-of-the-box solution to inject credentials

When you don't need any of that, you shouldn't use Kedro's config loader. OmegaConf is a pretty good library to load arbitrary YAML files, if that's what people need.

@astrojuanlu
Member Author

Outcome of the conversation we had with @merelcht, @idanov and @yetudada:

  • We'll support the workflow of loading a single file but not by changing Kedro's ConfigLoader; when a single file is needed then we'll recommend use of OmegaConf. Kedro's ConfigLoader will keep the assumption that we have a conf folder.
  • The documentation will need to cover:
    • Why is OmegaConf different to Kedro's ConfigLoader and when to use either one?
    • How to load a single file? OmegaConf
    • How to load a data catalog when you have credentials? Kedro
    • How to load a data catalog when you have templating? Kedro

@astrojuanlu
Member Author

Decision turned into #3247, closing this ticket.


7 participants