Remove the environment default for ConfigLoader #2971

noklam · 2023-08-24T14:52:33Z

Context

See #2819 and more

Currently ConfigLoader assume base and local exist, but for single config use cases ( a notebook) it's not relevant.

Requirements

Backward compatible
Enable use of OmegaConfigLoader without environment

Current workaround

OmegaConfigLoader(conf_source=".", base_env=".", default_run_env=".")  # To use this in a notebook

Potential Options:

Change defaults to "." and change settings.py default to "base" and "local" (and starters)
Change defaults to "." and change KedroSession.create

settings.CONFIG_LOADER_CLASS
return config_loader_class(
    conf_source=self._conf_source,
    env=env,
    runtime_params=extra_params,
    **settings.CONFIG_LOADER_ARGS,
)

# Maybe update this in `session.py`?
config_loader_args = {"base_env"="base", "default_run_env"="local"}
config_loader_args.update(settings.CONFIG_LOADER_ARGS) # Overwrite if any
return config_loader_class(
            conf_source=self._conf_source,
            env=env,
            runtime_params=extra_params,
            **config_loader_args,
        )

# Original
settings.CONFIG_LOADER_CLASS
return config_loader_class(
    conf_source=self._conf_source,
    env=env,
    runtime_params=extra_params,
    **settings.CONFIG_LOADER_ARGS,
)

I prefer 2. and we can do it now instead of waiting 0.19 cc @merelcht

Originally posted by @noklam in #2819 (comment)

Option 3:
Use _ProjectSettings to bake framework assumption.

kedro/kedro/framework/project/__init__.py

Lines 102 to 107 in d3171f1

    
           _SESSION_STORE_ARGS = Validator("SESSION_STORE_ARGS", default={}) 
        
           _DISABLE_HOOKS_FOR_PLUGINS = Validator("DISABLE_HOOKS_FOR_PLUGINS", default=tuple()) 
        
           _CONFIG_LOADER_CLASS = _HasSharedParentClassValidator( 
        
               "CONFIG_LOADER_CLASS", default=_get_default_class("kedro.config.ConfigLoader") 
        
           ) 
        
           _CONFIG_LOADER_ARGS = Validator("CONFIG_LOADER_ARGS", default={})

Additionally

We should clean up the test for config when local isn't a necessary requirement. This can be consider as a separate ticket but probably best to do it together.

The text was updated successfully, but these errors were encountered:

noklam · 2023-08-30T11:29:01Z

We should also make sure it works even if user provide a relative path. To do that we may need to do a .resolve() behind the scene.

#2819 (comment) Additionally, we may always do a resolve in ConfigLoader so even if user provide a relative path it would work. I will add this comment to #2971, depends on the implementation we may avoid this completely with the new root argument.

This ticket will remains as a clean up ticket.
#2912 (comment)

noklam · 2023-09-04T11:05:51Z

ValueError: Duplicate keys found in /Users/Nok_Lam_Chan/tmp/pandas-iris/conf/local/parameters.1.yml and /Users/Nok_Lam_Chan/tmp/pandas-iris/conf/base/parameters.1.yml: random_state, target_column, train_fraction

The error is confusing if user define a base_env="." and default_run_env="local". This happen because "." will catch both base and local due to the pattern **/catalog*. We should improve the error message to highlight the current environment. In addition, we can make it more explicited, i.e. Duplicated keys are not allowed in the same environment

Noted that duplicate value from other environment is OKAY, this is the purpose of the environment to override stuff like credentials.

noklam · 2023-09-04T11:10:20Z

Can we have a None default_run_env to “turn off” the local environment feature?

On the other hand, can we remove the assumption that users will have two environment all the time? (For standalone use case, framework will not be changed). The above error message can point to an more explicit error, the solution is still a bit "hacky" as we simply set base == default_run_env.

Could we make it work with a single environment? (framework will default to use base and local, no change) @idanov

Related to this issue:

Do not fail if conf/local is not present #2593

merelcht · 2023-09-05T09:57:30Z

I'm in favour of removing the base and local defaults from the OmegaConfigLoader and instead adding these in the session (@noklam's second option described on the ticket).

One concern I have though is around the user experience when moving from just using the OmegaConfigLoader as library component to then using the parts of the framework where the session comes in, e.g. %reload_kedro line magic, session.run() and kedro run from the CLI. When users use the library component there will not be any assumptions about configuration environments anymore after this change, but when a user then switches to using more framework stuff they'll find this assumption inside the session. They can update settings.py to set the base_env and default_run_env to something else, but it might be confusing that the setup they used before doesn't immediately work with the full framework. I see two options to solve this:

Option 1: Add a very descriptive migration guide from using the library components to then using the full framework which includes the session + elaborate the chapter on configuration environments
Option 2: Remove the assumption of having two configuration environment by default all together, which is what @noklam mentioned in the comment above. A user would be able to enable config environments through e.g. settings.py, but by default they don't have to change anything.

I have a preference for keeping the two configuration environment setup by default inside the framework, because it's aligned with the best practices that Kedro promises to introduce to ML projects.

Option 3: Actually, while writing this I realised, perhaps @noklam 's first option in the ticket description: Change defaults to "." and change settings.py default to "base" and "local" , might be the better user experience when it comes to changing from just library components to using the session.

astrojuanlu · 2023-09-05T10:40:47Z

Thanks for the analysis @merelcht

I have a preference for keeping the two configuration environment setup by default inside the framework, because it's aligned with the best practices that Kedro promises to introduce to ML projects.

Is this option (2) among the three you listed?

merelcht · 2023-09-05T11:14:52Z

Thanks for the analysis @merelcht

I have a preference for keeping the two configuration environment setup by default inside the framework, because it's aligned with the best practices that Kedro promises to introduce to ML projects.

Is this option (2) among the three you listed?

I've updated my previous comment to make my though process clearer, but essentially the way I see it is:

Option 2: remove the default assumption of two environments, but allow users to enable more environments through settings.py. By default, a project would assume there are no config environments and all config is under the same root.
Option 3: keeps the default to two environments, but removes the "base" + "local" assumption from the session and moves it to settings.py, which is more visible to the user. By default the config loader assumes there are no config environments and all config is under the same root (which is the initial change @noklam suggested), but the session and thus framework, would still by default be set to using "base" + "local", which is then visible in settings.py and a user can very easily change it to not have two environments, e.g. set both to "."

It's a slight nuance, but the difference for a user would be that with option 2 there's no assumption at all about configuration environments: not about the default location (base and local) but also not that there must be 2. With option 3 we keep the the default location as well as the assumption that there are 2 environments, to "teach" users about the best practices of using layered configuration.

noklam · 2023-09-05T12:34:56Z

Option 1: keep the default to two environments, keep base and local in session but allow changes in settings.py.
Add a very descriptive migration guide from using the library components to then using the full framework which includes the session + elaborate the chapter on configuration environments.

Option 2: remove the default assumption of two environments, but allow users to enable more environments through settings.py. By default, a project would assume there are no config environments and all config is under the same root.

Option 3: keeps the default to two environments, but removes the "base" + "local" assumption from the session and moves it to settings.py, which is more visible to the user. By default the config loader assumes there are no config environments and all config is under the same root (which is the initial change @noklam suggested), but the session and thus framework, would still by default be set to using "base" + "local", which is then visible in settings.py and a user can very easily change it to not have two environments, e.g. set both to "."

There are 4 possibles pathways:
a. Framework user
b. Library user
c. Framework -> Library
d. Library -> Framework

So far we have been focusing on (a), (c) is properly rare so I will just ignore it.
(2) and (3) will favor (b) and (c) more, but they are breaking change to (a)
any of these option will enable (a) and (b), the main difference is (d)
We try to avoid breaking changes

So I think the key here is how we can help people to transit from library to framework without breaking stuff. It turns into a discussion of whether user "opt-in" to 2 environment or they "opt-out".

I imagine the flow from library -> framework will still involve kedro new most likely. so maybe a variant of (1) is that in settings.py we can show a commented default for base and local?

Do we also want to rename the arguments maybe, does base_env and default_run_env make sense outside framework?

(2) is orthogonal to (1), which add the ability to have an environment-less (or just 1 env)

astrojuanlu · 2023-09-05T12:52:29Z

I imagine the flow from library -> framework will still involve kedro new most likely.

I'm not so sure. The moment you start populating your directory, kedro new becomes cumbersome.

noklam · 2023-09-05T13:09:08Z

@astrojuanlu kedro new maybe too rigid, let say it will requires copy and paste the settings.py and pipeline_registry.py or a more automated way (through some kedro command to create these templates). In any case we can included the commented default in the settings.py.

astrojuanlu · 2023-10-11T09:24:33Z

To summarize previous discussions, @merelcht 's summary on #3029 (comment) gives the way forward: let's keep the assumption of 2 environments, let's make OmegaConfigLoader blind to our base+local framework choice, and let's make this choice explicit in settings.py.

noklam · 2023-11-14T07:20:50Z

@astrojuanlu Can I ask for an update of this ticket? Which options are we going to implement?

I am slightly confused with this statement since this ticket propose removing the default envs.

let's make OmegaConfigLoader blind to our base+local framework choice, and let's make this choice explicit in settings.py.

Do we remove the default env for OmegaConfigLoader?
Do we change any default settings in framework?
Make the env explicit in settings.py (updating the template basically)

My answer to this is all YES but it will be good to double check. If we only do 3 but not 2, it will be a breaking change (maybe that's fine).

Doing 2 without 3 is fine, it's doesn't change anything but it gives users more information what's the default and easier to change them if needed.

astrojuanlu · 2023-11-14T08:41:11Z

Do we remove the default env for OmegaConfigLoader?

Not sure what "remove" means in this case, but the end goal is that the user can do

conf_loader = OmegaConfigLoader(conf_source=".")

(in a REPL or a script) and it works without base and local directories. Hence no need to add base_env=".", default_run_env="." too.

Make the env explicit in settings.py (updating the template basically)

In principle yes. That's what @merelcht said in #3029:

settings.py is a good place to keep the defaults, because it's visible to users and this group has experience with updating that file.

Does it make sense?

Doing 2 without 3 is fine, it's doesn't change anything but it gives users more information what's the default and easier to change them if needed.

As far as I understand, (3) was already decided (see above).

Do we change any default settings in framework?

I'm not fully aware of the implications of doing or not doing this.

If we only do 3 but not 2, it will be a breaking change (maybe that's fine).

That's okay but if we can easily avoid it we should try. It will make upgrading to 0.19 from 0.18 easier, right?

merelcht · 2023-11-21T15:19:41Z

Completed in #3311

astrojuanlu added this to the Make it easier to use Kedro as a library milestone Aug 29, 2023

astrojuanlu added this to Kedro Framework Aug 29, 2023

noklam mentioned this issue Aug 30, 2023

Replace .absolute() with .resolve() #2912

Closed

github-actions bot mentioned this issue Sep 1, 2023

Monthly issue metrics report #2996

Closed

noklam modified the milestones: Make it easier to use Kedro as a library, Using Kedro with existing projects Sep 1, 2023

merelcht mentioned this issue Sep 13, 2023

Make it easier to use Kedro as a library #3029

Closed

noklam mentioned this issue Sep 22, 2023

Publish a four-part mini-blog series about Notebook issues and how Kedro can resolve them kedro-org/kedro-devrel#80

Closed

This was referenced Oct 2, 2023

Make it easier to use the Config Loader #2819

Closed

[READY] Add new notebook example to docs about adding Kedro to a notebook project #3128

Merged

This was referenced Oct 23, 2023

Do not fail if conf/local is not present #2593

Closed

Using Kedro with existing project? #410

Closed

merelcht moved this to To Do in Kedro Framework Nov 1, 2023

merelcht assigned noklam Nov 9, 2023

noklam moved this from To Do to In Progress in Kedro Framework Nov 15, 2023

noklam mentioned this issue Nov 15, 2023

Moved the environment assumption from ConfigLoader to _ProjectSettings #3311

Merged

12 tasks

noklam linked a pull request Nov 16, 2023 that will close this issue

Moved the environment assumption from ConfigLoader to _ProjectSettings #3311

Merged

12 tasks

noklam moved this from In Progress to In Review in Kedro Framework Nov 16, 2023

noklam removed a link to a pull request Nov 16, 2023

Moved the environment assumption from ConfigLoader to _ProjectSettings #3311

Merged

12 tasks

noklam linked a pull request Nov 16, 2023 that will close this issue

Moved the environment assumption from ConfigLoader to _ProjectSettings #3311

Merged

12 tasks

merelcht closed this as completed Nov 21, 2023

github-project-automation bot moved this from In Review to Done in Kedro Framework Nov 21, 2023

merelcht mentioned this issue Nov 21, 2023

Explicitly set the environment defaults in settings.py #3326

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the environment default for ConfigLoader #2971

Remove the environment default for ConfigLoader #2971

noklam commented Aug 24, 2023 •

edited

Loading

noklam commented Aug 30, 2023

noklam commented Sep 4, 2023

noklam commented Sep 4, 2023 •

edited

Loading

merelcht commented Sep 5, 2023 •

edited

Loading

astrojuanlu commented Sep 5, 2023

merelcht commented Sep 5, 2023 •

edited

Loading

noklam commented Sep 5, 2023

astrojuanlu commented Sep 5, 2023

noklam commented Sep 5, 2023

astrojuanlu commented Oct 11, 2023

noklam commented Nov 14, 2023

astrojuanlu commented Nov 14, 2023

merelcht commented Nov 21, 2023

Remove the environment default for ConfigLoader #2971

Remove the environment default for ConfigLoader #2971

Comments

noklam commented Aug 24, 2023 • edited Loading

Context

Requirements

Current workaround

Potential Options:

Additionally

noklam commented Aug 30, 2023

noklam commented Sep 4, 2023

noklam commented Sep 4, 2023 • edited Loading

merelcht commented Sep 5, 2023 • edited Loading

astrojuanlu commented Sep 5, 2023

merelcht commented Sep 5, 2023 • edited Loading

noklam commented Sep 5, 2023

astrojuanlu commented Sep 5, 2023

noklam commented Sep 5, 2023

astrojuanlu commented Oct 11, 2023

noklam commented Nov 14, 2023

astrojuanlu commented Nov 14, 2023

merelcht commented Nov 21, 2023

noklam commented Aug 24, 2023 •

edited

Loading

noklam commented Sep 4, 2023 •

edited

Loading

merelcht commented Sep 5, 2023 •

edited

Loading

merelcht commented Sep 5, 2023 •

edited

Loading