Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the environment default for ConfigLoader #2971

Closed
noklam opened this issue Aug 24, 2023 · 13 comments · Fixed by #3311
Closed

Remove the environment default for ConfigLoader #2971

noklam opened this issue Aug 24, 2023 · 13 comments · Fixed by #3311
Assignees

Comments

@noklam
Copy link
Contributor

noklam commented Aug 24, 2023

Context

See #2819 and more

Currently ConfigLoader assume base and local exist, but for single config use cases ( a notebook) it's not relevant.

Requirements

  • Backward compatible
  • Enable use of OmegaConfigLoader without environment

Current workaround

OmegaConfigLoader(conf_source=".", base_env=".", default_run_env=".")  # To use this in a notebook

Potential Options:

  1. Change defaults to "." and change settings.py default to "base" and "local" (and starters)
  2. Change defaults to "." and change KedroSession.create
settings.CONFIG_LOADER_CLASS
return config_loader_class(
    conf_source=self._conf_source,
    env=env,
    runtime_params=extra_params,
    **settings.CONFIG_LOADER_ARGS,
)
# Maybe update this in `session.py`?
config_loader_args = {"base_env"="base", "default_run_env"="local"}
config_loader_args.update(settings.CONFIG_LOADER_ARGS) # Overwrite if any
return config_loader_class(
            conf_source=self._conf_source,
            env=env,
            runtime_params=extra_params,
            **config_loader_args,
        )

# Original
settings.CONFIG_LOADER_CLASS
return config_loader_class(
    conf_source=self._conf_source,
    env=env,
    runtime_params=extra_params,
    **settings.CONFIG_LOADER_ARGS,
)

I prefer 2. and we can do it now instead of waiting 0.19 cc @merelcht

Originally posted by @noklam in #2819 (comment)

Option 3:
Use _ProjectSettings to bake framework assumption.

_SESSION_STORE_ARGS = Validator("SESSION_STORE_ARGS", default={})
_DISABLE_HOOKS_FOR_PLUGINS = Validator("DISABLE_HOOKS_FOR_PLUGINS", default=tuple())
_CONFIG_LOADER_CLASS = _HasSharedParentClassValidator(
"CONFIG_LOADER_CLASS", default=_get_default_class("kedro.config.ConfigLoader")
)
_CONFIG_LOADER_ARGS = Validator("CONFIG_LOADER_ARGS", default={})

Additionally

We should clean up the test for config when local isn't a necessary requirement. This can be consider as a separate ticket but probably best to do it together.

@astrojuanlu astrojuanlu added this to the Make it easier to use Kedro as a library milestone Aug 29, 2023
@noklam
Copy link
Contributor Author

noklam commented Aug 30, 2023

We should also make sure it works even if user provide a relative path. To do that we may need to do a .resolve() behind the scene.

#2819 (comment) Additionally, we may always do a resolve in ConfigLoader so even if user provide a relative path it would work. I will add this comment to #2971, depends on the implementation we may avoid this completely with the new root argument.

This ticket will remains as a clean up ticket.
#2912 (comment)

@noklam noklam modified the milestones: Make it easier to use Kedro as a library, Using Kedro with existing projects Sep 1, 2023
@noklam
Copy link
Contributor Author

noklam commented Sep 4, 2023

ValueError: Duplicate keys found in /Users/Nok_Lam_Chan/tmp/pandas-iris/conf/local/parameters.1.yml and /Users/Nok_Lam_Chan/tmp/pandas-iris/conf/base/parameters.1.yml: random_state, target_column, train_fraction

The error is confusing if user define a base_env="." and default_run_env="local". This happen because "." will catch both base and local due to the pattern **/catalog*. We should improve the error message to highlight the current environment. In addition, we can make it more explicited, i.e. Duplicated keys are not allowed in the same environment

Noted that duplicate value from other environment is OKAY, this is the purpose of the environment to override stuff like credentials.

@noklam
Copy link
Contributor Author

noklam commented Sep 4, 2023

Can we have a None default_run_env to “turn off” the local environment feature?

On the other hand, can we remove the assumption that users will have two environment all the time? (For standalone use case, framework will not be changed). The above error message can point to an more explicit error, the solution is still a bit "hacky" as we simply set base == default_run_env.

Could we make it work with a single environment? (framework will default to use base and local, no change) @idanov

Related to this issue:

@merelcht
Copy link
Member

merelcht commented Sep 5, 2023

I'm in favour of removing the base and local defaults from the OmegaConfigLoader and instead adding these in the session (@noklam's second option described on the ticket).

One concern I have though is around the user experience when moving from just using the OmegaConfigLoader as library component to then using the parts of the framework where the session comes in, e.g. %reload_kedro line magic, session.run() and kedro run from the CLI. When users use the library component there will not be any assumptions about configuration environments anymore after this change, but when a user then switches to using more framework stuff they'll find this assumption inside the session. They can update settings.py to set the base_env and default_run_env to something else, but it might be confusing that the setup they used before doesn't immediately work with the full framework. I see two options to solve this:

  • Option 1: Add a very descriptive migration guide from using the library components to then using the full framework which includes the session + elaborate the chapter on configuration environments
  • Option 2: Remove the assumption of having two configuration environment by default all together, which is what @noklam mentioned in the comment above. A user would be able to enable config environments through e.g. settings.py, but by default they don't have to change anything.

I have a preference for keeping the two configuration environment setup by default inside the framework, because it's aligned with the best practices that Kedro promises to introduce to ML projects.

Option 3: Actually, while writing this I realised, perhaps @noklam 's first option in the ticket description: Change defaults to "." and change settings.py default to "base" and "local" , might be the better user experience when it comes to changing from just library components to using the session.

@astrojuanlu
Copy link
Member

Thanks for the analysis @merelcht

I have a preference for keeping the two configuration environment setup by default inside the framework, because it's aligned with the best practices that Kedro promises to introduce to ML projects.

Is this option (2) among the three you listed?

@merelcht
Copy link
Member

merelcht commented Sep 5, 2023

Thanks for the analysis @merelcht

I have a preference for keeping the two configuration environment setup by default inside the framework, because it's aligned with the best practices that Kedro promises to introduce to ML projects.

Is this option (2) among the three you listed?

I've updated my previous comment to make my though process clearer, but essentially the way I see it is:

  • Option 2: remove the default assumption of two environments, but allow users to enable more environments through settings.py. By default, a project would assume there are no config environments and all config is under the same root.
  • Option 3: keeps the default to two environments, but removes the "base" + "local" assumption from the session and moves it to settings.py, which is more visible to the user. By default the config loader assumes there are no config environments and all config is under the same root (which is the initial change @noklam suggested), but the session and thus framework, would still by default be set to using "base" + "local", which is then visible in settings.py and a user can very easily change it to not have two environments, e.g. set both to "."

It's a slight nuance, but the difference for a user would be that with option 2 there's no assumption at all about configuration environments: not about the default location (base and local) but also not that there must be 2. With option 3 we keep the the default location as well as the assumption that there are 2 environments, to "teach" users about the best practices of using layered configuration.

@noklam
Copy link
Contributor Author

noklam commented Sep 5, 2023

  • Option 1: keep the default to two environments, keep base and local in session but allow changes in settings.py.
    Add a very descriptive migration guide from using the library components to then using the full framework which includes the session + elaborate the chapter on configuration environments.
  • Option 2: remove the default assumption of two environments, but allow users to enable more environments through settings.py. By default, a project would assume there are no config environments and all config is under the same root.
  • Option 3: keeps the default to two environments, but removes the "base" + "local" assumption from the session and moves it to settings.py, which is more visible to the user. By default the config loader assumes there are no config environments and all config is under the same root (which is the initial change @noklam suggested), but the session and thus framework, would still by default be set to using "base" + "local", which is then visible in settings.py and a user can very easily change it to not have two environments, e.g. set both to "."

There are 4 possibles pathways:
a. Framework user
b. Library user
c. Framework -> Library
d. Library -> Framework

  • So far we have been focusing on (a), (c) is properly rare so I will just ignore it.
  • (2) and (3) will favor (b) and (c) more, but they are breaking change to (a)
  • any of these option will enable (a) and (b), the main difference is (d)
  • We try to avoid breaking changes

So I think the key here is how we can help people to transit from library to framework without breaking stuff. It turns into a discussion of whether user "opt-in" to 2 environment or they "opt-out".

I imagine the flow from library -> framework will still involve kedro new most likely. so maybe a variant of (1) is that in settings.py we can show a commented default for base and local?

Do we also want to rename the arguments maybe, does base_env and default_run_env make sense outside framework?

(2) is orthogonal to (1), which add the ability to have an environment-less (or just 1 env)

@astrojuanlu
Copy link
Member

I imagine the flow from library -> framework will still involve kedro new most likely.

I'm not so sure. The moment you start populating your directory, kedro new becomes cumbersome.

@noklam
Copy link
Contributor Author

noklam commented Sep 5, 2023

@astrojuanlu kedro new maybe too rigid, let say it will requires copy and paste the settings.py and pipeline_registry.py or a more automated way (through some kedro command to create these templates). In any case we can included the commented default in the settings.py.

@astrojuanlu
Copy link
Member

To summarize previous discussions, @merelcht 's summary on #3029 (comment) gives the way forward: let's keep the assumption of 2 environments, let's make OmegaConfigLoader blind to our base+local framework choice, and let's make this choice explicit in settings.py.

@noklam
Copy link
Contributor Author

noklam commented Nov 14, 2023

@astrojuanlu Can I ask for an update of this ticket? Which options are we going to implement?

I am slightly confused with this statement since this ticket propose removing the default envs.

let's make OmegaConfigLoader blind to our base+local framework choice, and let's make this choice explicit in settings.py.

  1. Do we remove the default env for OmegaConfigLoader?
  2. Do we change any default settings in framework?
  3. Make the env explicit in settings.py (updating the template basically)

My answer to this is all YES but it will be good to double check. If we only do 3 but not 2, it will be a breaking change (maybe that's fine).

Doing 2 without 3 is fine, it's doesn't change anything but it gives users more information what's the default and easier to change them if needed.

@astrojuanlu
Copy link
Member

  1. Do we remove the default env for OmegaConfigLoader?

Not sure what "remove" means in this case, but the end goal is that the user can do

conf_loader = OmegaConfigLoader(conf_source=".")

(in a REPL or a script) and it works without base and local directories. Hence no need to add base_env=".", default_run_env="." too.

  1. Make the env explicit in settings.py (updating the template basically)

In principle yes. That's what @merelcht said in #3029:

settings.py is a good place to keep the defaults, because it's visible to users and this group has experience with updating that file.

Does it make sense?

Doing 2 without 3 is fine, it's doesn't change anything but it gives users more information what's the default and easier to change them if needed.

As far as I understand, (3) was already decided (see above).

  1. Do we change any default settings in framework?

I'm not fully aware of the implications of doing or not doing this.

If we only do 3 but not 2, it will be a breaking change (maybe that's fine).

That's okay but if we can easily avoid it we should try. It will make upgrading to 0.19 from 0.18 easier, right?

@noklam noklam moved this from To Do to In Progress in Kedro Framework Nov 15, 2023
@noklam noklam linked a pull request Nov 16, 2023 that will close this issue
12 tasks
@noklam noklam moved this from In Progress to In Review in Kedro Framework Nov 16, 2023
@merelcht
Copy link
Member

Completed in #3311

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment