Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to pass the env to config loader since registration moved to settings.py in 0.18.x #1411

Closed
datajoely opened this issue Apr 4, 2022 · 9 comments

Comments

@datajoely
Copy link
Contributor

Description

In 0.17.x we would register the config loader like so:

def def register_config_loader(
        self, conf_paths: Iterable[str], env: str, extra_params: Dict[str, Any]
    ) -> ConfigLoader:
    return TemplatedConfigLoader(
        conf_paths,
        globals_pattern="*globals.yml",
        globals_dict= {"env": env},
    )

In 0.18.x we can register a configloader like this:

from kedro.config import TemplatedConfigLoader  # new import

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = { 'globals_pattern':"*globals.yml"}

However we can no longer retrieve the runtime environment in this manner? (I may be missing something)

Context

How has this bug affected you? What were you trying to accomplish?

When using TemplatedConfigLoader this is a common pattern to make sure paths are explicit.

Possible solutions:

  • Kedro could automatically include $kedro_env in the globals_dict / self._config_mapping namespace?
  • You can get creative with the KEDRO_ENV environment variable, but it is a fragile solution
@datajoely
Copy link
Contributor Author

@pascalwhoop

@antonymilne
Copy link
Contributor

antonymilne commented Apr 5, 2022

You're not missing anything: this is indeed no longer possible. I believe it's a deliberate casualty of moving library component customisation from registration hooks into settings.py. Similarly injecting credentials into register_data_catalog is no longer possible - see #1282.

There are two possible ways to do this now:

  1. Just manually define env in the globals.yml file in each environment.
  2. Write a simple custom config loader which injects the variable:
class MyTemplatedConfigLoader(TemplatedConfigLoader):
    def __init__(*args, **kwargs):
        super().__init__(*args, **kwargs)
        self._config_mapping["env"] = self.env

Personally I'm not a big fan either of these, and I think your idea of automatically including env as a variable in TemplatedConfigLoader sounds like a sensible thing to do. The decision in #1282, and I imagine for this issue too, would be that the ultimate solution should be considered as part of the great configuration improvements in #891, which I really hope we will tackle soon.

@Galileo-Galilei
Copy link
Member

For the record, this would solve #506 which is currently broken since get_current_session was deprecated (if we look at the use cases in this issue, what people want to retrieve through the session is almost always the ConfigLoader instance with the right environment).

@pascalwhoop
Copy link
Contributor

pascalwhoop commented Apr 5, 2022

@AntonyMilneQB note that my current attempted workaround (similar to yours above) needs ConfigLoader as a parent class next to TemplatedConfigLoader otherwise I get

dynaconf.validator.ValidationError: Invalid value `semicon_forecast.config_loader.CustomTemplatedConfigLoader` received for setting `CONFIG_LOADER_CLASS`. It must be a subclass of `kedro.config.config.ConfigLoader`.

my current approach is to

class CustomTemplatedConfigLoader(TemplatedConfigLoader):
    """
    A custom config loader that injects `env` as a globals value to be used in parameters files and nodes, 
    as well as any environment variable starting with KEDRO_
    """
    def __init__(self, conf_source, env, runtime_params, **kwargs):
        # grab parameters from env variables and add them to the config loader
        # so that they can be used in the config files
        # conf_paths = ["conf/base", f"conf/{env}", "conf/local"]
        runtime_params = runtime_params or {}
        env_params = self.get_kedro_parameters_from_env_variables()
        globals_dict = {**env_params, **runtime_params, "env": env}
        logging.info("using extra parameters")
        logging.info(env_params)

        super().__init__(
            conf_source,
            env,
            runtime_params,
            globals_pattern="*globals.yml",
            globals_dict=globals_dict,
            **kwargs,
        )

    def get_kedro_parameters_from_env_variables(self):
        env_params = {
            str.lower(k[len("KEDRO_") :]): os.environ.get(k)
            for k in os.environ.keys()
            if k.startswith("KEDRO_")
        }

        return env_params

@pascalwhoop
Copy link
Contributor

In this context, I also noticed that we changed the way that the TemplatedConfigLoader changed the way we calculate config paths to

    def _build_conf_paths(self) -> Iterable[str]:
        run_env = self.env or self.default_run_env
        return [
            str(Path(self.conf_source) / self.base_env),
            str(Path(self.conf_source) / run_env),
        ]

It used to be that you passed a list of config directories to the TemplatedConfigLoader, hence the commented out 3 paths above.

conf_paths: Union[str, Iterable[str]],

But now we just pass

   base_env: str = "base",
    default_run_env: str = "local",

and since default_run_env defaults to localbut gets overwritten by env, if we do

kedro run --env prod

we don't look in conf/local anymore, so any credentials.yml file in there is also not discovered. Patching the above function with

    def _build_conf_paths(self) -> Iterable[str]:
        run_env = self.env or self.default_run_env
        return [
            str(Path(self.conf_source) / self.base_env),
            str(Path(self.conf_source) / run_env),
            str(Path(self.conf_source) /  "local" ),
        ]

resolves the credentials.yml not found issue

@pascalwhoop
Copy link
Contributor

pascalwhoop commented Apr 5, 2022

So I'd propose a PR with the following changes:

  • make TemplatedConfigLoader inherit from ConfigLoader, not AbstractConfigLoader
  • rename default_run_env to local_env and make the env hierarchy be
    1. base
    2. env
    3. local
  • add local_env to the list of envs' in which config_loader looks for
  • docs on how to solve the original question, which is how do I get env into my config loading / parameters

@antonymilne
Copy link
Contributor

antonymilne commented Apr 5, 2022

@pascalwhoop The validation error is definitely an unpleasant bug, sorry - see #1402 which includes a horrible hack workaround fix if you don't want to explicitly inherit from ConfigLoader. This should be fixed soon though.

Regarding the conf_paths change: you're right that the mechanism for specifying this has changed from conf_paths argument to writing a custom _build_conf_paths function. However, AFAIK the kedro config loader hierarchy has always been base < env, where env defaults to local, so there's no change there. I think many people (including myself) are surprised to hear this because they're so used to customising to do base < env < local instead. Should we change the default hierarchy to that? Maybe, but I know that this was considered during the redesign and decided against for some reason. It doesn't seem like a bad idea to me, but it would also apply to the default (untemplated) config loader and would need some discussion I think.

Very small modification to your _build_conf_paths that I'd suggest:

    def _build_conf_paths(self) -> Iterable[str]:
        run_env = self.env or self.default_run_env
        return [
            str(Path(self.conf_source) / self.base_env),
            str(Path(self.conf_source) / run_env),
            str(Path(self.conf_source) / self.default_run_env),
        ]

(note that _remove_duplicates handles the case where env isn't specified or is explicitly given as "local")

@antonymilne
Copy link
Contributor

So overall I'd say, much as we appreciate PRs... 🙂

  • make TemplatedConfigLoader inherit from ConfigLoader, not AbstractConfigLoader

This is not such an easy fix for reasons I explained in #1402 and we're prioritising fixing it already so I'd leave it (or work out the best way to fix it for us 😀)

  • rename default_run_env to local_env and make the env hierarchy be
    1. base
    2. env
    3. local

Personally I think this may well make sense, but I don't recommend doing a PR on it without gathering more opinions on it, since I know it has been considered and then rejected before.

  • add local_env to the list of envs' in which config_loader looks for

Not sure I understand how this is distinct from the previous point?

  • docs on how to solve the original question, which is how do I get env into my config loading / parameters

💯% this would be good, as would pointing towards it in our migration guide given that this is not clearly spelt out anywhere.

@antonymilne
Copy link
Contributor

I'm closing this because we do have a method for passing env to config loader - see my post above: #1411 (comment). Fully appreciate that this is not a great solution though. The issue is very much on our radar and we are absolutely looking for a way to make this easier as part of #1282.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

4 participants