Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve TemplatedConfigLoader - Default ${KEDRO_ENV} global & expose anyconfig.load kwargs #1438

Closed
wants to merge 21 commits into from

Conversation

datajoely
Copy link
Contributor

@datajoely datajoely commented Apr 13, 2022

FOR DISCUSSION

Description

This is a culmination of several things I've seen users struggle with, both related to TemplatedConfigLoader.

1. Automatically scope the config environment as a ${KEDRO_ENV}

Since 0.18.x it is no longer possible to neatly pass the current configuration environment to the TemplatedConfigLoader. This used to be possible via the register_config_loader hook that had the env argument in scope and was thus easy to pass into scope. @pascalwhoop has raised this internally, but we have no GH issue associated with it yet.

2. Allowing the user to define arbitrary arguments to anyconfig.load

We currently leverage anyconfig.load to open our configuration files, the ac_template argument was exposed in 0.17.0, but there are other arguments available as per the anyconfig API docs. I see no reason not to expose this in a generic way allowing the user to alter the behaviour to their choosing.

Development notes

To make this possible I've changed:

  • The ac_template parameter that's passed down in several places to a dictionary that contains this (and in theory any other arguments) to be passed to anyconfig.load.
  • The env argument is added automatically to self._config_mapping within the constructor.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

datajoely added 3 commits April 13, 2022 13:38
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
@datajoely datajoely added the Issue: Feature Request New feature or improvement to existing feature label Apr 13, 2022
@datajoely datajoely self-assigned this Apr 13, 2022
datajoely added 3 commits April 13, 2022 16:32
…ated-config-loader

Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
@datajoely
Copy link
Contributor Author

Docs look like this when rendered
image

Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments on the code, let me comment on the higher level idea here separately...

kedro/config/common.py Outdated Show resolved Hide resolved
kedro/config/common.py Outdated Show resolved Hide resolved
kedro/config/common.py Outdated Show resolved Hide resolved
kedro/config/templated_config.py Outdated Show resolved Hide resolved
kedro/config/templated_config.py Outdated Show resolved Hide resolved
@antonymilne
Copy link
Contributor

antonymilne commented Apr 14, 2022

I have mixed thoughts on this. I totally understand the need for both these features and think both your solutions are improvements on our current state. The question in my mind is whether they would fit in well with the ultimate solution to #891 and what a realistic timeframe for that ultimate solution would be. e.g. I wouldn't want everyone to start using ${KEDRO_ENV} in their config files only for that to become impossible in the future.

1. Automatically the config environment as a ${KEDRO_ENV}

Isn't this the github issue that tracks it? #1411. I suggested two current workarounds, but building this into the TemplatedConfigLoader natively like you've done here makes sense to me. My hesitation is as above, whether this will continue to be the right solution.

2. Allowing the user to define arbitrary arguments to anyconfig.load

This is basically what I suggested in #1011 and so obviously I like it 😀 Looking at it again I'm not sure it's right to expose anyconfig_args in the TemplatedConfigLoader.__init__ though. I think it's probably lower footprint to make the dictionary accessible to override through a custom MyTemplatedConfigLoader like this instead:

class MyTemplatedConfigLoader(TemplatedConfigLoader):
    def __init__(*args, **kwargs):
        super().__init__(*args, **kwargs)
        self._anyconfig_args = {"Loader": yaml.UnsafeLoader}

This is a bit more effort for users than being able to specify anyconfig_args directly in CONFIG_LOADER_ARGS in settings.py but means (a) we keep the signature of TemplatedConfigLoader a bit cleaner and (b) we don't need to worry about breaking backwards compatibility in the future if we wanted to remove the argument and implement this another way. Given how few people would want to change anyconfig_args from their defaults, I'm not sure it's worth a whole new user-facing argument. This implementation seems like an easy win to me since it immediately gives users more power for how anyconfig.load operates for very little work and risk of breaking changes on our side.

@datajoely
Copy link
Contributor Author

I'll save these for the next backlog grooming - but regarding the exposed anyconfig_args I've had two different discord users who wanted to include the contents of globals.yml within the jinja context, I think it would be pretty useful.

@antonymilne
Copy link
Contributor

Yeah, that makes sense, but I think it's still possible without altering the signature of TemplatedConfigLoader.__init__ itself:

class MyTemplatedConfigLoader(TemplatedConfigLoader):
    def __init__(*args, **kwargs):
        super().__init__(*args, **kwargs)
        self._anyconfig_args = {"Loader": yaml.UnsafeLoader, "whatever_arg_you_need": kwargs["globals_dict"]} # or `self._config_mapping` or whatever

I like the idea of generalising to anyconfig_args, just I think it probably should not go in the signature.

@datajoely
Copy link
Contributor Author

datajoely commented Apr 19, 2022

Okay I'm sold - will update to match shortly

datajoely added 2 commits April 19, 2022 19:03
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
…ated-config-loader

Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
@datajoely
Copy link
Contributor Author

Updated docs as per the latest changes
image

Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on second thoughts this won't work quite right following my suggestion, sorry... Because self._anyconfig_args is already used in L127 of TemplatedConfigLoader.__init__, if you do this:

class MyTemplatedConfigLoader(TemplatedConfigLoader):
    def __init__(*args, **kwargs):
        super().__init__(*args, **kwargs)
        self._anyconfig_args = {"Loader": yaml.UnsafeLoader}

then self._anyconfig_args only gets overwritten after it's used on L127.

I don't know if this really matters since, as I understand it, this is not the main important call to _get_config_from_patterns (that is in conf_paths). But if it does matter we might need to do something with properties to make this work properly.

Either way, these are implementation details, and my general comments still stand (general approval of at least idea 2). Would be good to hear what other people think about it before we spend longer figuring out the right implementation.

kedro/config/common.py Outdated Show resolved Hide resolved
kedro/config/common.py Outdated Show resolved Hide resolved
kedro/config/templated_config.py Outdated Show resolved Hide resolved
datajoely and others added 4 commits April 20, 2022 10:16
Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
@datajoely datajoely changed the title Improve TemplatedConfigLoader to include environment & accept arbitrary anyconfig.load arguments Improve TemplatedConfigLoader - Default ${KEDRO_ENV} global & expose anyconfig.load kwargs Apr 20, 2022
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
@datajoely
Copy link
Contributor Author

I've also added a fix to an regression raised by @sheldontsen-qb where the globals_pattern argument to TemplatedConfigLoader was being split it, the method used to accept Optional[str] which would then split any glob path passed into to a list of characters due to the list() constructor highlighted below:

self._config_mapping = (
      _get_config_from_patterns(
          conf_paths=self.conf_paths,
          patterns=list(globals_pattern),  # <---------
          ac_template=False,
      )
      if globals_pattern
      else {}
  )

This gets exploded into in this: ['*', 'g', 'l', 'o', 'b', 'a', 'l', 's', '.', 'y', 'm', 'l']. I think this passed our test suite because the pattern * is a superset of *globals.yml! To resolve this I've corrected the type hint and removed the cast to list().

@datajoely datajoely marked this pull request as ready for review April 20, 2022 13:22
@sheldontsen-qb
Copy link
Contributor

sheldontsen-qb commented Apr 20, 2022

Noticing the jinja2 and passing any extra anyconfig_kwargs, we also do the following:

        # add environment variables
        self._ctx["env"] = dict(os.environ)

        # if anyconfig throws a warning from templates,
        # so catch this as an error within a context manager
        # and reset warnings after exiting context manager.
        with warnings.catch_warnings():
            warnings.filterwarnings(action="error", module="anyconfig")
            parsed_result = {
                k: v
                for k, v in anyconfig.load(
                    config_file,
                    ac_template=True,
                    paths=filtered_conf_path_modified,  # <---- location of where to find templates which should be configurable
                    ac_context=self._ctx,
                ).items()
                if not k.startswith("_")
            }

        return parsed_result

Notice we just dump everything in env by default - because why not, secrets are usually managed by env vars anyway. Now you have a consistent way to reference any env var in catalogs/parameters. Separately we also have a place where we load credentials.yml and chuck them into os.environ (which runs before this run so it works out):

    def _setup_env_variables(self) -> None:
        """Configure environmental variables specified in `credentials.yml`."""
        try:
            credentials = self.config_loader.get("credentials*", "credentials*/**")
            for key, value in credentials.items():
                os.environ[key] = value
                logging.info(
                    "** Kedro project %s is configured in environmental variables",
                    str(key),
                )
        except:  # pylint: disable=bare-except # noqa
            pass

Not sure how this would change in 0.18 but its nice to funnel everything in. Everything just works.

Notice we're catching warnings and raising them as errors - because anyconfig has a try...except inside somewhere which gives a red-herring error. This will help surface the actual rendering issue rather than anyconfig just passing through silently and giving you a malformed error.

Regarding the comment of including globals.yml into the jinja context, I think it might be better to have 2 separate config loaders and not mix syntax. So a TemplatedConfigLoader and a Jinja2ConfigLoader, either go with globals.yml and the ${} syntax, or application_vars.j2 with the {{ }} syntax (don't have to settle with that name, but you need something equivalent).

Note it is possible to build it in a way that you can support both, but you'd need to properly document which comes first (jinja2 first resolves then globals) but I think it would be messy to see both kinds of syntax. Hence the preference to keep them separate. However, it is possible to build this in a backwards compatible way so one can slowly transition away from ${} to {{ }} over time instead of one big PR. But then one perhaps should consider providing guidance on this.

Just sharing my two cents!

@antonymilne
Copy link
Contributor

@sheldontsen-qb great spot with the bug report 😬 I'll add this to #1437.

@datajoely I'm not sure that's the right fix. In 0.17.7 (and I guess always before that) we had globals_pattern: Optional[str] = None and I don't think it was intentional to make it a list in 0.18.0 (although arguably it should have always been a list I think). Unless we want to change the signature for TemplatedConfigLoader the right thing to do would be to call _get_config_from_patterns with patterns=[globals_pattern] instead.

datajoely and others added 6 commits April 20, 2022 14:59
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
…hub.com/quantumblacklabs/kedro into feature/improve-templated-config-loader

Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
…hub.com/quantumblacklabs/kedro into feature/improve-templated-config-loader

Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
@datajoely
Copy link
Contributor Author

@AntonyMilneQB good point - I've tweaked it so the signature is maintained.

@sheldontsen-qb 100% agree on the Jinja / templated syntax workflow distinction. I'm in favour of introducing this tweak purely because enough open source users keep asking for this and it makes sense as a stop-gap solution until we introduce the complete config overhaul via #891 .

@datajoely
Copy link
Contributor Author

After discussion with core team - will close this PR and raise a couple of issues

@datajoely datajoely closed this Apr 27, 2022
@merelcht merelcht deleted the feature/improve-templated-config-loader branch May 4, 2023 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants