Add "runtime_params" resolver to allow overriding of config with CLI params #3036

Merged 14 commits on Sep 27, 2023
1 change: 1 addition & 0 deletions RELEASE.md
Expand Up @@ -11,6 +11,7 @@

## Major features and improvements
* Allowed using of custom cookiecutter templates for creating pipelines with `--template` flag for `kedro pipeline create` or via `template/pipeline` folder.
* Allowed overriding of configuration keys with runtime parameters using the `runtime_params` resolver.

## Bug fixes and other changes
* Updated dataset factories to resolve nested catalog config properly.
52 changes: 35 additions & 17 deletions docs/source/configuration/advanced_configuration.md
Expand Up @@ -4,6 +4,18 @@
By default, Kedro is set up to use the [ConfigLoader](/kedro.config.ConfigLoader) class. Kedro also provides two additional configuration loaders with more advanced functionality: the [TemplatedConfigLoader](/kedro.config.TemplatedConfigLoader) and the [OmegaConfigLoader](/kedro.config.OmegaConfigLoader).
Each of these classes is an alternative to the default `ConfigLoader` and has different features. The following sections describe each class and its specific functionality in more detail.

This page also contains guidance for advanced configuration requirements of standard Kedro projects:

* [How to change which configuration files are loaded](#how-to-change-which-configuration-files-are-loaded)
* [How to ensure non default configuration files get loaded](#how-to-ensure-non-default-configuration-files-get-loaded)
* [How to bypass the configuration loading rules](#how-to-bypass-the-configuration-loading-rules)
* [How to use Jinja2 syntax in configuration](#how-to-use-jinja2-syntax-in-configuration)
* [How to do templating with the `OmegaConfigLoader`](#how-to-do-templating-with-the-omegaconfigloader)
* [How to use global variables with the `OmegaConfigLoader`](#how-to-use-global-variables-with-the-omegaconfigloader)
* [How to override configuration with runtime parameters with the `OmegaConfigLoader`](#how-to-override-configuration-with-runtime-parameters-with-the-omegaconfigloader)
* [How to use resolvers in the `OmegaConfigLoader`](#how-to-use-resolvers-in-the-omegaconfigloader)
* [How to load credentials through environment variables with `OmegaConfigLoader`](#how-to-load-credentials-through-environment-variables)

## OmegaConfigLoader

[OmegaConf](https://omegaconf.readthedocs.io/) is a Python library designed to handle and manage settings. It serves as a YAML-based hierarchical system to organise configurations, which can be structured to accommodate various sources, allowing you to merge settings from multiple locations.
Expand All @@ -23,12 +35,6 @@

CONFIG_LOADER_CLASS = OmegaConfigLoader
```
### Advanced `OmegaConfigLoader` features
Some advanced use cases of `OmegaConfigLoader` are listed below:
- [How to do templating with the `OmegaConfigLoader`](#how-to-do-templating-with-the-omegaconfigloader)
- [How to use global variables with the `OmegaConfigLoader`](#how-to-use-global-variables-with-the-omegaconfigloader)
- [How to use resolvers in the `OmegaConfigLoader`](#how-to-use-resolvers-in-the-omegaconfigloader)
- [How to load credentials through environment variables](#how-to-load-credentials-through-environment-variables)

## TemplatedConfigLoader

Expand Down Expand Up @@ -127,16 +133,6 @@

## Advanced Kedro configuration

This section contains a set of guidance for advanced configuration requirements of standard Kedro projects:
* [How to change which configuration files are loaded](#how-to-change-which-configuration-files-are-loaded)
* [How to ensure non default configuration files get loaded](#how-to-ensure-non-default-configuration-files-get-loaded)
* [How to bypass the configuration loading rules](#how-to-bypass-the-configuration-loading-rules)
* [How to use Jinja2 syntax in configuration](#how-to-use-jinja2-syntax-in-configuration)
* [How to do templating with the `OmegaConfigLoader`](#how-to-do-templating-with-the-omegaconfigloader)
* [How to use global variables with the `OmegaConfigLoader`](#how-to-use-global-variables-with-the-omegaconfigloader)
* [How to use resolvers in the `OmegaConfigLoader`](#how-to-use-resolvers-in-the-omegaconfigloader)
* [How to load credentials through environment variables](#how-to-load-credentials-through-environment-variables)

### How to change which configuration files are loaded
If you want to change the patterns that the configuration loader uses to find the files to load you need to set the `CONFIG_LOADER_ARGS` variable in [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md).
For example, if your `parameters` files are using a `params` naming convention instead of `parameters` (e.g. `params.yml`) you need to update `CONFIG_LOADER_ARGS` as follows:
Expand Down Expand Up @@ -300,9 +296,31 @@
```yaml
my_param: "${globals: nonexistent_global, 23}"
```
If there are duplicate keys in the globals files in your base and run time environments, the values in the run time environment
If there are duplicate keys in the globals files in your base and runtime environments, the values in the runtime environment
will overwrite the values in your base environment.
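
The override rule for duplicate globals keys can be pictured as a plain dictionary update. This is only a sketch of the behaviour described above, assuming top-level keys are replaced wholesale; it is not Kedro's implementation, and `merge_globals` and the sample values are hypothetical.

```python
def merge_globals(base_globals: dict, env_globals: dict) -> dict:
    """Return a new dict in which runtime-environment keys replace base keys."""
    merged = dict(base_globals)  # copy so the inputs are not mutated
    merged.update(env_globals)   # duplicate keys take the runtime environment's value
    return merged


# Hypothetical contents of the globals files in the base and runtime environments.
base = {"bucket": "s3://dev-bucket", "file_format": "csv"}
local = {"bucket": "s3://my-sandbox"}

merged = merge_globals(base, local)
print(merged)
```

Keys that appear only in the base environment survive the merge; keys defined in both take the runtime environment's value.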

### How to override configuration with runtime parameters with the `OmegaConfigLoader`

Kedro allows you to [specify runtime parameters for the `kedro run` command with the `--params` CLI option](parameters.md#how-to-specify-parameters-at-runtime). These runtime parameters
are added to the `KedroContext` and merged with parameters from the configuration files to be used in your project's pipelines and nodes. From Kedro `0.18.14`, you can use the
`runtime_params` resolver to override the values of specific keys in your configuration with runtime parameters provided through the CLI option.

This resolver can be used across different configuration types, such as parameters and catalog, but not in `globals`.
Consider this `parameters.yml` file:
```yaml
model_options:
  random_state: "${runtime_params:random}"
```
This allows you to pass a runtime parameter named `random` through the CLI to specify the value of `model_options.random_state` in your project's parameters:
```bash
kedro run --params="random=3"
```
You can also specify a default value to be used in case the runtime parameter is not specified with the `kedro run` command. Consider this catalog entry:
```yaml
companies:
  type: pandas.CSVDataSet
  filepath: "${runtime_params:folder,'data/01_raw/'}companies.csv"
```
If the `folder` parameter is not passed through the CLI `--params` option with `kedro run`, the default value `'data/01_raw/'` is used for the `filepath`.

### How to use resolvers in the `OmegaConfigLoader`
Instead of hard-coding values in your configuration files, you can also dynamically compute them using [`OmegaConf`'s
69 changes: 59 additions & 10 deletions kedro/config/omegaconf_config.py
Expand Up @@ -11,7 +11,7 @@

import fsspec
from omegaconf import OmegaConf
from omegaconf.errors import InterpolationResolutionError
from omegaconf.errors import InterpolationResolutionError, UnsupportedInterpolationType
from omegaconf.resolvers import oc
from yaml.parser import ParserError
from yaml.scanner import ScannerError
Expand Down Expand Up @@ -123,6 +123,7 @@ def __init__( # noqa: too-many-arguments
self._register_new_resolvers(custom_resolvers)
# Register globals resolver
self._register_globals_resolver()
self._register_runtime_params_resolver()
file_mimetype, _ = mimetypes.guess_type(conf_source)
if file_mimetype == "application/x-tar":
self._protocol = "tar"
Expand All @@ -141,8 +142,12 @@ def __init__( # noqa: too-many-arguments
env=env,
runtime_params=runtime_params,
)
try:
self._globals = self["globals"]
except MissingConfigException:
self._globals = {}

def __getitem__(self, key) -> dict[str, Any]:
def __getitem__(self, key) -> dict[str, Any]: # noqa: PLR0912
"""Get configuration files by key, load and merge them, and
return them in the form of a config dictionary.

Expand Down Expand Up @@ -170,6 +175,9 @@ def __getitem__(self, key) -> dict[str, Any]:
)
patterns = [*self.config_patterns[key]]

if key == "globals":
# "runtime_params" resolver is not allowed in globals.
OmegaConf.clear_resolver("runtime_params")
read_environment_variables = key == "credentials"

processed_files: set[Path] = set()
Expand All @@ -178,9 +186,18 @@ def __getitem__(self, key) -> dict[str, Any]:
base_path = str(Path(self.conf_source) / self.base_env)
else:
base_path = str(Path(self._fs.ls("", detail=False)[-1]) / self.base_env)
base_config = self.load_and_merge_dir_config(
base_path, patterns, key, processed_files, read_environment_variables
)
try:
base_config = self.load_and_merge_dir_config(
base_path, patterns, key, processed_files, read_environment_variables
)
except UnsupportedInterpolationType as exc:
if "runtime_params" in str(exc):
raise UnsupportedInterpolationType(
"The `runtime_params:` resolver is not supported for globals."
)
else:
raise exc

config = base_config

# Load chosen env config
Expand All @@ -189,9 +206,19 @@ def __getitem__(self, key) -> dict[str, Any]:
env_path = str(Path(self.conf_source) / run_env)
else:
env_path = str(Path(self._fs.ls("", detail=False)[-1]) / run_env)
env_config = self.load_and_merge_dir_config(
env_path, patterns, key, processed_files, read_environment_variables
)
try:
env_config = self.load_and_merge_dir_config(
env_path, patterns, key, processed_files, read_environment_variables
)
except UnsupportedInterpolationType as exc:
if "runtime_params" in str(exc):
raise UnsupportedInterpolationType(
"The `runtime_params:` resolver is not supported for globals."
)
else:
raise exc
# Re-register runtime params resolver in case it was deactivated
self._register_runtime_params_resolver()
Review comment (Contributor):

I find this a bit hard to follow - I am not sure if this is registered or not. If I follow the logic, it assumes that it is always on and only turn off if key==global. Maybe this should move to the beginning instead? Then the __init__ register_runtime_params_resolvers can be removed optionally.

In fact, I found this is inconsistent and I think we should treat this same as the read_environment_variable flag.

Reply (Contributor, Author):

Since replace=True is set in _register_runtime_params_resolver(), I didn't add an if condition here. It'll re-register it anyway.

Reply (Contributor, Author):

I've moved it to the beginning and removed the call from __init__.

# Destructively merge the two env dirs. The chosen env will override base.
common_keys = config.keys() & env_config.keys()
if common_keys:
Expand All @@ -209,6 +236,7 @@ def __getitem__(self, key) -> dict[str, Any]:
f"No files of YAML or JSON format found in {base_path} or {env_path} matching"
f" the glob pattern(s): {[*self.config_patterns[key]]}"
)

return config

def __repr__(self): # pragma: no cover
Expand Down Expand Up @@ -297,6 +325,7 @@ def load_and_merge_dir_config( # noqa: too-many-arguments
return OmegaConf.to_container(
OmegaConf.merge(*aggregate_config, self.runtime_params), resolve=True
)

return {
k: v
for k, v in OmegaConf.to_container(
Expand All @@ -322,15 +351,22 @@ def _register_globals_resolver(self):
replace=True,
)

    def _register_runtime_params_resolver(self):
        OmegaConf.register_new_resolver(
            "runtime_params",
            self._get_runtime_value,
            replace=True,
        )

    def _get_globals_value(self, variable, default_value=_NO_VALUE):
        """Return the globals values to the resolver"""
        if variable.startswith("_"):
            raise InterpolationResolutionError(
                "Keys starting with '_' are not supported for globals."
            )
        global_omegaconf = OmegaConf.create(self["globals"])
        globals_oc = OmegaConf.create(self._globals)
        interpolated_value = OmegaConf.select(
            global_omegaconf, variable, default=default_value
            globals_oc, variable, default=default_value
        )
        if interpolated_value != _NO_VALUE:
            return interpolated_value
Expand All @@ -339,6 +375,19 @@ def _get_globals_value(self, variable, default_value=_NO_VALUE):
                f"Globals key '{variable}' not found and no default value provided."
            )

    def _get_runtime_value(self, variable, default_value=_NO_VALUE):
        """Return the runtime params values to the resolver"""
        runtime_oc = OmegaConf.create(self.runtime_params)
        interpolated_value = OmegaConf.select(
            runtime_oc, variable, default=default_value
        )
        if interpolated_value != _NO_VALUE:
            return interpolated_value
        else:
            raise InterpolationResolutionError(
                f"Runtime parameter '{variable}' not found and no default value provided."
            )
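
Both lookup methods rely on a sentinel default (`_NO_VALUE`) so that legitimate defaults such as `None`, `0`, or an empty string are not mistaken for "no default provided". A standard-library sketch of that pattern follows; `select` here is a hypothetical stand-in for a dotted-key lookup, not the implementation of `OmegaConf.select`.

```python
_NO_VALUE = object()  # unique sentinel: distinguishes "no default" from None, 0 or ""


def select(params: dict, dotted_key: str, default=_NO_VALUE):
    """Look up a dotted key such as 'model.seed' in nested dictionaries."""
    current = params
    for part in dotted_key.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif default is not _NO_VALUE:
            # A default was supplied, even if it is falsy, so return it.
            return default
        else:
            raise KeyError(
                f"Runtime parameter '{dotted_key}' not found and no default value provided."
            )
    return current


runtime_params = {"model": {"seed": 3}}
found = select(runtime_params, "model.seed")
fallback = select(runtime_params, "folder", None)
print(found, fallback)
```

Comparing against the sentinel by identity (`is not`) keeps the check independent of how the looked-up values implement equality.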

@staticmethod
def _register_new_resolvers(resolvers: dict[str, Callable]):
"""Register custom resolvers"""