Add globals feature for OmegaConfigLoader using a globals resolver #2921

Merged on Aug 21, 2023 (29 commits)
Changes from 27 commits
Commits
3e2e73e
Refactor load_and_merge_dir()
ankatiyar Aug 9, 2023
9223f85
Try adding globals resolver
ankatiyar Aug 9, 2023
44bc9e7
Minor change
ankatiyar Aug 9, 2023
f4ffa30
Add globals resolver
ankatiyar Aug 11, 2023
5648999
Merge branch 'main' into feat/globals
ankatiyar Aug 11, 2023
ced46c5
Revert refactoring
ankatiyar Aug 14, 2023
ee285f4
Add test + remove self.globals
ankatiyar Aug 15, 2023
7221a16
Allow for nested variables in globals
ankatiyar Aug 15, 2023
6ad693f
Add documentation
ankatiyar Aug 15, 2023
e49f72f
Merge branch 'main' into feat/globals
ankatiyar Aug 15, 2023
4fd5da0
Typo
ankatiyar Aug 15, 2023
84bf3d1
Merge branch 'feat/globals' of https://github.com/kedro-org/kedro int…
ankatiyar Aug 15, 2023
bd84d0a
Add error message + test
ankatiyar Aug 16, 2023
b004b87
Apply suggestions from code review
ankatiyar Aug 17, 2023
c099422
Split test into multiple tests
ankatiyar Aug 17, 2023
6cef54b
Restrict the globals config_patterns
ankatiyar Aug 17, 2023
0d5d95d
Release notes
ankatiyar Aug 17, 2023
d159cdf
Update docs/source/configuration/advanced_configuration.md
ankatiyar Aug 17, 2023
78793ef
Add helpful error message for keys starting with _
ankatiyar Aug 17, 2023
17789a7
Enable setting default value for globals resolver
ankatiyar Aug 18, 2023
b8b066d
Merge branch 'main' into feat/globals
ankatiyar Aug 18, 2023
d76c022
Typo
ankatiyar Aug 18, 2023
6e9c8b0
Merge branch 'feat/globals' of https://github.com/kedro-org/kedro int…
ankatiyar Aug 18, 2023
bed4106
Merge branch 'main' into feat/globals
astrojuanlu Aug 18, 2023
4b1b6f4
Merge branch 'main' into feat/globals
noklam Aug 21, 2023
01af470
Move test for keys starting with _ to the top
ankatiyar Aug 21, 2023
92ab551
Merge branch 'main' into feat/globals
ankatiyar Aug 21, 2023
ed91395
Fix cross ref link in docs
ankatiyar Aug 21, 2023
ca622b7
Merge branch 'feat/globals' of https://github.com/kedro-org/kedro int…
ankatiyar Aug 21, 2023
2 changes: 2 additions & 0 deletions RELEASE.md
@@ -14,6 +14,8 @@
* Allowed registering of custom resolvers to `OmegaConfigLoader` through `CONFIG_LOADER_ARGS`.
* Added support for Python 3.11. This includes tackling challenges like dependency pinning and test adjustments to ensure a smooth experience. Detailed migration tips are provided below for further context.
* Added `kedro catalog resolve` CLI command that resolves dataset factories in the catalog with any explicit entries in the project pipeline.
* Added support for global variables to `OmegaConfigLoader`.


## Bug fixes and other changes
* Updated `kedro pipeline create` and `kedro catalog create` to use new `/conf` file structure.
35 changes: 34 additions & 1 deletion docs/source/configuration/advanced_configuration.md
@@ -34,7 +34,7 @@ folders:
  fea: "04_feature"
```

To point your `TemplatedConfigLoader` to the globals file, add it to the the `CONFIG_LOADER_ARGS` variable in [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md):
To point your `TemplatedConfigLoader` to the globals file, add it to the `CONFIG_LOADER_ARGS` variable in [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md):

```python
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}
@@ -124,6 +124,7 @@ This section contains a set of guidance for advanced configuration requirements
* [How to bypass the configuration loading rules](#how-to-bypass-the-configuration-loading-rules)
* [How to use Jinja2 syntax in configuration](#how-to-use-jinja2-syntax-in-configuration)
* [How to do templating with the `OmegaConfigLoader`](#how-to-do-templating-with-the-omegaconfigloader)
* [How to use global variables with the `OmegaConfigLoader`](#how-to-use-global-variables-with-the-omegaconfigloader)
* [How to use resolvers in the `OmegaConfigLoader`](#how-to-use-resolvers-in-the-omegaconfigloader)
* [How to load credentials through environment variables](#how-to-load-credentials-through-environment-variables)

@@ -262,6 +263,38 @@ Since both of the file names (`catalog.yml` and `catalog_globals.yml`) match the
#### Other configuration files
It's also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom spark or mlflow configuration. This works in the same way as variable interpolation in parameter files. You can still use the underscore for the templated values if you want, but it's not mandatory like it is for catalog files.

### How to use global variables with the `OmegaConfigLoader`
From Kedro `0.18.13`, you can use variable interpolation in your configurations using "globals" with `OmegaConfigLoader`.
The benefit of using globals over regular variable interpolation is that the global variables are shared across different configuration types, such as catalog and parameters.
By default, these global variables are assumed to be in files called `globals.yml` in any of your environments. If you want to configure the naming patterns for the files that contain your global variables,
you can do so [by overwriting the `globals` key in `config_patterns`](#how-to-change-which-configuration-files-are-loaded). You can also [bypass the configuration loading](#how-to-bypass-the-configuration-loading-rules)
to directly set the global variables in `OmegaConfigLoader`.
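
As a sketch of the first option, the `globals` entry of `config_patterns` can be overridden through `CONFIG_LOADER_ARGS` in `settings.py`. The wider glob patterns below are illustrative assumptions, not defaults; pick patterns that suit your project layout.

```python
# src/<package_name>/settings.py (sketch)
# In a real project you would also select the loader, for example:
#   from kedro.config import OmegaConfigLoader
#   CONFIG_LOADER_CLASS = OmegaConfigLoader

CONFIG_LOADER_ARGS = {
    "config_patterns": {
        # Illustrative patterns: also pick up files such as
        # globals_dev.yml or shared/globals.yml, not only globals.yml.
        "globals": ["globals*", "globals*/**", "**/globals*"],
    }
}
```

Only the `globals` key is replaced here; the loader updates its default patterns with the dictionary you pass, so the other configuration types keep their defaults.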

Suppose you have global variables located in the file `conf/base/globals.yml`:
```yaml
my_global_value: 45
dataset_type:
  csv: pandas.CSVDataSet
```
You can access these global variables in your catalog or parameters config files with a `globals` resolver like this:
`conf/base/parameters.yml`:
```yaml
my_param: "${globals:my_global_value}"
```
`conf/base/catalog.yml`:
```yaml
companies:
  filepath: data/01_raw/companies.csv
  type: "${globals:dataset_type.csv}"
```
You can also provide a default value to be used in case the global variable does not exist:
```yaml
my_param: "${globals: nonexistent_global, 23}"
```
If there are duplicate keys in the globals files in your base and run time environments, the values in the run time environment
will overwrite the values in your base environment.
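
The three behaviours described above (dotted keys for nested values, an optional default, and run time values replacing base values) can be illustrated with a small standard-library sketch. This is not Kedro's implementation: `merge_globals` and `lookup` are hypothetical helpers, and the merge is assumed to replace top-level keys only.

```python
from __future__ import annotations

from typing import Any


def merge_globals(base: dict[str, Any], run_env: dict[str, Any]) -> dict[str, Any]:
    """Assumed merge rule: run time environment keys replace base keys."""
    merged = dict(base)
    merged.update(run_env)
    return merged


def lookup(globals_cfg: dict[str, Any], variable: str, default: Any = None) -> Any:
    """Walk a dotted key such as 'dataset_type.csv' through nested dicts."""
    value: Any = globals_cfg
    for key in variable.split("."):
        if not isinstance(value, dict) or key not in value:
            if default is None:
                raise KeyError(f"Globals key '{variable}' not found")
            return default
        value = value[key]
    return value


# conf/base/globals.yml and a hypothetical conf/local/globals.yml
base_globals = {"my_global_value": 45, "dataset_type": {"csv": "pandas.CSVDataSet"}}
local_globals = {"my_global_value": 99}

merged = merge_globals(base_globals, local_globals)
print(lookup(merged, "my_global_value"))         # 99 (run time value wins)
print(lookup(merged, "dataset_type.csv"))        # pandas.CSVDataSet
print(lookup(merged, "nonexistent_global", 23))  # 23 (default is used)
```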


### How to use resolvers in the `OmegaConfigLoader`
Instead of hard-coding values in your configuration files, you can also dynamically compute them using [`OmegaConf`'s
resolvers functionality](https://omegaconf.readthedocs.io/en/2.3_branch/custom_resolvers.html#resolvers). You use resolvers to define custom
5 changes: 3 additions & 2 deletions docs/source/configuration/configuration_basics.md
@@ -61,18 +61,19 @@ Configuration files will be matched according to file name and type rules. Suppo
### Configuration patterns
Under the hood, the Kedro configuration loader loads files based on regex patterns that specify the naming convention for configuration files. These patterns are specified by `config_patterns` in the configuration loader classes.

By default those patterns are set as follows for the configuration of catalog, parameters, logging and credentials:
By default those patterns are set as follows for the configuration of catalog, parameters, logging, credentials, and globals:

```python
config_patterns = {
    "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
    "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
    "logging": ["logging*", "logging*/**", "**/logging*"],
    "globals": ["globals.yml"],
}
```

If you want to change change the way configuration is loaded, you can either [customise the config patterns](advanced_configuration.md#how-to-change-which-configuration-files-are-loaded) or [bypass the configuration loading](advanced_configuration.md#how-to-bypass-the-configuration-loading-rules) as described in the advanced configuration chapter.
If you want to change the way configuration is loaded, you can either [customise the config patterns](advanced_configuration.md#how-to-change-which-configuration-files-are-loaded) or [bypass the configuration loading](advanced_configuration.md#how-to-bypass-the-configuration-loading-rules) as described in the advanced configuration chapter.
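
To get a feel for which files each pattern picks up, the sketch below approximates the matching with the standard library's `fnmatch`. This is an assumption made for illustration: the real loader globs through the file system, and `matching_config_types` is a hypothetical helper. The `globals` pattern used is the one set in `kedro/config/omegaconf_config.py` in this PR.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

config_patterns = {
    "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
    "globals": ["globals.yml"],
}


def matching_config_types(relative_path: str) -> list[str]:
    """Return the config types whose patterns match a path relative to an environment folder."""
    return [
        config_type
        for config_type, patterns in config_patterns.items()
        if any(fnmatchcase(relative_path, pattern) for pattern in patterns)
    ]


print(matching_config_types("catalog.yml"))          # ['catalog']
print(matching_config_types("catalog_globals.yml"))  # ['catalog']
print(matching_config_types("globals.yml"))          # ['globals']
print(matching_config_types("globals_dev.yml"))      # []
```

With the restricted default, only a file named exactly `globals.yml` is treated as a globals file, so a file such as `catalog_globals.yml` is still loaded as catalog configuration.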

## How to use Kedro configuration

1 change: 1 addition & 0 deletions docs/source/faq/faq.md
@@ -36,6 +36,7 @@ This is a growing set of technical FAQs. The [product FAQs on the Kedro website]
* [How do I bypass the configuration loading rules](../configuration/advanced_configuration.md#how-to-bypass-the-configuration-loading-rules)?
* [How do I use Jinja2 syntax in configuration](../configuration/advanced_configuration.md#how-to-use-jinja2-syntax-in-configuration)?
* [How do I do templating with the `OmegaConfigLoader`](../configuration/advanced_configuration.md#how-to-do-templating-with-the-omegaconfigloader)?
* [How do I use global variables with the `OmegaConfigLoader`](../configuration/advanced_configuration.md#how-to-use-global-variables-with-the-omegaconfigloader)?
* [How do I use resolvers in the `OmegaConfigLoader`](../configuration/advanced_configuration.md#how-to-use-resolvers-in-the-omegaconfigloader)?
* [How do I load credentials through environment variables](../configuration/advanced_configuration.md#how-to-load-credentials-through-environment-variables)?

37 changes: 35 additions & 2 deletions kedro/config/omegaconf_config.py
@@ -11,6 +11,7 @@

import fsspec
from omegaconf import OmegaConf
from omegaconf.errors import InterpolationResolutionError
from omegaconf.resolvers import oc
from yaml.parser import ParserError
from yaml.scanner import ScannerError
@@ -109,6 +110,7 @@ def __init__( # noqa: too-many-arguments
            "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
            "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
            "logging": ["logging*", "logging*/**", "**/logging*"],
            "globals": ["globals.yml"],
        }
        self.config_patterns.update(config_patterns or {})

@@ -117,7 +119,8 @@ def __init__( # noqa: too-many-arguments
        # Register user provided custom resolvers
        if custom_resolvers:
            self._register_new_resolvers(custom_resolvers)

        # Register globals resolver
        self._register_globals_resolver()
        file_mimetype, _ = mimetypes.guess_type(conf_source)
        if file_mimetype == "application/x-tar":
            self._protocol = "tar"
@@ -199,7 +202,7 @@ def __getitem__(self, key) -> dict[str, Any]:

        config.update(env_config)

        if not processed_files:
        if not processed_files and key != "globals":
            raise MissingConfigException(
                f"No files of YAML or JSON format found in {base_path} or {env_path} matching"
                f" the glob pattern(s): {[*self.config_patterns[key]]}"
@@ -308,6 +311,36 @@ def _is_valid_config_path(self, path):
            ".json",
        ]

    def _register_globals_resolver(self):
        """Register the globals resolver"""
        OmegaConf.register_new_resolver(
            "globals",
            lambda variable, default_value=None: self._get_globals_value(
                variable, default_value
            ),
            replace=True,
        )

    def _get_globals_value(self, variable, default_value):
        """Return the globals values to the resolver"""
        if variable.startswith("_"):
            raise InterpolationResolutionError(
                "Keys starting with '_' are not supported for globals."
            )
        keys = variable.split(".")
        value = self["globals"]
        for k in keys:
            value = value.get(k)
            if not value:
                if default_value:
                    _config_logger.debug(
                        f"Using the default value for the global variable {variable}."
                    )
                    return default_value
                msg = f"Globals key '{variable}' not found and no default value provided. "
                raise InterpolationResolutionError(msg)
        return value

    @staticmethod
    def _register_new_resolvers(resolvers: dict[str, Callable]):
        """Register custom resolvers"""
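
One edge case worth noting in `_get_globals_value` as shown in this diff: the checks `if not value` and `if default_value` rely on truthiness, so a global set to `0`, `False` or an empty string appears to be treated as missing, and a falsy default appears to be ignored. The standard-library sketch below mirrors that logic next to a sentinel-based variant. Both functions are hypothetical stand-ins operating on a plain dict, and `KeyError` replaces `InterpolationResolutionError` to keep the example free of third-party imports.

```python
from __future__ import annotations

from typing import Any

_MISSING = object()  # sentinel that cannot collide with a real config value


def lookup_truthiness(globals_cfg: dict, variable: str, default_value: Any = None) -> Any:
    """Mirrors the truthiness checks of the diff's `_get_globals_value`."""
    value: Any = globals_cfg
    for k in variable.split("."):
        value = value.get(k)
        if not value:
            if default_value:
                return default_value
            raise KeyError(f"Globals key '{variable}' not found and no default value provided.")
    return value


def lookup_sentinel(globals_cfg: dict, variable: str, default_value: Any = _MISSING) -> Any:
    """Variant that only treats absent keys as missing."""
    value: Any = globals_cfg
    for k in variable.split("."):
        if not isinstance(value, dict) or k not in value:
            if default_value is _MISSING:
                raise KeyError(f"Globals key '{variable}' not found and no default value provided.")
            return default_value
        value = value[k]
    return value


cfg = {"retries": 0, "nested": {"flag": False}}

try:
    lookup_truthiness(cfg, "retries")
except KeyError:
    print("truthiness check: 'retries' reported as missing")

print(lookup_sentinel(cfg, "retries"))      # 0
print(lookup_sentinel(cfg, "nested.flag"))  # False
print(lookup_sentinel(cfg, "missing", 0))   # 0 (a falsy default is honoured)
```

Whether falsy globals need to be supported is a design decision for the maintainers; the sentinel variant is only one possible way to express it.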
126 changes: 126 additions & 0 deletions tests/config/test_omegaconf_config.py
@@ -12,6 +12,7 @@
import pytest
import yaml
from omegaconf import OmegaConf, errors
from omegaconf.errors import InterpolationResolutionError
from omegaconf.resolvers import oc
from yaml.parser import ParserError

@@ -671,3 +672,128 @@ def test_custom_resolvers(self, tmp_path):
        assert conf["parameters"]["model_options"]["param1"] == 7
        assert conf["parameters"]["model_options"]["param2"] == 3
        assert conf["parameters"]["model_options"]["param3"] == "my_env_variable"

    def test_globals(self, tmp_path):
        globals_params = tmp_path / _BASE_ENV / "globals.yml"
        globals_config = {
            "x": 34,
        }
        _write_yaml(globals_params, globals_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        # OmegaConfigLoader has globals resolver
        assert OmegaConf.has_resolver("globals")
        # Globals is readable in a dict way
        assert conf["globals"] == globals_config

    def test_globals_resolution(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        base_catalog = tmp_path / _BASE_ENV / "catalog.yml"
        globals_params = tmp_path / _BASE_ENV / "globals.yml"
        param_config = {
            "my_param": "${globals:x}",
            "my_param_default": "${globals:y,34}",  # y does not exist in globals
        }
        catalog_config = {
            "companies": {
                "type": "${globals:dataset_type}",
                "filepath": "data/01_raw/companies.csv",
            },
        }
        globals_config = {"x": 34, "dataset_type": "pandas.CSVDataSet"}
        _write_yaml(base_params, param_config)
        _write_yaml(globals_params, globals_config)
        _write_yaml(base_catalog, catalog_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        assert OmegaConf.has_resolver("globals")
        # Globals are resolved correctly in parameter
        assert conf["parameters"]["my_param"] == globals_config["x"]
        # The default value is used if the key does not exist
        assert conf["parameters"]["my_param_default"] == 34
        # Globals are resolved correctly in catalog
        assert conf["catalog"]["companies"]["type"] == globals_config["dataset_type"]

    def test_globals_nested(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        globals_params = tmp_path / _BASE_ENV / "globals.yml"
        param_config = {
            "my_param": "${globals:x}",
            "my_nested_param": "${globals:nested.y}",
        }
        globals_config = {
            "x": 34,
            "nested": {
                "y": 42,
            },
        }
        _write_yaml(base_params, param_config)
        _write_yaml(globals_params, globals_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        assert conf["parameters"]["my_param"] == globals_config["x"]
        # Nested globals are accessible with dot notation
        assert conf["parameters"]["my_nested_param"] == globals_config["nested"]["y"]

    def test_globals_across_env(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        local_params = tmp_path / _DEFAULT_RUN_ENV / "parameters.yml"
        base_globals = tmp_path / _BASE_ENV / "globals.yml"
        local_globals = tmp_path / _DEFAULT_RUN_ENV / "globals.yml"
        base_param_config = {
            "param1": "${globals:y}",
        }
        local_param_config = {
            "param2": "${globals:x}",
        }
        base_globals_config = {
            "x": 34,
            "y": 25,
        }
        local_globals_config = {
            "y": 99,
        }
        _write_yaml(base_params, base_param_config)
        _write_yaml(local_params, local_param_config)
        _write_yaml(base_globals, base_globals_config)
        _write_yaml(local_globals, local_globals_config)
        conf = OmegaConfigLoader(tmp_path)
        # Local global overwrites the base global value
        assert conf["parameters"]["param1"] == local_globals_config["y"]
        # Base global value is accessible to local params
        assert conf["parameters"]["param2"] == base_globals_config["x"]

    def test_bad_globals(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        base_globals = tmp_path / _BASE_ENV / "globals.yml"
        base_param_config = {
            "param1": "${globals:x.y}",
        }
        base_globals_config = {
            "x": {
                "z": 23,
            },
        }
        _write_yaml(base_params, base_param_config)
        _write_yaml(base_globals, base_globals_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        with pytest.raises(
            InterpolationResolutionError,
            match=r"Globals key 'x.y' not found and no default value provided.",
        ):
            conf["parameters"]["param1"]

    def test_bad_globals_underscore(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        base_globals = tmp_path / _BASE_ENV / "globals.yml"
        base_param_config = {
            "param2": "${globals:_ignore}",
        }
        base_globals_config = {
            "_ignore": 45,
        }
        _write_yaml(base_params, base_param_config)
        _write_yaml(base_globals, base_globals_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        with pytest.raises(
            InterpolationResolutionError,
            match=r"Keys starting with '_' are not supported for globals.",
        ):
            conf["parameters"]["param2"]