Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add globals feature for OmegaConfigLoader using a globals resolver #2921

Merged
merged 29 commits into from
Aug 21, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
3e2e73e
Refactor load_and_merge_dir()
ankatiyar Aug 9, 2023
9223f85
Try adding globals resolver
ankatiyar Aug 9, 2023
44bc9e7
Minor change
ankatiyar Aug 9, 2023
f4ffa30
Add globals resolver
ankatiyar Aug 11, 2023
5648999
Merge branch 'main' into feat/globals
ankatiyar Aug 11, 2023
ced46c5
Revert refactoring
ankatiyar Aug 14, 2023
ee285f4
Add test + remove self.globals
ankatiyar Aug 15, 2023
7221a16
Allow for nested variables in globals
ankatiyar Aug 15, 2023
6ad693f
Add documentation
ankatiyar Aug 15, 2023
e49f72f
Merge branch 'main' into feat/globals
ankatiyar Aug 15, 2023
4fd5da0
Typo
ankatiyar Aug 15, 2023
84bf3d1
Merge branch 'feat/globals' of https://github.com/kedro-org/kedro int…
ankatiyar Aug 15, 2023
bd84d0a
Add error message + test
ankatiyar Aug 16, 2023
b004b87
Apply suggestions from code review
ankatiyar Aug 17, 2023
c099422
Split test into multiple tests
ankatiyar Aug 17, 2023
6cef54b
Restrict the globals config_patterns
ankatiyar Aug 17, 2023
0d5d95d
Release notes
ankatiyar Aug 17, 2023
d159cdf
Update docs/source/configuration/advanced_configuration.md
ankatiyar Aug 17, 2023
78793ef
Add helpful error message for keys starting with _
ankatiyar Aug 17, 2023
17789a7
Enable setting default value for globals resolver
ankatiyar Aug 18, 2023
b8b066d
Merge branch 'main' into feat/globals
ankatiyar Aug 18, 2023
d76c022
Typo
ankatiyar Aug 18, 2023
6e9c8b0
Merge branch 'feat/globals' of https://github.com/kedro-org/kedro int…
ankatiyar Aug 18, 2023
bed4106
Merge branch 'main' into feat/globals
astrojuanlu Aug 18, 2023
4b1b6f4
Merge branch 'main' into feat/globals
noklam Aug 21, 2023
01af470
Move test for keys starting with _ to the top
ankatiyar Aug 21, 2023
92ab551
Merge branch 'main' into feat/globals
ankatiyar Aug 21, 2023
ed91395
Fix cross ref link in docs
ankatiyar Aug 21, 2023
ca622b7
Merge branch 'feat/globals' of https://github.com/kedro-org/kedro int…
ankatiyar Aug 21, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
129 changes: 80 additions & 49 deletions kedro/config/omegaconf_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,12 +103,14 @@ def __init__( # noqa: too-many-arguments
"""
self.base_env = base_env
self.default_run_env = default_run_env
self.globals = {}

self.config_patterns = {
"catalog": ["catalog*", "catalog*/**", "**/catalog*"],
"parameters": ["parameters*", "parameters*/**", "**/parameters*"],
"credentials": ["credentials*", "credentials*/**", "**/credentials*"],
"logging": ["logging*", "logging*/**", "**/logging*"],
"globals": ["globals*", "globals*/**", "**/globals*"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        >>> # in settings.py
        >>> from kedro.config import TemplatedConfigLoader
        >>>
        >>> CONFIG_LOADER_CLASS = TemplatedConfigLoader
        >>> CONFIG_LOADER_ARGS = {
        >>>     "globals_pattern": "*globals.yml",
        >>> }
Suggested change
"globals": ["globals*", "globals*/**", "**/globals*"],
"globals": ["globals*"],

I suggest keep this pattern simple.

  1. Align with the current TemplatedConfigLoader pattern, the above code block is copy from our template
  2. Most people don't need nested globals, putting globals in folder should be minority, if so they should change the settings instead.
  3. **/globals* is quite dangerous, as this could match parameters/globals_parameters.yml - although we usually suggest people using parameter_globals instead.

I may even suggest a more conservative default "globals": "globals.yml", forcing the default to be just globals.yml. If we change this after release it will be breaking change.

WDYT? @ankatiyar @merelcht , cc @stichbury because I really think we should separate the terminology for globals, it's being overloaded a lot now and this can lead to weird bugs.

Copy link
Contributor

@noklam noklam Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline conversation - agree with using default with globals.yml. Cc @merelcht @ankatiyar

}
self.config_patterns.update(config_patterns or {})

Expand All @@ -117,7 +119,7 @@ def __init__( # noqa: too-many-arguments
# Register user provided custom resolvers
if custom_resolvers:
self._register_new_resolvers(custom_resolvers)

self._register_globals_resolver()
file_mimetype, _ = mimetypes.guess_type(conf_source)
if file_mimetype == "application/x-tar":
self._protocol = "tar"
Expand All @@ -136,10 +138,20 @@ def __init__( # noqa: too-many-arguments
env=env,
runtime_params=runtime_params,
)
# Read globals from patterns if the files exist
try:
self.globals = self._get_config("globals")
merelcht marked this conversation as resolved.
Show resolved Hide resolved
except MissingConfigException:
pass

def __repr__(self): # pragma: no cover
return (
f"OmegaConfigLoader(conf_source={self.conf_source}, env={self.env}, "
f"config_patterns={self.config_patterns})"
)

def __getitem__(self, key) -> dict[str, Any]:
"""Get configuration files by key, load and merge them, and
return them in the form of a config dictionary.
"""Get configuration files by key and return them in the form of a config dictionary.

Args:
key: Key of the configuration type to fetch.
Expand All @@ -158,32 +170,29 @@ def __getitem__(self, key) -> dict[str, Any]:
# explicitly on the ``OmegaConfigLoader`` instance.
if key in self:
return super().__getitem__(key)

if key not in self.config_patterns:
raise KeyError(
f"No config patterns were found for '{key}' in your config loader"
)
patterns = [*self.config_patterns[key]]
return self._get_config(key)

def _get_config(self, key) -> dict[str, Any]:
"""Helper function to get configuration files by key, load and merge them,
and return them in the form of a config dictionary"""
patterns = [*self.config_patterns[key]]
read_environment_variables = key == "credentials"

processed_files: set[Path] = set()
# Load base env config
if self._protocol == "file":
base_path = str(Path(self.conf_source) / self.base_env)
else:
base_path = str(Path(self._fs.ls("", detail=False)[-1]) / self.base_env)
base_path = self._get_conf_path(self.base_env)
base_config = self.load_and_merge_dir_config(
base_path, patterns, key, processed_files, read_environment_variables
)
config = base_config

# Load chosen env config
run_env = self.env or self.default_run_env
if self._protocol == "file":
env_path = str(Path(self.conf_source) / run_env)
else:
env_path = str(Path(self._fs.ls("", detail=False)[-1]) / run_env)
env_path = self._get_conf_path(run_env)
env_config = self.load_and_merge_dir_config(
env_path, patterns, key, processed_files, read_environment_variables
)
Expand All @@ -199,18 +208,19 @@ def __getitem__(self, key) -> dict[str, Any]:

config.update(env_config)

if not processed_files:
if not processed_files and key != "globals":
raise MissingConfigException(
f"No files of YAML or JSON format found in {base_path} or {env_path} matching"
f" the glob pattern(s): {[*self.config_patterns[key]]}"
)
return config

def __repr__(self): # pragma: no cover
return (
f"OmegaConfigLoader(conf_source={self.conf_source}, env={self.env}, "
f"config_patterns={self.config_patterns})"
)
def _get_conf_path(self, env):
if self._protocol == "file":
conf_path = str(Path(self.conf_source) / env)
else:
conf_path = str(Path(self._fs.ls("", detail=False)[-1]) / env)
return conf_path

def load_and_merge_dir_config( # noqa: too-many-arguments
ankatiyar marked this conversation as resolved.
Show resolved Hide resolved
self,
Expand Down Expand Up @@ -246,36 +256,12 @@ def load_and_merge_dir_config( # noqa: too-many-arguments
f"Given configuration path either does not exist "
f"or is not a valid directory: {conf_path}"
)

paths = [
Path(each)
for pattern in patterns
for each in self._fs.glob(Path(f"{str(conf_path)}/{pattern}").as_posix())
]
deduplicated_paths = set(paths)
config_files_filtered = [
path for path in deduplicated_paths if self._is_valid_config_path(path)
]

config_per_file = {}
for config_filepath in config_files_filtered:
try:
with self._fs.open(str(config_filepath.as_posix())) as open_config:
# As fsspec doesn't allow the file to be read as StringIO,
# this is a workaround to read it as a binary file and decode it back to utf8.
tmp_fo = io.StringIO(open_config.read().decode("utf8"))
config = OmegaConf.load(tmp_fo)
processed_files.add(config_filepath)
if read_environment_variables:
self._resolve_environment_variables(config)
config_per_file[config_filepath] = config
except (ParserError, ScannerError) as exc:
line = exc.problem_mark.line # type: ignore
cursor = exc.problem_mark.column # type: ignore
raise ParserError(
f"Invalid YAML or JSON file {Path(conf_path, config_filepath.name).as_posix()},"
f" unable to read line {line}, position {cursor}."
) from exc
config_files_filtered = self._get_filtered_conf_paths(conf_path, patterns)
config_per_file = self._load_config_files(
conf_path, config_files_filtered, read_environment_variables
)
for file in config_per_file.keys():
processed_files.add(file)

seen_file_to_keys = {
file: set(config.keys()) for file, config in config_per_file.items()
Expand All @@ -299,6 +285,18 @@ def load_and_merge_dir_config( # noqa: too-many-arguments
if not k.startswith("_")
}

def _get_filtered_conf_paths(self, conf_path, patterns):
paths = [
Path(each)
for pattern in patterns
for each in self._fs.glob(Path(f"{str(conf_path)}/{pattern}").as_posix())
]
deduplicated_paths = set(paths)
config_files_filtered = [
path for path in deduplicated_paths if self._is_valid_config_path(path)
]
return config_files_filtered

def _is_valid_config_path(self, path):
"""Check if given path is a file path and file type is yaml or json."""
posix_path = path.as_posix()
Expand All @@ -308,6 +306,39 @@ def _is_valid_config_path(self, path):
".json",
]

def _load_config_files(
self, conf_path, config_filepaths, read_environment_variables
):
config_per_file = {}
for config_filepath in config_filepaths:
try:
with self._fs.open(str(config_filepath.as_posix())) as open_config:
# As fsspec doesn't allow the file to be read as StringIO,
# this is a workaround to read it as a binary file and decode it back to utf8.
tmp_fo = io.StringIO(open_config.read().decode("utf8"))
config = OmegaConf.load(tmp_fo)
if read_environment_variables:
self._resolve_environment_variables(config)
config_per_file[config_filepath] = config
except (ParserError, ScannerError) as exc:
line = exc.problem_mark.line # type: ignore
cursor = exc.problem_mark.column # type: ignore
raise ParserError(
f"Invalid YAML or JSON file {Path(conf_path, config_filepath.name).as_posix()},"
f" unable to read line {line}, position {cursor}."
) from exc
return config_per_file

def _register_globals_resolver(self):
"""Register the globals resolver"""
OmegaConf.register_new_resolver(
"globals", lambda x: self._get_globals_value(x), replace=True
)

def _get_globals_value(self, variable):
"""Return the globals values to the resolver"""
return self.globals[variable]

@staticmethod
def _register_new_resolvers(resolvers: dict[str, Callable]):
"""Register custom resolvers"""
Expand Down
16 changes: 16 additions & 0 deletions tests/config/test_omegaconf_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -671,3 +671,19 @@ def test_custom_resolvers(self, tmp_path):
assert conf["parameters"]["model_options"]["param1"] == 7
assert conf["parameters"]["model_options"]["param2"] == 3
assert conf["parameters"]["model_options"]["param3"] == "my_env_variable"

def test_globals(self, tmp_path):
ankatiyar marked this conversation as resolved.
Show resolved Hide resolved
base_params = tmp_path / _BASE_ENV / "parameters.yml"
globals_params = tmp_path / _BASE_ENV / "globals.yml"
param_config = {
"param1": "${globals:hello}",
}
globals_config = {
"hello": 34,
}
_write_yaml(base_params, param_config)
_write_yaml(globals_params, globals_config)
conf = OmegaConfigLoader(tmp_path, default_run_env="")
parameters = conf["parameters"]
assert OmegaConf.has_resolver("globals")
assert parameters["param1"] == 34