-
Notifications
You must be signed in to change notification settings - Fork 6.9k
[deps] raydepsets import all config (2/3) #57584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the capability to parse multiple dependency configuration files, which is a great step towards better modularity. The core logic changes are in ci/raydepsets/workspace.py
, with corresponding updates to tests.
I've identified a few areas for improvement:
- Potential Bug: The
merge_configs
function doesn't handle duplicate depset names across files, which could lead to non-deterministic behavior. - Code Clarity: The path manipulation logic in
workspace.py
can be simplified to be more explicit and easier to understand. - Test Quality: Some tests can be improved for efficiency, robustness, and to reduce duplication.
Overall, the changes are good, but addressing the points above will significantly improve the code's robustness and maintainability. Details are in the line-by-line comments.
def merge_configs(self, configs: List[Config]) -> Config: | ||
return Config( | ||
depsets=[depset for config in configs for depset in config.depsets] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation of merge_configs
simply concatenates lists of depsets. If multiple configuration files define a depset with the same name, this will result in duplicates in the final list. This can lead to unpredictable behavior, as functions like _get_depset
will only return the first match they find, which depends on the non-guaranteed order of files from os.listdir
.
To prevent this, you should check for and disallow duplicate depset names across all configuration files. This will make the behavior deterministic and prevent subtle bugs.
def merge_configs(self, configs: List[Config]) -> Config:
all_depsets = [depset for config in configs for depset in config.depsets]
depset_names = [depset.name for depset in all_depsets]
if len(depset_names) != len(set(depset_names)):
from collections import Counter
duplicates = [
name for name, count in Counter(depset_names).items() if count > 1
]
raise ValueError(f"Duplicate depset names found: {duplicates}")
return Config(depsets=all_depsets)
assert "general_depset__py311_cpu" in [depset.name for depset in config.depsets] | ||
assert "build_args_test_depset__py311_cpu" in [ | ||
depset.name for depset in config.depsets | ||
] | ||
assert "expanded_depset__py311_cpu" in [ | ||
depset.name for depset in config.depsets | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test implementation can be made more efficient and readable.
- The list comprehension
[depset.name for depset in config.depsets]
is created three times. It's better to compute it once. - Using a
set
for membership testing (in
) is more efficient than alist
.
Also, the test name test_parse_build_arg_sets
is now misleading since the test's focus has changed from testing build_arg_sets
parsing to verifying the outcome of load_config
. Consider renaming it to something like test_load_config_with_build_args
for clarity.
assert "general_depset__py311_cpu" in [depset.name for depset in config.depsets] | |
assert "build_args_test_depset__py311_cpu" in [ | |
depset.name for depset in config.depsets | |
] | |
assert "expanded_depset__py311_cpu" in [ | |
depset.name for depset in config.depsets | |
] | |
depset_names = {depset.name for depset in config.depsets} | |
assert "general_depset__py311_cpu" in depset_names | |
assert "build_args_test_depset__py311_cpu" in depset_names | |
assert "expanded_depset__py311_cpu" in depset_names |
|
||
|
||
# load all configs should always load all depsets | ||
def test_load_all_configs_first_config(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests test_load_all_configs_first_config
and test_load_all_configs_second_config
are nearly identical. To reduce code duplication and improve maintainability, you could use pytest.mark.parametrize
to run the same test logic with different config files.
Example:
import pytest
@pytest.mark.parametrize("config_file", ["test.depsets.yaml", "test2.depsets.yaml"])
def test_load_all_configs(config_file):
with tempfile.TemporaryDirectory() as tmpdir:
copy_data_to_tmpdir(tmpdir)
workspace = Workspace(dir=tmpdir)
config = workspace.load_configs(config_path=Path(tmpdir) / config_file)
assert config.depsets is not None
assert len(config.depsets) == 11
assert f"{tmpdir}/test.depsets.yaml" in configs_dir | ||
assert f"{tmpdir}/test2.depsets.yaml" in configs_dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of issues with these assertions:
- Hardcoded path separator: Using an f-string with
/
as a path separator is not portable across operating systems (e.g., Windows uses\
). It's better to useos.path.join
to construct paths. - Unordered results:
os.listdir
(used insideget_configs_dir
) does not guarantee the order of files. Asserting membership in a list can be flaky if the order changes. It's more robust to convert the result to aset
and compare it with an expected set of paths.
assert f"{tmpdir}/test.depsets.yaml" in configs_dir | |
assert f"{tmpdir}/test2.depsets.yaml" in configs_dir | |
assert set(configs_dir) == { | |
f"{tmpdir}/test.depsets.yaml", | |
f"{tmpdir}/test2.depsets.yaml", | |
} |
def get_configs_dir(self, configs_path: str) -> List[str]: | ||
configs_dir = os.path.dirname(os.path.join(self.dir, configs_path)) | ||
return [ | ||
os.path.join(self.dir, configs_dir, path) | ||
for path in os.listdir(os.path.join(self.dir, configs_dir)) | ||
if path.endswith(".depsets.yaml") | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of get_configs_dir
is a bit confusing because it repeatedly uses os.path.join(self.dir, ...)
where the second argument is often an absolute path. While this works due to os.path.join
's behavior, it makes the code harder to reason about.
A clearer approach would be to resolve the directory path once and then use it to find and construct the full paths of the config files. This would also simplify the load_config
method, which currently has a redundant os.path.join
.
def get_configs_dir(self, configs_path: str) -> List[str]:
if not os.path.isabs(configs_path):
configs_path = os.path.join(self.dir, configs_path)
configs_dir = os.path.dirname(configs_path)
return [
os.path.join(configs_dir, path)
for path in os.listdir(configs_dir)
if path.endswith(".depsets.yaml")
]
] | ||
|
||
def load_config(self, config_path: str) -> Config: | ||
with open(os.path.join(self.dir, config_path), "r") as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since get_configs_dir
returns absolute paths (especially with the suggested refactoring), the os.path.join(self.dir, config_path)
is redundant here. os.path.join
will simply return config_path
because it's an absolute path. You can simplify this by opening config_path
directly.
with open(config_path, "r") as f:
Adding workspace funcs to parse all configs
not currently used in raydepsets