Skip to content

Conversation

elliot-barn
Copy link
Contributor

Adding workspace funcs to parse all configs
not currently used in raydepsets

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
@elliot-barn elliot-barn requested a review from aslonnie October 9, 2025 00:00
cursor[bot]

This comment was marked as outdated.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the capability to parse multiple dependency configuration files, which is a great step towards better modularity. The core logic changes are in ci/raydepsets/workspace.py, with corresponding updates to tests.

I've identified a few areas for improvement:

  • Potential Bug: The merge_configs function doesn't handle duplicate depset names across files, which could lead to non-deterministic behavior.
  • Code Clarity: The path manipulation logic in workspace.py can be simplified to be more explicit and easier to understand.
  • Test Quality: Some tests can be improved for efficiency, robustness, and to reduce duplication.

Overall, the changes are good, but addressing the points above will significantly improve the code's robustness and maintainability. Details are in the line-by-line comments.

Comment on lines +124 to +127
def merge_configs(self, configs: List[Config]) -> Config:
return Config(
depsets=[depset for config in configs for depset in config.depsets]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of merge_configs simply concatenates lists of depsets. If multiple configuration files define a depset with the same name, this will result in duplicates in the final list. This can lead to unpredictable behavior, as functions like _get_depset will only return the first match they find, which depends on the non-guaranteed order of files from os.listdir.

To prevent this, you should check for and disallow duplicate depset names across all configuration files. This will make the behavior deterministic and prevent subtle bugs.

    def merge_configs(self, configs: List[Config]) -> Config:
        all_depsets = [depset for config in configs for depset in config.depsets]
        depset_names = [depset.name for depset in all_depsets]
        if len(depset_names) != len(set(depset_names)):
            from collections import Counter

            duplicates = [
                name for name, count in Counter(depset_names).items() if count > 1
            ]
            raise ValueError(f"Duplicate depset names found: {duplicates}")
        return Config(depsets=all_depsets)

Comment on lines +22 to +28
assert "general_depset__py311_cpu" in [depset.name for depset in config.depsets]
assert "build_args_test_depset__py311_cpu" in [
depset.name for depset in config.depsets
]
assert "expanded_depset__py311_cpu" in [
depset.name for depset in config.depsets
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test implementation can be made more efficient and readable.

  1. The list comprehension [depset.name for depset in config.depsets] is created three times. It's better to compute it once.
  2. Using a set for membership testing (in) is more efficient than a list.

Also, the test name test_parse_build_arg_sets is now misleading since the test's focus has changed from testing build_arg_sets parsing to verifying the outcome of load_config. Consider renaming it to something like test_load_config_with_build_args for clarity.

Suggested change
assert "general_depset__py311_cpu" in [depset.name for depset in config.depsets]
assert "build_args_test_depset__py311_cpu" in [
depset.name for depset in config.depsets
]
assert "expanded_depset__py311_cpu" in [
depset.name for depset in config.depsets
]
depset_names = {depset.name for depset in config.depsets}
assert "general_depset__py311_cpu" in depset_names
assert "build_args_test_depset__py311_cpu" in depset_names
assert "expanded_depset__py311_cpu" in depset_names



# load all configs should always load all depsets
def test_load_all_configs_first_config():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The tests test_load_all_configs_first_config and test_load_all_configs_second_config are nearly identical. To reduce code duplication and improve maintainability, you could use pytest.mark.parametrize to run the same test logic with different config files.

Example:

import pytest

@pytest.mark.parametrize("config_file", ["test.depsets.yaml", "test2.depsets.yaml"])
def test_load_all_configs(config_file):
    with tempfile.TemporaryDirectory() as tmpdir:
        copy_data_to_tmpdir(tmpdir)
        workspace = Workspace(dir=tmpdir)
        config = workspace.load_configs(config_path=Path(tmpdir) / config_file)
        assert config.depsets is not None
        assert len(config.depsets) == 11

Comment on lines +136 to +137
assert f"{tmpdir}/test.depsets.yaml" in configs_dir
assert f"{tmpdir}/test2.depsets.yaml" in configs_dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There are a couple of issues with these assertions:

  1. Hardcoded path separator: Using an f-string with / as a path separator is not portable across operating systems (e.g., Windows uses \). It's better to use os.path.join to construct paths.
  2. Unordered results: os.listdir (used inside get_configs_dir) does not guarantee the order of files. Asserting membership in a list can be flaky if the order changes. It's more robust to convert the result to a set and compare it with an expected set of paths.
Suggested change
assert f"{tmpdir}/test.depsets.yaml" in configs_dir
assert f"{tmpdir}/test2.depsets.yaml" in configs_dir
assert set(configs_dir) == {
f"{tmpdir}/test.depsets.yaml",
f"{tmpdir}/test2.depsets.yaml",
}

Comment on lines +109 to +115
def get_configs_dir(self, configs_path: str) -> List[str]:
configs_dir = os.path.dirname(os.path.join(self.dir, configs_path))
return [
os.path.join(self.dir, configs_dir, path)
for path in os.listdir(os.path.join(self.dir, configs_dir))
if path.endswith(".depsets.yaml")
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The implementation of get_configs_dir is a bit confusing because it repeatedly uses os.path.join(self.dir, ...) where the second argument is often an absolute path. While this works due to os.path.join's behavior, it makes the code harder to reason about.

A clearer approach would be to resolve the directory path once and then use it to find and construct the full paths of the config files. This would also simplify the load_config method, which currently has a redundant os.path.join.

    def get_configs_dir(self, configs_path: str) -> List[str]:
        if not os.path.isabs(configs_path):
            configs_path = os.path.join(self.dir, configs_path)

        configs_dir = os.path.dirname(configs_path)

        return [
            os.path.join(configs_dir, path)
            for path in os.listdir(configs_dir)
            if path.endswith(".depsets.yaml")
        ]

]

def load_config(self, config_path: str) -> Config:
with open(os.path.join(self.dir, config_path), "r") as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since get_configs_dir returns absolute paths (especially with the suggested refactoring), the os.path.join(self.dir, config_path) is redundant here. os.path.join will simply return config_path because it's an absolute path. You can simplify this by opening config_path directly.

        with open(config_path, "r") as f:

@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core devprod labels Oct 9, 2025
@aslonnie aslonnie added the go add ONLY when ready to merge, run all tests label Oct 10, 2025
@aslonnie aslonnie merged commit 97455f3 into master Oct 10, 2025
6 checks passed
@aslonnie aslonnie deleted the elliot-barn/parsing-all-configs branch October 10, 2025 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Issues that should be addressed in Ray Core devprod go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants