Skip to content

Conversation

@israbbani
Copy link
Contributor

In #57269, I introduced a bug:

  • the cpu controller is not enabled on the system and user cgroups because it only needs to be enabled on the parent.
  • however, CgroupDriver::AddConstraint checked that for every constraint, the matching controller was enabled.

This was not caught in CI because the python integration tests were not running. For the time being, I've removed the manual tag and excluded cgroup from all other python tests.

We need to fix this properly because tests shouldn't exclude other tests but rather include the right targets.

cpu.weight. It needs to be enabled in the parent cgroup. Python
integration tests were not running in CI.

Signed-off-by: irabbani <israbbani@gmail.com>
@israbbani israbbani added core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests labels Oct 15, 2025
Signed-off-by: irabbani <israbbani@gmail.com>
@israbbani
Copy link
Contributor Author

All tests are now running properly


[2025-10-15T17:08:04Z] (10:08:04) INFO: Build completed successfully, 10 total actions
--
  | [2025-10-15T17:08:04Z] //python/ray/tests/resource_isolation:test_resource_isolation_config     PASSED in 1.6s
  | [2025-10-15T17:08:04Z]   WARNING: //python/ray/tests/resource_isolation:test_resource_isolation_config: Test execution time (1.6s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".
  | [2025-10-15T17:08:04Z] //python/ray/tests/resource_isolation:test_resource_isolation_integration PASSED in 20.5s
  | [2025-10-15T17:08:04Z]   WARNING: //python/ray/tests/resource_isolation:test_resource_isolation_integration: Test execution time (20.5s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".

...


[2025-10-15T17:11:33Z] (10:11:33) INFO: Build completed successfully, 1948 total actions
--
  | [2025-10-15T17:11:33Z] //src/ray/common/cgroup2/tests:cgroup_manager_test                       PASSED in 0.0s
  | [2025-10-15T17:11:33Z]   WARNING: //src/ray/common/cgroup2/tests:cgroup_manager_test: Test execution time (0.0s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".
  | [2025-10-15T17:11:33Z] //src/ray/common/cgroup2/tests:sysfs_cgroup_driver_test                  PASSED in 0.0s
  | [2025-10-15T17:11:33Z]   WARNING: //src/ray/common/cgroup2/tests:sysfs_cgroup_driver_test: Test execution time (0.0s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".

...


[2025-10-15T17:16:56Z] [----------] 44 tests from SysFsCgroupDriverIntegrationTest (52 ms total)
--
  | [2025-10-15T17:16:56Z]
  | [2025-10-15T17:16:56Z] [----------] Global test environment tear-down
  | [2025-10-15T17:16:56Z] [==========] 44 tests from 1 test suite ran. (52 ms total)
  | [2025-10-15T17:16:56Z] [  PASSED  ] 44 tests.

@israbbani israbbani marked this pull request as ready for review October 15, 2025 20:46
@israbbani israbbani requested a review from a team as a code owner October 15, 2025 20:46
Comment on lines -339 to -351
// Check if the required controller for the constraint is enabled.
StatusOr<std::unordered_set<std::string>> available_controllers_s =
GetEnabledControllers(cgroup_path);
RAY_RETURN_NOT_OK(available_controllers_s.status());
const auto &controllers = available_controllers_s.value();
if (controllers.find(controller) == controllers.end()) {
return Status::InvalidArgument(absl::StrFormat(
"Failed to apply %s to cgroup %s. To use %s, enable the %s controller.",
constraint,
cgroup_path,
constraint,
controller));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote this, I assumed that the controller needs to be enabled on the cgroup to apply a constraint. This isn't always the case (e.g. the cpu controller needs to be enabled on a parent cgroup) and the constraint is applied to the child cgroup.

I could keep the check and complicate the API (e.g. pass in the path of the cgroup on which to check that the controller is enabled), but I think it's cleaner for the caller to check first and then enable the constraint. This is what CgroupManager does.

@edoakes edoakes merged commit f0b78a2 into master Oct 15, 2025
6 checks passed
@edoakes edoakes deleted the irabbani/cgroups-17 branch October 15, 2025 21:04
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…int (ray-project#57731)

In ray-project#57269, I introduced a bug:
* the `cpu` controller is not enabled on the system and user cgroups
because it only needs to be enabled on the parent.
* however, CgroupDriver::AddConstraint checked that for every
constraint, the matching controller was enabled.

This was not caught in CI because the python integration tests were not
running. For the time being, I've removed the manual tag and excluded
`cgroup` from all other python tests.

We need to fix this properly because tests shouldn't exclude other tests
but rather include the right targets.

---------

Signed-off-by: irabbani <israbbani@gmail.com>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
…int (ray-project#57731)

In ray-project#57269, I introduced a bug:
* the `cpu` controller is not enabled on the system and user cgroups
because it only needs to be enabled on the parent.
* however, CgroupDriver::AddConstraint checked that for every
constraint, the matching controller was enabled.

This was not caught in CI because the python integration tests were not
running. For the time being, I've removed the manual tag and excluded
`cgroup` from all other python tests.

We need to fix this properly because tests shouldn't exclude other tests
but rather include the right targets.

---------

Signed-off-by: irabbani <israbbani@gmail.com>
Signed-off-by: xgui <xgui@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
…int (#57731)

In #57269, I introduced a bug:
* the `cpu` controller is not enabled on the system and user cgroups
because it only needs to be enabled on the parent.
* however, CgroupDriver::AddConstraint checked that for every
constraint, the matching controller was enabled.

This was not caught in CI because the python integration tests were not
running. For the time being, I've removed the manual tag and excluded
`cgroup` from all other python tests.

We need to fix this properly because tests shouldn't exclude other tests
but rather include the right targets.

---------

Signed-off-by: irabbani <israbbani@gmail.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…int (ray-project#57731)

In ray-project#57269, I introduced a bug:
* the `cpu` controller is not enabled on the system and user cgroups
because it only needs to be enabled on the parent.
* however, CgroupDriver::AddConstraint checked that for every
constraint, the matching controller was enabled.

This was not caught in CI because the python integration tests were not
running. For the time being, I've removed the manual tag and excluded
`cgroup` from all other python tests.

We need to fix this properly because tests shouldn't exclude other tests
but rather include the right targets.

---------

Signed-off-by: irabbani <israbbani@gmail.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…int (ray-project#57731)

In ray-project#57269, I introduced a bug:
* the `cpu` controller is not enabled on the system and user cgroups
because it only needs to be enabled on the parent.
* however, CgroupDriver::AddConstraint checked that for every
constraint, the matching controller was enabled.

This was not caught in CI because the python integration tests were not
running. For the time being, I've removed the manual tag and excluded
`cgroup` from all other python tests.

We need to fix this properly because tests shouldn't exclude other tests
but rather include the right targets.

---------

Signed-off-by: irabbani <israbbani@gmail.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants