Open
Labels
P1 (issue that should be fixed within a few weeks), core (issues that should be addressed in Ray Core), k8s-proj (K8s and Ray OSS)
Description
Implement support for isolating Ray system processes from application processes using cgroup v2.
Work Items
- Create an API for passing cgroup configuration into Ray ([core] Adding user facing API for resource isolation #51865).
- Implement CI support for running cgroup tests ([ci] Enable Cgroup support in CI for core #51454).
- Implement a sysfs driver for cgroup operations with tests. ([core] (cgroups 1/n) Adding a sys/fs filesystem driver to perform cgroup operations. #54898).
- Implement integration tests for the sysfs driver. ([core] (cgroups 2/n) adding integration tests for the cgroup sysfs driver. #55063).
- Implement a cgroup manager that uses the cgroup driver to check invariants, create subcgroups, move processes, enable controllers and resource limits.
- Implement cgroup cleanup in cgroup manager.
- Implement process migration for system processes and worker processes into the correct cgroup before or on startup.
- [core] (cgroups 7/n) cleaning up old cgroup integration code for raylet and core worker #56285
- [core] (cgroups 8/n) Wiring CgroupManager into the raylet. #56297
- [core] (cgroups 9/n) end-to-end integration of cgroups with ray start. #56352
- [core] (cgroups 10/n) Adding support in CgroupManager and CgroupDriver to move processes into system cgroup #56446
- [core] (cgroups 11/n) Raylet will move system processes into cgroup on startup #56522
- [core] (cgroups 12/n) Raylet will start worker processes in the application cgroup #56549
- Clean up old cgroup code and associated TODOs.
- Add a ProcessIsolationFactory as described here. Clean up all public bazel targets.
- Add a `user` cgroup for all non-ray processes.
- Tune defaults for `--system-reserved-cpu` and `--system-reserved-memory`.
- Moving the driver and dashboard subprocesses into the system cgroup
- Bug fixes
- Add user-facing documentation for enabling resource isolation on VMs and containers.
- Update log messages to cross-link to user-facing documentation.
- Attempt to move all cgroup related functionality into its own namespace to see how it plays with developer ergonomics.
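
For readers unfamiliar with the cgroup v2 filesystem interface, the manager operations listed above (check invariants, create subcgroups, enable controllers, set resource limits, move processes) might look roughly like the sketch below. This is an illustration only, not Ray's CgroupManager or sysfs driver: the class name `CgroupSketch`, the function `setup_isolation`, the subcgroup names `system` and `application`, and the choice of `cpu.weight` and `memory.min` as the limit files are all assumptions. Only the kernel interface files (`cgroup.controllers`, `cgroup.subtree_control`, `cgroup.procs`, `cpu.weight`, `memory.min`) are standard cgroup v2.

```python
import os
from pathlib import Path


class CgroupSketch:
    """Hypothetical helper for illustration; not Ray's CgroupManager."""

    def __init__(self, base):
        # `base` is the cgroup directory the caller is allowed to manage,
        # for example a delegated directory under /sys/fs/cgroup.
        self.base = Path(base)

    def check_invariants(self):
        # Every cgroup v2 directory exposes a cgroup.controllers file.
        if not (self.base / "cgroup.controllers").exists():
            raise RuntimeError(f"{self.base} is not a cgroup v2 directory")
        # Creating subcgroups needs read, write and search permission.
        if not os.access(self.base, os.R_OK | os.W_OK | os.X_OK):
            raise PermissionError(f"insufficient permissions on {self.base}")

    def available_controllers(self):
        return set((self.base / "cgroup.controllers").read_text().split())

    def create_subcgroup(self, name):
        path = self.base / name
        path.mkdir(exist_ok=True)
        return path

    def enable_controllers(self, controllers):
        missing = set(controllers) - self.available_controllers()
        if missing:
            raise ValueError(f"controllers not available: {sorted(missing)}")
        # Writing "+cpu +memory" enables those controllers for children.
        # On a real kernel, a non-root cgroup must hold no processes of its
        # own when this is done (the "no internal processes" rule).
        line = " ".join(f"+{c}" for c in controllers)
        (self.base / "cgroup.subtree_control").write_text(line)

    def set_limit(self, cgroup, filename, value):
        (cgroup / filename).write_text(str(value))

    def move_process(self, cgroup, pid):
        # One PID per write; the kernel migrates the whole process.
        (cgroup / "cgroup.procs").write_text(str(pid))


def setup_isolation(base, system_pid, cpu_weight, memory_min_bytes):
    """Create system/application subcgroups and place one system process."""
    mgr = CgroupSketch(base)
    mgr.check_invariants()
    system = mgr.create_subcgroup("system")            # assumed name
    application = mgr.create_subcgroup("application")  # assumed name
    mgr.enable_controllers(["cpu", "memory"])
    # Assumed mapping: reserve CPU share and memory for system processes.
    mgr.set_limit(system, "cpu.weight", cpu_weight)
    mgr.set_limit(system, "memory.min", memory_min_bytes)
    mgr.move_process(system, system_pid)
    return system, application
```

Because the base directory is a parameter, the same code can be exercised against a temporary directory that contains a fake `cgroup.controllers` file, which is convenient when the test host has no writable cgroup hierarchy. Cleanup (moving processes back out and removing the subcgroups) is omitted from this sketch.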
Status
In Progress