Open
Labels
P1 (Issue that should be fixed within a few weeks), community-backlog, core (Issues that should be addressed in Ray Core), enhancement (Request for new feature and/or capability), k8s-proj (K8s and Ray OSS)
Description
Implement support for the label selector API:
- (P0) Ray saves label info associated with a node in `GcsNodeInfo` (already implemented)
- (P0) Update the `--labels` argument to accept either a list of strings or a file to read labels from, and expose this API publicly
- (P0) Add a `label_selector` API to the `@ray.remote` decorator to schedule tasks/actors
- (P0) Update `ClusterResourceScheduler::GetBestSchedulableNode` to enforce `label_selector` conditions when returning the list of candidate nodes. This will eventually replace `SchedulingOptions::NodeLabelScheduling(scheduling_strategy)`.
- (P1) Add node labels to the runtime context for tasks/actors
- (P1) Add `bundle_label_selector` to the `ray.util.placement_group` constructor to apply a set of `label_selector`s to placement group bundles
- (P0) Populate the list of default labels automatically, including from K8s (currently only `ray.io/node-id` is supported): [Core] Add default Ray Node labels at Node init #53360
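The candidate-node filtering that `GetBestSchedulableNode` is meant to enforce can be sketched in Python. This is a hypothetical model, not Ray's C++ scheduler: node labels are plain `dict[str, str]`, the `--labels` input is assumed to be comma-separated `key=value` pairs, the selector is assumed to be a dict of label key to condition (as passed through `label_selector`), and conditions are assumed to take the forms `value`, `!value`, `in(a,b)` and `!in(a,b)`. The helpers `parse_labels`, `matches` and `candidate_nodes`, and the label keys used, are illustrative only.

```python
from typing import Dict, List


def parse_labels(arg: str) -> Dict[str, str]:
    """Parse a 'k1=v1,k2=v2' string (assumed format) into a label dict."""
    labels: Dict[str, str] = {}
    for pair in (p.strip() for p in arg.split(",")):
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed label: {pair!r}")
        labels[key.strip()] = value.strip()
    return labels


def _in_values(cond: str) -> List[str]:
    """Extract ['a', 'b'] from 'in(a,b)' or '!in(a,b)'."""
    inner = cond[cond.index("(") + 1 : -1]
    return [v.strip() for v in inner.split(",") if v.strip()]


def matches(node_labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """True if every key -> condition pair in the selector holds on the node."""
    for key, cond in selector.items():
        actual = node_labels.get(key)  # None when the node lacks the label
        if cond.startswith("!in(") and cond.endswith(")"):
            ok = actual not in _in_values(cond)
        elif cond.startswith("in(") and cond.endswith(")"):
            ok = actual in _in_values(cond)
        elif cond.startswith("!"):
            ok = actual != cond[1:]
        else:
            ok = actual == cond
        if not ok:
            return False
    return True


def candidate_nodes(
    nodes: Dict[str, Dict[str, str]], selector: Dict[str, str]
) -> List[str]:
    """Keep only node IDs whose labels satisfy the selector, preserving order."""
    return [node_id for node_id, labels in nodes.items() if matches(labels, selector)]


# Example cluster; label keys and values are made up for illustration.
nodes = {
    "node-1": parse_labels("accelerator=A100,zone=us-west1"),
    "node-2": parse_labels("accelerator=H100,zone=us-east1"),
    "node-3": parse_labels("zone=us-west1"),
}
print(candidate_nodes(nodes, {"accelerator": "in(A100,H100)"}))  # ['node-1', 'node-2']
print(candidate_nodes(nodes, {"zone": "us-west1", "accelerator": "!A100"}))  # ['node-3']
```

In this sketch a node that lacks a label satisfies the negative forms (`!value`, `!in(...)`), which is why `node-3` matches the second selector; whether Ray adopts the same semantics is not stated in this issue.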
Autoscaler adaptation:
- (P1) Update Autoscaler data model to pass label information by adding a labels field to the ResourceRequest message
- (P1) Adapt the Ray V2 Autoscaler to parse labels from the K8s Pod spec and generate a `--labels` arg in `rayStartParams`, plus potential code cleanup.
  - Done by generating default labels from the K8s Pod spec in KubeRay: Add default Ray node label info to Ray Pod environment kuberay#3699
  - Add top-level Labels and Resources structured fields to `HeadGroupSpec` and `WorkerGroupSpec` kuberay#4106
- (P1) Update Autoscaler bin packing logic to directly consider label matching
- (P1) Update the Autoscaler code path to handle the label information passed back from GCS
- (P1) Add `labels` to the autoscaling config by parsing `rayStartParams`
- (P1) Add parsing of `labels` in other types of cluster launchers
- (P2) Clean up dead code for default labels
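The item about making Autoscaler bin packing consider label matching directly can be illustrated with a greedy first-fit sketch. Everything here is hypothetical: `NodeType`, `Request` and `bin_pack` are illustrative names, selectors use equality-only matching, and the real Autoscaler data model (for example the `labels` field on `ResourceRequest`) is not reproduced.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NodeType:
    name: str
    resources: Dict[str, float]
    labels: Dict[str, str]


@dataclass
class Request:
    resources: Dict[str, float]
    label_selector: Dict[str, str] = field(default_factory=dict)


def labels_match(labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    # Equality-only matching keeps the sketch short; an empty selector matches any node.
    return all(labels.get(k) == v for k, v in selector.items())


def fits(free: Dict[str, float], demand: Dict[str, float]) -> bool:
    return all(free.get(r, 0.0) >= amount for r, amount in demand.items())


def bin_pack(
    requests: List[Request], node_types: List[NodeType]
) -> Tuple[List[str], List[Request]]:
    """Greedy first-fit; returns (node types to launch, infeasible requests)."""
    planned: List[Tuple[NodeType, Dict[str, float]]] = []  # (type, remaining capacity)
    infeasible: List[Request] = []
    for req in requests:
        target: Optional[Dict[str, float]] = None
        # 1. Reuse capacity on a node already planned in this round.
        for node_type, free in planned:
            if labels_match(node_type.labels, req.label_selector) and fits(free, req.resources):
                target = free
                break
        # 2. Otherwise plan a new node of the first type that satisfies labels and shape.
        if target is None:
            for node_type in node_types:
                if labels_match(node_type.labels, req.label_selector) and fits(
                    node_type.resources, req.resources
                ):
                    target = dict(node_type.resources)
                    planned.append((node_type, target))
                    break
        if target is None:
            infeasible.append(req)  # no node type can satisfy this request
            continue
        for r, amount in req.resources.items():
            target[r] -= amount
    return [node_type.name for node_type, _ in planned], infeasible


node_types = [
    NodeType("cpu-worker", {"CPU": 8}, {"accelerator": "none"}),
    NodeType("gpu-worker", {"CPU": 8, "GPU": 1}, {"accelerator": "A100"}),
]
requests = [
    Request({"CPU": 4}, {"accelerator": "A100"}),
    Request({"CPU": 4}, {"accelerator": "A100"}),
    Request({"CPU": 2}),
    Request({"CPU": 1}, {"accelerator": "TPU"}),
]
to_launch, infeasible = bin_pack(requests, node_types)
print(to_launch)  # ['gpu-worker', 'cpu-worker']
print(len(infeasible))  # 1
```

Without the `labels_match` check, the first A100 request would be packed onto `cpu-worker`, the first type with enough CPUs; with it, only `gpu-worker` qualifies. A request with an empty selector can still land on a labeled node, as the third request would if capacity remained on the planned `gpu-worker`.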
Documentation/Library changes:
- (P1) End-to-end test for label-based scheduling milestone 1
- (P1) End-to-end example in the KubeRay docs for label selector with autoscaling
- (P1) Update documentation/examples to use the updated `label_selector` API for non-fallback use cases
- (P1) Replace library usage of `NodeLabelSchedulingStrategy(soft=false)` with label-based scheduling
- (P1) Support passing labels from the head and worker group specs in the RayCluster CR in KubeRay to Ray nodes
- (P1) Add label selector dimension into the infeasible task cancellation mechanism
- (P2) Add a labels argument to the `request_resources()` SDK function used by Ray libraries
- (P2) Determine a whitelist of K8s labels to always pass to Ray nodes
- (P2) Add `required_labels` to the `TaskState` schema to expose labels in the state API
- (P2) Add node labels & task/actor label selectors to Ray Events
- (P2) Update Ray libraries to leverage the new label selector API
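For the item on adding a label selector dimension to infeasible task cancellation, one option is to check the label constraint separately from the resource shape, so a cancellation message can say which constraint failed. The sketch below assumes the set of launchable node types is known up front (as with Autoscaler node type configs); `infeasibility_reason` and the dict layout of a node type are hypothetical, and matching is equality-only.

```python
from typing import Any, Dict, List, Optional


def labels_match(labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    return all(labels.get(k) == v for k, v in selector.items())


def fits(capacity: Dict[str, float], demand: Dict[str, float]) -> bool:
    return all(capacity.get(r, 0.0) >= amount for r, amount in demand.items())


def infeasibility_reason(
    demand: Dict[str, float],
    selector: Dict[str, str],
    node_types: List[Dict[str, Any]],
) -> Optional[str]:
    """Return None if some node type could run the request, else why it cannot."""
    # Label dimension first: which node types could ever satisfy the selector?
    label_ok = [t for t in node_types if labels_match(t["labels"], selector)]
    if not label_ok:
        return "no node type satisfies the label selector"
    # Resource dimension, restricted to the label-matching node types.
    if not any(fits(t["resources"], demand) for t in label_ok):
        return "node types matching the label selector lack the requested resources"
    return None


node_types = [
    {"labels": {"accelerator": "A100"}, "resources": {"CPU": 8, "GPU": 1}},
    {"labels": {}, "resources": {"CPU": 16}},
]
print(infeasibility_reason({"CPU": 4}, {"accelerator": "TPU"}, node_types))
print(infeasibility_reason({"GPU": 2}, {"accelerator": "A100"}, node_types))
print(infeasibility_reason({"CPU": 12}, {}, node_types))  # None
```

Restricting the resource check to label-matching node types matters: a `{"GPU": 2}` request with an A100 selector is infeasible even if some unlabeled node type elsewhere had two GPUs.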
Milestone 2:
- (P0) Update the `label_selector` API in the `@ray.remote` decorator to support label fallback syntax
- (P0) Update the `bundle_label_selector` in the `ray.util.placement_group` constructor to support label fallback syntax
- (P0) Implement `fallback_strategy` scheduling logic for tasks, actors and placement groups
- (P0) Update the GCS logic to pass the fallback strategies to the Autoscaler
- (P0) Update Autoscaler bin packing logic to support label fallback syntax
- (P0) Add `exists()` and `!exists()` label operators in label selectors
- (P0) Add validation to prevent empty-string values in node labels & label selectors in Ray code, the Autoscaler and KubeRay
- (P1) Update documentation/examples to use the updated `label_selector` API for label fallback use cases
- (P2) Update library usage of `NodeLabelSchedulingStrategy`, `_soft_target_node_id` and other related features to use the `label_selector` API
- (P2) Add deprecation warnings for `NodeLabelSchedulingStrategy`, accelerator types and other features that will be replaced by label-based scheduling
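The Milestone 2 pieces (fallback scheduling, `exists()`/`!exists()` operators, and empty-string validation) can be combined in one small sketch. It is a hypothetical model rather than Ray's implementation: `fallback_strategy` is modeled as a plain ordered list of selector dicts (the real API shape may differ), `in()` conditions are omitted for brevity, and `validate`, `condition_holds`, `matches` and `schedule` are illustrative names.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]


def validate(labels: Labels) -> None:
    """Reject empty-string keys or values, per the Milestone 2 validation item."""
    for key, value in labels.items():
        if key == "" or value == "":
            raise ValueError(f"empty label key or value: {key!r}={value!r}")


def condition_holds(actual: Optional[str], cond: str) -> bool:
    """Evaluate one condition against a node's value (None means label absent)."""
    if cond == "exists()":
        return actual is not None
    if cond == "!exists()":
        return actual is None
    if cond.startswith("!"):
        return actual != cond[1:]
    return actual == cond


def matches(node_labels: Labels, selector: Labels) -> bool:
    return all(condition_holds(node_labels.get(k), c) for k, c in selector.items())


def schedule(
    nodes: Dict[str, Labels],
    label_selector: Labels,
    fallback_strategy: List[Labels],
) -> Optional[str]:
    """Try the primary selector first, then each fallback in priority order."""
    for selector in [label_selector, *fallback_strategy]:
        validate(selector)
        for node_id, node_labels in nodes.items():
            if matches(node_labels, selector):
                return node_id
    return None  # no selector matched any node; the work stays pending


nodes = {
    "node-1": {"accelerator": "A100"},
    "node-2": {"accelerator": "L4", "spot": "true"},
}
for labels in nodes.values():
    validate(labels)  # node labels are validated too, e.g. at registration

print(schedule(nodes, {"accelerator": "H100"}, [{"accelerator": "A100"}]))  # node-1
print(schedule(nodes, {"spot": "exists()"}, []))  # node-2
print(schedule(nodes, {"spot": "!exists()", "accelerator": "L4"}, []))  # None
```

Iterating selectors in the outer loop gives the fallback list strict priority: a node matching the primary selector always wins over one matching only a fallback, even if the fallback node appears earlier in the node list.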
Use case
This issue tracks the progress of implementing the label selector API feature enhancement. This enhancement supersedes the previous node affinity feature enhancement REP and builds on much of the implementation there.
Node affinity feature enhancement work tracker: #34894
Related REP: ray-project/enhancements#60
Metadata
Projects
Status: In Progress