Commit fc4510f

[DOC-127] Cherry-pick (#54254): MVP for OSS Ray labels (#57547)
(cherry picked from commit 44a9732) Co-authored-by: Douglas Strodtman <douglas@anyscale.com>
1 parent 276c75c commit fc4510f

3 files changed

+197
-11
lines changed


doc/source/ray-core/scheduling/index.rst

Lines changed: 19 additions & 1 deletion
@@ -3,7 +3,24 @@
33
Scheduling
44
==========
55

6-
For each task or actor, Ray will choose a node to run it and the scheduling decision is based on the following factors.
6+
This page provides an overview of how Ray decides to schedule tasks and actors to nodes.
7+
8+
.. DJS 19 Sept 2025: There should be an overview of all features and configs that impact scheduling here.
9+
This should include descriptions for default values and behaviors, and links to things like default labels or resource definitions that can be used for scheduling without customization.
10+
11+
Labels
12+
------
13+
14+
Labels provide a simplified solution for controlling scheduling for tasks, actors, and placement group bundles using default and custom labels. See :doc:`./labels`.
15+
16+
Labels are a beta feature. As this feature becomes stable, the Ray team recommends using labels to replace the following patterns:
17+
18+
- ``NodeAffinitySchedulingStrategy`` when ``soft=False``. Use the default ``ray.io/node-id`` label instead.
19+
- The `accelerator_type` option for tasks and actors. Use the default `ray.io/accelerator-type` label instead.
20+
21+
.. note::
22+
23+
A legacy pattern recommended using custom resources for label-based scheduling. We now recommend only using custom resources when you need to manage scheduling using numeric values.
724

825
.. _ray-scheduling-resources:
926

@@ -127,6 +144,7 @@ More about Ray Scheduling
127144
.. toctree::
128145
:maxdepth: 1
129146

147+
labels
130148
resources
131149
accelerators
132150
placement-group
Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
1+
---
2+
description: "Learn about using labels to control how Ray schedules tasks, actors, and placement groups to nodes in your Kubernetes cluster."
3+
---
4+
5+
(labels)=
6+
# Use labels to control scheduling
7+
8+
In Ray version 2.49.0 and above, you can use labels to control scheduling for KubeRay. Labels are a beta feature.
9+
10+
This page provides a conceptual overview and usage instructions for labels. Labels are key-value pairs that provide a human-readable configuration for users to control how Ray schedules tasks, actors, and placement group bundles to specific nodes.
11+
12+
13+
```{note}
14+
Ray labels share the same syntax and formatting restrictions as Kubernetes labels, but are conceptually distinct. See the [Kubernetes docs on labels and selectors](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set).
15+
```
16+
17+
18+
## How do labels work?
19+
20+
The following is a high-level overview of how you use labels to control scheduling:
21+
22+
- Ray sets default labels that describe the underlying compute. See [](defaults).
23+
- You define custom labels as key-value pairs. See [](custom).
24+
- You specify *label selectors* in your Ray code to define label requirements. You can specify these requirements at the task, actor, or placement group bundle level. See [](label-selectors).
25+
- Ray schedules tasks, actors, or placement group bundles based on the specified label selectors.
26+
- In Ray 2.50.0 and above, if you're using a dynamic cluster with autoscaler V2 enabled, the cluster scales up to add new nodes from a designated worker group to fulfill label requirements.
27+
28+
(defaults)=
29+
## Default node labels
30+
```{note}
31+
Ray reserves all labels under the `ray.io` namespace.
32+
```
33+
During cluster initialization or as autoscaling events add nodes to your cluster, Ray assigns the following default labels to each node:
34+
35+
| Label | Description |
36+
| --- | --- |
37+
| `ray.io/node-id` | A unique ID generated for the node. |
38+
| `ray.io/accelerator-type` | The accelerator type of the node, for example `L4`. CPU-only machines have an empty string. See {ref}`accelerator types <accelerator-types>` for a mapping of values. |
39+
40+
```{note}
41+
You can override default values using `ray start` parameters.
42+
```
43+
44+
The following are examples of default labels:
45+
46+
```python
47+
"ray.io/accelerator-type": "" # Default label indicating the machine is CPU-only.
48+
```
49+
50+
(custom)=
51+
## Define custom labels
52+
53+
You can add custom labels to your nodes using the `--labels` or `--labels-file` parameter when running `ray start`.
54+
55+
```bash
56+
# Example 1: Start a head node with cpu-family and test-label labels
57+
ray start --head --labels="cpu-family=amd,test-label=test-value"
58+
59+
# Example 2: Start a head node with labels from a label file
60+
ray start --head --labels-file='./test-labels-file'
61+
62+
# The file content can be the following (should be a valid YAML file):
63+
# "test-label": "test-value"
64+
# "test-label-2": "test-value-2"
65+
```
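
To make the `--labels` format concrete, the following sketch turns a comma-separated `key=value` string into the dict of labels it describes. The `parse_labels` helper is hypothetical and written only for illustration. It isn't Ray's own parser, and it doesn't enforce the Kubernetes-style formatting restrictions that Ray applies to label keys and values.

```python
def parse_labels(labels_arg):
    """Parse a string like "k1=v1,k2=v2" into a dict (illustrative helper, not Ray's parser)."""
    labels = {}
    for pair in labels_arg.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Expected key=value, got {pair!r}")
        labels[key.strip()] = value.strip()
    return labels


# The same string passed to `ray start --labels=...` in Example 1.
print(parse_labels("cpu-family=amd,test-label=test-value"))
```

Each resulting key-value pair corresponds to one custom label on the node.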
66+
67+
```{note}
68+
You can't set labels using `ray.init()`. Local Ray clusters don't support labels.
69+
```
70+
71+
(label-selectors)=
72+
## Specify label selectors
73+
74+
You add label selector logic to your Ray code when defining Ray tasks, actors, or placement group bundles. Label selectors define the label requirements for matching your Ray code to a node in your Ray cluster.
75+
76+
Label selectors specify the following:
77+
78+
- The key of the label.
79+
- Operator logic for matching.
80+
- The value or values to match on.
81+
82+
The following table shows the basic syntax for label selector operator logic:
83+
84+
| Operator | Description | Example syntax |
85+
| --- | --- | --- |
86+
| Equals | Label matches exactly one value. | `{"key": "value"}` |
87+
| Not equal | Label matches anything but one value. | `{"key": "!value"}` |
88+
| In | Label matches one of the provided values. | `{"key": "in(val1,val2)"}` |
89+
| Not in | Label matches none of the provided values. | `{"key": "!in(val1,val2)"}` |
90+
91+
You can specify one or more label selectors as a dict. When specifying multiple label selectors, the candidate node must meet all requirements. The following example configuration uses a custom label to require an `m5.16xlarge` EC2 instance and a default label to require node ID to be 123:
92+
93+
```python
94+
label_selector={"instance_type": "m5.16xlarge", "ray.io/node-id": "123"}
95+
```
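
The operator semantics in the table can be sketched in plain Python. This isn't Ray's scheduler code. The `parse_selector_value` and `node_matches` helpers and the `region` label are hypothetical, and the sketch assumes that a node without the label key fails `Equals` and `In` selectors but passes `Not equal` and `Not in` selectors. Verify that assumption against your Ray version.

```python
def parse_selector_value(expr):
    """Split one selector value into (negated, set of values to compare against)."""
    negated = expr.startswith("!")
    if negated:
        expr = expr[1:]
    if expr.startswith("in(") and expr.endswith(")"):
        values = {v.strip() for v in expr[3:-1].split(",")}
    else:
        values = {expr}
    return negated, values


def node_matches(node_labels, label_selector):
    """Return True only if the node satisfies every selector in the dict."""
    for key, expr in label_selector.items():
        negated, values = parse_selector_value(expr)
        found = node_labels.get(key) in values
        # Positive selectors need a hit; negated selectors need a miss.
        if found == negated:
            return False
    return True


node = {"instance_type": "m5.16xlarge", "ray.io/node-id": "123", "region": "us-west-2"}
print(node_matches(node, {"instance_type": "m5.16xlarge", "ray.io/node-id": "123"}))
print(node_matches(node, {"region": "!in(us-west-2,us-east-1)"}))
```

Because every selector must hold, adding selectors to the dict can only narrow the set of candidate nodes.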
96+
97+
## Specify label requirements for tasks and actors
98+
99+
Use the following syntax to add label selectors to tasks and actors:
100+
101+
```python
102+
# An example of specifying label_selector in task's @ray.remote annotation
103+
@ray.remote(label_selector={"label_name":"label_value"})
104+
def f():
105+
pass
106+
107+
# An example of specifying label_selector in actor's @ray.remote annotation
108+
@ray.remote(label_selector={"ray.io/accelerator-type": "nvidia-h100"})
109+
class Actor:
110+
pass
111+
112+
# An example of specifying label_selector in task's options
113+
@ray.remote
114+
def test_task_label_in_options():
115+
pass
116+
117+
test_task_label_in_options.options(label_selector={"test-label-key": "test-label-value"}).remote()
118+
119+
# An example of specifying label_selector in actor's options
120+
@ray.remote
121+
class Actor:
122+
pass
123+
124+
actor_1 = Actor.options(
125+
label_selector={"ray.io/accelerator-type": "nvidia-h100"},
126+
).remote()
127+
```
128+
129+
## Specify label requirements for placement group bundles
130+
131+
Use the `bundle_label_selector` option to add label selectors to placement group bundles. See the following examples:
132+
133+
```python
134+
# All bundles require the same labels:
135+
ray.util.placement_group(
136+
bundles=[{"GPU": 1}, {"GPU": 1}],
137+
bundle_label_selector=[{"ray.io/accelerator-type": "H100"}] * 2,
138+
)
139+
140+
# Bundles require different labels:
141+
ray.util.placement_group(
142+
bundles=[{"CPU": 1}] + [{"GPU": 1}] * 2,
143+
bundle_label_selector=[{"ray.io/market-type": "spot"}] + [{"ray.io/accelerator-type": "H100"}] * 2,
144+
)
145+
```
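
The following sketch builds the two lists without calling Ray, to show how selectors line up with bundles. It assumes that `bundle_label_selector` takes one selector per bundle, in the same order as `bundles`. The `repeat_selector` helper is hypothetical and only makes sure each bundle gets its own dict rather than a shared reference.

```python
def repeat_selector(selector, count):
    """Return `count` independent copies of one label selector dict."""
    return [dict(selector) for _ in range(count)]


bundles = [{"CPU": 1}] + [{"GPU": 1} for _ in range(2)]
bundle_label_selector = (
    [{"ray.io/market-type": "spot"}]
    + repeat_selector({"ray.io/accelerator-type": "H100"}, 2)
)

# Assumed pairing: the selector at index i applies to the bundle at index i.
assert len(bundles) == len(bundle_label_selector)
for bundle, selector in zip(bundles, bundle_label_selector):
    print(bundle, "->", selector)
```

Multiplying a list, as in `[selector] * 2`, repeats a reference to the same dict. That's fine for read-only selectors, but independent copies are safer if you change one selector later.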
146+
## Using labels with autoscaler
147+
148+
Autoscaler V2 supports label-based scheduling. To let the autoscaler add nodes that fulfill label requirements, create a worker group for each combination of label requirements, and specify the corresponding labels in the `rayStartParams` field of that worker group in the Ray cluster configuration. For example:
149+
150+
```yaml
151+
rayStartParams: {
152+
labels: "region=me-central1,ray.io/accelerator-type=nvidia-h100"
153+
}
154+
```
155+
156+
## Monitor nodes using labels
157+
158+
The Ray dashboard automatically shows the following information:
159+
- Labels for each node. See {py:attr}`ray.util.state.common.NodeState.labels`.
160+
- Label selectors set for each task, actor, or placement group bundle. See {py:attr}`ray.util.state.common.TaskState.label_selector` and {py:attr}`ray.util.state.common.ActorState.label_selector`.
161+
162+
Within a task, you can programmatically obtain the labels of the current node from the RuntimeContext API using `ray.get_runtime_context().get_node_labels()`. This returns a Python dict. See the following example:
163+
164+
```python
165+
@ray.remote
166+
def test_task_label():
167+
node_labels = ray.get_runtime_context().get_node_labels()
168+
print(f"[test_task_label] node labels: {node_labels}")
169+
170+
"""
171+
Example output:
172+
(test_task_label pid=68487) [test_task_label] node labels: {'test-label-1': 'test-value-1', 'test-label-key': 'test-label-value', 'test-label-2': 'test-value-2'}
173+
"""
174+
```
175+
You can also access node label and label selector information using the state API and state CLI.

doc/source/ray-core/scheduling/resources.rst

Lines changed: 3 additions & 10 deletions
@@ -62,16 +62,9 @@ The fact that resources are logical has several implications:
6262
Custom Resources
6363
----------------
6464

65-
Besides pre-defined resources, you can also specify a Ray node's custom resources and request them in your tasks or actors.
66-
Some use cases for custom resources:
67-
68-
- Your node has special hardware and you can represent it as a custom resource.
69-
Then your tasks or actors can request the custom resource via ``@ray.remote(resources={"special_hardware": 1})``
70-
and Ray will schedule the tasks or actors to the node that has the custom resource.
71-
- You can use custom resources as labels to tag nodes and you can achieve label based affinity scheduling.
72-
For example, you can do ``ray.remote(resources={"custom_label": 0.001})`` to schedule tasks or actors to nodes with ``custom_label`` custom resource.
73-
For this use case, the actual quantity doesn't matter, and the convention is to specify a tiny number so that the label resource is
74-
not the limiting factor for parallelism.
65+
You can specify custom resources for a Ray node and reference them to control scheduling for your tasks or actors.
66+
67+
Use custom resources when you need to manage scheduling using numeric values. If you need simple label-based scheduling, use labels instead. See :doc:`labels`.
7568

7669
.. _specify-node-resources:
7770
