@@ -19,7 +19,7 @@ Note that the YAML file in this example uses `serveConfigV2`. You need KubeRay v

```sh
# Create a RayService
-kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-service.mobilenet.yaml
+kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-service.mobilenet.yaml
```

* The [mobilenet.py](https://github.com/ray-project/serve_config_examples/blob/master/mobilenet/mobilenet.py) file needs `tensorflow` as a dependency. Hence, the YAML file uses the `rayproject/ray-ml` image instead of the `rayproject/ray` image.
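In the RayService's Pod templates, that image choice shows up in the container spec, roughly like this (the tag is illustrative):

```yaml
containers:
  - name: ray-head
    image: rayproject/ray-ml:2.9.0   # ray-ml ships with tensorflow; the plain ray image does not
```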
@@ -37,12 +37,12 @@ The KubeRay operator Pod must be on the CPU node if you have set up the taint fo
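One way to keep the operator Pod on a CPU node is through standard scheduling constraints on the operator Deployment. A minimal sketch, assuming the Helm chart exposes the usual `nodeSelector` value (the label below is hypothetical; check the chart's `values.yaml` for your cluster):

```yaml
# values-cpu-only.yaml (hypothetical values file for the kuberay-operator chart)
nodeSelector:
  node-type: cpu   # example label; use whatever label identifies your CPU nodes
```

Pass it at install time, for example `helm install kuberay-operator kuberay/kuberay-operator -f values-cpu-only.yaml`.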

## Step 2: Submit the RayJob

-Create the RayJob custom resource with [ray-job.batch-inference.yaml](https://github.com/ray-project/kuberay/blob/v1.5.0/ray-operator/config/samples/ray-job.batch-inference.yaml).
+Create the RayJob custom resource with [ray-job.batch-inference.yaml](https://github.com/ray-project/kuberay/blob/v1.5.1/ray-operator/config/samples/ray-job.batch-inference.yaml).

Download the file with `curl`:

```bash
-curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.batch-inference.yaml
+curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-job.batch-inference.yaml
```

Note that the `RayJob` spec contains a spec for the `RayCluster`. This tutorial uses a single-node cluster with 4 GPUs. For production use cases, use a multi-node cluster where the head node doesn't have GPUs, so that Ray can automatically schedule GPU workloads on worker nodes, where they won't interfere with critical Ray processes on the head node.
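As a rough illustration of that production layout (a sketch only; the name, entrypoint, image tag, and resource amounts are hypothetical and not taken from the sample YAML), a multi-node `RayJob` keeps GPUs off the head group and puts them on a worker group:

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-batch-inference   # hypothetical name
spec:
  entrypoint: python /home/ray/samples/my_inference_script.py   # hypothetical path
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray-ml:2.9.0   # illustrative tag
              resources:
                limits:
                  cpu: "4"
                  memory: 16Gi        # no GPUs requested for the head node
    workerGroupSpecs:
      - groupName: gpu-workers
        replicas: 4
        minReplicas: 4
        maxReplicas: 4
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray-ml:2.9.0   # illustrative tag
                resources:
                  limits:
                    cpu: "4"
                    memory: 16Gi
                    nvidia.com/gpu: "1"   # GPU workloads land on the workers
```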
@@ -17,15 +17,15 @@ kind create cluster --image=kindest/node:v1.26.0
```sh
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
-# Install both CRDs and KubeRay operator v1.5.0.
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0
+# Install both CRDs and KubeRay operator v1.5.1.
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1
```

### Method 2: Kustomize

```sh
# Install CRD and KubeRay operator.
-kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=v1.5.0"
+kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=v1.5.1"
```

## Step 3: Validate Installation
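A quick way to confirm the operator and CRDs are in place (a sketch; the exact commands and output in the full guide may differ):

```sh
# The operator Pod should be Running.
kubectl get pods | grep kuberay-operator
# The Ray CRDs (RayCluster, RayJob, RayService) should be registered.
kubectl get crds | grep ray.io
```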
@@ -28,7 +28,7 @@ Once the KubeRay operator is running, you're ready to deploy a RayCluster. Creat

```sh
# Deploy a sample RayCluster CR from the KubeRay Helm chart repo:
-helm install raycluster kuberay/ray-cluster --version 1.5.0
+helm install raycluster kuberay/ray-cluster --version 1.5.1
```


@@ -69,7 +69,7 @@ To understand the following content better, you should understand the difference
* `DELETE_RAYJOB_CR_AFTER_JOB_FINISHES` (Optional, added in version 1.2.0): Set this environment variable for the KubeRay operator, not the RayJob resource. If you set this environment variable to true, the RayJob custom resource itself is deleted if you also set `shutdownAfterJobFinishes` to true. Note that KubeRay deletes all resources created by the RayJob, including the Kubernetes Job.
* Others
* `suspend` (Optional): If `suspend` is true, KubeRay deletes both the RayCluster and the submitter. Note that Kueue also implements scheduling strategies by mutating this field. Avoid manually updating this field if you use Kueue to schedule RayJob.
-* `deletionStrategy` (Optional, alpha in v1.5.0): Configures automated cleanup after the RayJob reaches a terminal state. This field requires the `RayJobDeletionPolicy` feature gate to be enabled. Two mutually exclusive styles are supported:
+* `deletionStrategy` (Optional, alpha in v1.5.1): Configures automated cleanup after the RayJob reaches a terminal state. This field requires the `RayJobDeletionPolicy` feature gate to be enabled. Two mutually exclusive styles are supported:
* **Rules-based** (Recommended): Define `deletionRules` as a list of deletion actions triggered by specific conditions. Each rule specifies:
* `policy`: The deletion action to perform — `DeleteCluster` (delete the entire RayCluster and its Pods), `DeleteWorkers` (delete only worker Pods), `DeleteSelf` (delete the RayJob and all associated resources), or `DeleteNone` (no deletion).
* `condition`: When to trigger the deletion, based on `jobStatus` (`SUCCEEDED` or `FAILED`) and an optional `ttlSeconds` delay.
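A minimal sketch of the rules-based style, assembled from the fields described above (the exact field nesting is an assumption and the values are illustrative; the `RayJobDeletionPolicy` feature gate must be enabled):

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-with-cleanup   # hypothetical name
spec:
  deletionStrategy:
    deletionRules:
      # On success, keep the head for debugging but remove the worker Pods after 5 minutes.
      - policy: DeleteWorkers
        condition:
          jobStatus: SUCCEEDED
          ttlSeconds: 300
      # On failure, delete the RayJob and everything it created after 10 minutes.
      - policy: DeleteSelf
        condition:
          jobStatus: FAILED
          ttlSeconds: 600
  # ... entrypoint, rayClusterSpec, and so on.
```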
@@ -96,7 +96,7 @@ Follow the [KubeRay Operator Installation](kuberay-operator-deploy) to install t
## Step 3: Install a RayJob

```sh
-kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.sample.yaml
+kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-job.sample.yaml
```

## Step 4: Verify the Kubernetes cluster status
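Typical checks for this step look something like the following (a sketch; the full guide's commands and expected output may differ):

```sh
# The RayJob should eventually report a SUCCEEDED job status.
kubectl get rayjob
# KubeRay creates a RayCluster and a submitter Kubernetes Job for the RayJob.
kubectl get raycluster
kubectl get jobs
kubectl get pods
```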
@@ -163,13 +163,13 @@ The Python script `sample_code.py` used by `entrypoint` is a simple Ray script t
## Step 6: Delete the RayJob

```sh
-kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.sample.yaml
+kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-job.sample.yaml
```

## Step 7: Create a RayJob with `shutdownAfterJobFinishes` set to true

```sh
-kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.shutdown.yaml
+kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-job.shutdown.yaml
```

The `ray-job.shutdown.yaml` defines a RayJob custom resource with `shutdownAfterJobFinishes: true` and `ttlSecondsAfterFinished: 10`.
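The relevant portion of that spec looks roughly like this (a sketch built from the two fields named above, not the full sample file; the resource name is hypothetical):

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample-shutdown   # hypothetical name
spec:
  shutdownAfterJobFinishes: true   # delete the RayCluster once the job finishes
  ttlSecondsAfterFinished: 10      # wait 10 seconds after completion before deleting
  # ... entrypoint, rayClusterSpec, and so on.
```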
@@ -197,7 +197,7 @@ kubectl get raycluster

```sh
# Step 10.1: Delete the RayJob
-kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.shutdown.yaml
+kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-job.shutdown.yaml

# Step 10.2: Delete the KubeRay operator
helm uninstall kuberay-operator
@@ -3,7 +3,7 @@

## Prerequisites

-This guide mainly focuses on the behavior of KubeRay v1.5.0 and Ray 2.46.0.
+This guide mainly focuses on the behavior of KubeRay v1.5.1 and Ray 2.46.0.

## What's a RayService?

@@ -35,7 +35,7 @@ Note that the YAML file in this example uses `serveConfigV2` to specify a multi-
## Step 3: Install a RayService

```sh
-kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-service.sample.yaml
+kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-service.sample.yaml
```

## Step 4: Verify the Kubernetes cluster status
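Typical checks for this step look something like the following (a sketch; the full guide's commands and expected output may differ):

```sh
# The RayService should eventually report its Serve applications as RUNNING.
kubectl get rayservice
# KubeRay creates a RayCluster plus head and serve Kubernetes Services for the RayService.
kubectl get raycluster
kubectl get services
kubectl get pods
```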
@@ -129,7 +129,7 @@ curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:800

```sh
# Delete the RayService.
-kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-service.sample.yaml
+kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-service.sample.yaml

# Uninstall the KubeRay operator.
helm uninstall kuberay-operator
16 changes: 8 additions & 8 deletions doc/source/cluster/kubernetes/k8s-ecosystem/ingress.md
@@ -33,10 +33,10 @@ Four examples show how to use ingress to access your Ray cluster:
# Step 1: Install KubeRay operator and CRD
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1

# Step 2: Install a RayCluster
-helm install raycluster kuberay/ray-cluster --version 1.5.0
+helm install raycluster kuberay/ray-cluster --version 1.5.1

# Step 3: Edit the `ray-operator/config/samples/ray-cluster-alb-ingress.yaml`
#
@@ -123,10 +123,10 @@ Now run the following commands:
# Step 1: Install KubeRay operator and CRD
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1

# Step 2: Install a RayCluster
-helm install raycluster kuberay/ray-cluster --version 1.5.0
+helm install raycluster kuberay/ray-cluster --version 1.5.1

# Step 3: Edit ray-cluster-gclb-ingress.yaml to replace the service name with the name of the head service from the RayCluster. (Output of `kubectl get svc`)

@@ -186,12 +186,12 @@ kubectl wait --namespace ingress-nginx \
# Step 3: Install KubeRay operator and CRD
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1

# Step 4: Install RayCluster and create an ingress separately.
# More information about change of setting was documented in https://github.com/ray-project/kuberay/pull/699
# and `ray-operator/config/samples/ray-cluster.separate-ingress.yaml`
-curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-cluster.separate-ingress.yaml
+curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-cluster.separate-ingress.yaml
kubectl apply -f ray-cluster.separate-ingress.yaml

# Step 5: Check the ingress created in Step 4.
@@ -230,10 +230,10 @@ kubectl describe ingress raycluster-ingress-head-ingress
# Step 1: Install KubeRay operator and CRD
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1

# Step 2: Install a RayCluster
-helm install raycluster kuberay/ray-cluster --version 1.5.0
+helm install raycluster kuberay/ray-cluster --version 1.5.1

# Step 3: Edit the `ray-operator/config/samples/ray-cluster-agc-gatewayapi.yaml`
#
2 changes: 1 addition & 1 deletion doc/source/cluster/kubernetes/k8s-ecosystem/istio.md
@@ -66,7 +66,7 @@ In this mode, you _must_ disable the KubeRay init container injection by setting

```bash
# Set ENABLE_INIT_CONTAINER_INJECTION=false on the KubeRay operator.
-helm upgrade kuberay-operator kuberay/kuberay-operator --version 1.5.0 \
+helm upgrade kuberay-operator kuberay/kuberay-operator --version 1.5.1 \
--set env\[0\].name=ENABLE_INIT_CONTAINER_INJECTION \
--set-string env\[0\].value=false

@@ -56,7 +56,7 @@ kubectl get all -n prometheus-system
* Set `metrics.serviceMonitor.enabled=true` when installing the KubeRay operator with Helm to create a ServiceMonitor that scrapes metrics exposed by the KubeRay operator's service.
```sh
# Enable the ServiceMonitor and set the label `release: prometheus` to the ServiceMonitor so that Prometheus can discover it
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0 \
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1 \
--set metrics.serviceMonitor.enabled=true \
--set metrics.serviceMonitor.selector.release=prometheus
```
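To confirm the ServiceMonitor exists afterwards (a quick sanity check; the resource name and namespace depend on your installation):
```sh
kubectl get servicemonitors -A | grep kuberay
```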
@@ -104,7 +104,7 @@ curl localhost:8080
* `# HELP`: Describe the meaning of this metric.
* `# TYPE`: See [this document](https://prometheus.io/docs/concepts/metric_types/) for more details.

-* Three required environment variables are defined in [ray-cluster.embed-grafana.yaml](https://github.com/ray-project/kuberay/blob/v1.5.0/ray-operator/config/samples/ray-cluster.embed-grafana.yaml). See [Configuring and Managing Ray Dashboard](https://docs.ray.io/en/latest/cluster/configure-manage-dashboard.html) for more details about these environment variables.
+* Three required environment variables are defined in [ray-cluster.embed-grafana.yaml](https://github.com/ray-project/kuberay/blob/v1.5.1/ray-operator/config/samples/ray-cluster.embed-grafana.yaml). See [Configuring and Managing Ray Dashboard](https://docs.ray.io/en/latest/cluster/configure-manage-dashboard.html) for more details about these environment variables.
```yaml
env:
- name: RAY_GRAFANA_IFRAME_HOST
@@ -29,7 +29,7 @@ You need access to configure the Kubernetes control plane to replace the
KubeRay v1.4.0 and later versions support scheduler plugins.

```sh
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0 --set batchScheduler.name=scheduler-plugins
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1 --set batchScheduler.name=scheduler-plugins
```

## Step 4: Deploy a RayCluster with gang scheduling
10 changes: 5 additions & 5 deletions doc/source/cluster/kubernetes/k8s-ecosystem/volcano.md
@@ -35,7 +35,7 @@ batchScheduler:
* Pass the `--set batchScheduler.name=volcano` flag when running on the command line:
```shell
# Install the Helm chart with the --batch-scheduler=volcano flag
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0 --set batchScheduler.name=volcano
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1 --set batchScheduler.name=volcano
```

### Step 4: Install a RayCluster with the Volcano scheduler
@@ -45,7 +45,7 @@ The RayCluster custom resource must include the `ray.io/scheduler-name: volcano`
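In the RayCluster manifest, that label sits under `metadata.labels`, roughly like this (the cluster name is illustrative):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: test-cluster-0   # hypothetical name
  labels:
    ray.io/scheduler-name: volcano   # hand this cluster's Pods to the Volcano scheduler
spec:
  # ... headGroupSpec, workerGroupSpecs, and so on.
```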
```shell
# Path: kuberay/ray-operator/config/samples
# Includes label `ray.io/scheduler-name: volcano` in the metadata.labels
-curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-cluster.volcano-scheduler.yaml
+curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-cluster.volcano-scheduler.yaml
kubectl apply -f ray-cluster.volcano-scheduler.yaml

# Check the RayCluster
@@ -113,7 +113,7 @@ Next, create a RayCluster with a head node (1 CPU + 2Gi of RAM) and two workers
```shell
# Path: kuberay/ray-operator/config/samples
# Includes the `ray.io/scheduler-name: volcano` and `volcano.sh/queue-name: kuberay-test-queue` labels in the metadata.labels
-curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-cluster.volcano-scheduler-queue.yaml
+curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-cluster.volcano-scheduler-queue.yaml
kubectl apply -f ray-cluster.volcano-scheduler-queue.yaml
```

@@ -327,12 +327,12 @@ kubectl delete queue kuberay-test-queue

### Use Volcano for RayJob gang scheduling

-Starting with KubeRay 1.5.0, KubeRay supports gang scheduling for RayJob custom resources.
+Starting with KubeRay 1.5.1, KubeRay supports gang scheduling for RayJob custom resources.

First, create a queue with a capacity of 4 CPUs and 6Gi of RAM, and a RayJob with a head node (1 CPU + 2Gi of RAM), two workers (1 CPU + 1Gi of RAM each), and a submitter pod (0.5 CPU + 200Mi of RAM), for a total of 3500m CPU and 4296Mi of RAM (1000m + 2 × 1000m + 500m = 3500m; 2048Mi + 2 × 1024Mi + 200Mi = 4296Mi).

```shell
-curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.volcano-scheduler-queue.yaml
+curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-job.volcano-scheduler-queue.yaml
kubectl apply -f ray-job.volcano-scheduler-queue.yaml
```

4 changes: 2 additions & 2 deletions doc/source/cluster/kubernetes/k8s-ecosystem/yunikorn.md
@@ -29,12 +29,12 @@ See [Get Started](https://yunikorn.apache.org/docs/) for Apache YuniKorn install
When installing KubeRay operator using Helm, pass the `--set batchScheduler.name=yunikorn` flag at the command line:

```shell
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0 --set batchScheduler.name=yunikorn
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1 --set batchScheduler.name=yunikorn
```

## Step 4: Use Apache YuniKorn for gang scheduling

-This example demonstrates gang scheduling of RayCluster custom resources with Apache YuniKorn and KubeRay. Starting with KubeRay 1.5.0, KubeRay also supports gang scheduling for RayJob custom resources.
+This example demonstrates gang scheduling of RayCluster custom resources with Apache YuniKorn and KubeRay. Starting with KubeRay 1.5.1, KubeRay also supports gang scheduling for RayJob custom resources.

First, create a queue with a capacity of 4 CPUs and 6Gi of RAM by editing the ConfigMap:

@@ -133,8 +133,8 @@ spec:
- "10"
```

-You can also use the following command for kuberay version >= 1.5.0:
+You can also use the following command for KubeRay version >= 1.5.1:

```bash
-helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0 --set reconcileConcurrency=10
+helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1 --set reconcileConcurrency=10
```