This test assesses Serve's availability under hostile conditions, where nodes are intentionally killed periodically.
- `Receiver` service
  - 3 Ray cluster nodes
    - 1 head node, 2 worker nodes
    - Each node must have the following custom resources:
      - 1 `"alpha_singleton"` custom resource
      - 1 `"beta_singleton"` custom resource
  - 3 `Receiver` replicas
    - Trivial workload (returns a string or asks `NodeKiller` to kill a node)
    - Assigns one replica per node using `custom_resources`
      - Prevents all replicas from getting stuck on one node
  - 1 `NodeKiller` replica
    - Kills its node with either `ray stop --head` or `sudo halt --force`
- `Pinger` service
  - 1 Ray cluster node (just the head node)
  - 1 `Pinger` replica
    - Sends requests to the `Receiver` service at a constant QPS
  - 1 `Reaper` replica
    - Periodically sends a request to the `Receiver` service asking the `NodeKiller` to kill a node
  - 1 `ReceiverHelmsman` replica
    - Periodically upgrades the `Receiver` service with a new `import_path`. This changes the string that the `Receiver` replica returns and the specific type of custom resource that the `Receiver` replicas use.
    - Watches the `Receiver` service's status.
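
The one-replica-per-node constraint described above can be illustrated with a toy placement model. This is a minimal sketch, not Ray's scheduler: `place_replicas` and `make_cluster` are hypothetical helpers, and the node names are made up. It assumes each replica requests exactly 1 unit of a singleton resource, so a node offering only 1 unit can host at most one such replica. It also shows that a second set of replicas requesting the other resource type can still be placed while the first set holds its units, which is presumably why each node carries both resources.

```python
from typing import Dict, List

# Free custom resources per node, keyed by (hypothetical) node name.
Cluster = Dict[str, Dict[str, int]]


def make_cluster() -> Cluster:
    """Three nodes, each offering 1 unit of both singleton resources."""
    names = ("head", "worker-1", "worker-2")
    return {n: {"alpha_singleton": 1, "beta_singleton": 1} for n in names}


def place_replicas(cluster: Cluster, resource: str, num_replicas: int) -> List[str]:
    """Greedily place replicas; each one consumes 1 unit of `resource`."""
    placement = []
    for _ in range(num_replicas):
        # Pick the first node that still has a free unit of the resource.
        target = next(
            (name for name, free in cluster.items() if free.get(resource, 0) >= 1),
            None,
        )
        if target is None:
            raise RuntimeError(f"no node has a free {resource!r} unit")
        cluster[target][resource] -= 1
        placement.append(target)
    return placement


cluster = make_cluster()
# Three replicas requesting "alpha_singleton" must land on three different nodes.
alpha_placement = place_replicas(cluster, "alpha_singleton", 3)
# Replicas requesting "beta_singleton" still fit, one per node.
beta_placement = place_replicas(cluster, "beta_singleton", 3)
print(alpha_placement)
print(beta_placement)
```

In this model a fourth replica requesting either resource cannot be placed, so replicas can never pile up on a single node.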
- Launch the `Receiver` service. Its Serve config is in `receiver_config.yaml`.
- Get the `Receiver` service's URL and any authentication token needed to access it.
- Fill in `pinger_config.yaml` with the `Receiver` service's info. You can omit the authentication token in the config if your `Receiver` doesn't need one.
- Launch the `Pinger` service using `pinger_config.yaml`.
- Import the Grafana dashboard from `grafana_dashboard.json` if you're running a Grafana server.
- You can check `Pinger`'s metrics either through the Grafana dashboard or by sending a `GET` request to the `Pinger` service's `/info` endpoint.
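
The `/info` check in the last step can be scripted with the Python standard library. This is a minimal sketch under assumptions not stated above: that the token, if any, is passed as an `Authorization: Bearer` header, and that the endpoint returns JSON. `build_info_request` and `fetch_pinger_info` are hypothetical helper names, and the URL in the usage comment is a placeholder.

```python
import json
import urllib.request
from typing import Any, Optional


def build_info_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the Pinger service's /info endpoint."""
    url = base_url.rstrip("/") + "/info"
    headers = {}
    if token:
        # Assumption: the service expects a bearer token header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_pinger_info(base_url: str, token: Optional[str] = None, timeout: float = 10.0) -> Any:
    """Send the request and decode the body, assuming it is JSON."""
    request = build_info_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (placeholder URL; replace with your Pinger service's URL):
# print(fetch_pinger_info("http://localhost:8000", token=None))
```

Keeping request construction separate from the network call makes it easy to inspect the URL and headers before sending anything.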