Reconnecting cluster slave nodes to the head node fails #336

Steps to reproduce:
Start the head node:
`scripts/start_ray.sh --num-cpus 100 --num-gpus 0 --num-workers 100 --head`
Log in to the second (slave) node and start Ray, pointing it at the head node:
`./scripts/start_ray.sh --redis-address <headnode_ip:redis_port>`
Then stop Ray on the slave node:
`./scripts/stop_ray.sh`

Now try to start Ray on the slave node again:
```
./scripts/start_ray.sh --redis-address <headnode_ip:redis_port>
Waiting for redis server at <headnode_ip:redis_port> to respond...
Using IP address ####### for this node.
Traceback (most recent call last):
  File "/data/atumanov/ray/scripts/start_ray.py", line 109, in <module>
    check_no_existing_redis_clients(node_ip_address, args.redis_address)
  File "/data/atumanov/ray/scripts/start_ray.py", line 34, in check_no_existing_redis_clients
    raise Exception("This Redis instance is already connected to clients with this IP address.")
Exception: This Redis instance is already connected to clients with this IP address.
```
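To make the failure mode concrete, here is a simplified, self-contained model of what the failing check appears to do. It assumes (hypothetically) that the head node's Redis keeps one record per client, holding the IP address of the node that client runs on. A plain dict stands in for Redis, and `check_no_existing_clients`, `start_node`, and `stop_node_without_cleanup` are hypothetical names modeled on `check_no_existing_redis_clients` from the traceback, not Ray's actual code:

```python
from typing import Dict, Optional

# Hypothetical stand-in for the head node's Redis client records:
# maps a client ID to the IP address of the node that client runs on.
ClientTable = Dict[str, str]


def check_no_existing_clients(node_ip_address: str, client_table: ClientTable) -> None:
    """Refuse to start if any recorded client already reports this node's IP."""
    for ip in client_table.values():
        if ip == node_ip_address:
            raise Exception(
                "This Redis instance is already connected to clients with this IP address."
            )


def start_node(node_ip_address: str, client_table: ClientTable, num_workers: int = 2) -> None:
    """Hypothetical start: run the check, then record this node's clients."""
    check_no_existing_clients(node_ip_address, client_table)
    for i in range(num_workers):
        client_table[f"{node_ip_address}:worker{i}"] = node_ip_address


def stop_node_without_cleanup(node_ip_address: str, client_table: ClientTable) -> None:
    """Models the reported behavior: the local processes exit, but the
    head node's records of them are left behind untouched."""


table: ClientTable = {}
start_node("10.0.0.2", table)
stop_node_without_cleanup("10.0.0.2", table)

restart_error: Optional[str]
try:
    start_node("10.0.0.2", table)  # second start on the same node
except Exception as exc:
    restart_error = str(exc)
else:
    restart_error = None

print(restart_error)
```

The second `start_node` call raises because the stale records from the first run still match the node's IP address, which is the same symptom as in the log.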

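Takeaway: starting and stopping Ray on a slave node should be idempotent, but it is not. After `stop_ray.sh`, the head node's Redis instance apparently still considers clients with the slave's IP address to be connected, so `check_no_existing_redis_clients` rejects the restart.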
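One way to get the idempotent behavior asked for here, sketched under the same hypothetical model (a dict standing in for the head node's client records; `start_node` and `stop_node` are illustrative names, not Ray's API), is to have stop deregister every client recorded for the node's IP address:

```python
from typing import Dict

# Hypothetical stand-in for the head node's client records: client ID -> node IP.
ClientTable = Dict[str, str]


def check_no_existing_clients(node_ip_address: str, client_table: ClientTable) -> None:
    """Refuse to start if any recorded client already reports this node's IP."""
    if node_ip_address in client_table.values():
        raise Exception(
            "This Redis instance is already connected to clients with this IP address."
        )


def start_node(node_ip_address: str, client_table: ClientTable, num_workers: int = 2) -> None:
    """Hypothetical start: run the check, then record this node's clients."""
    check_no_existing_clients(node_ip_address, client_table)
    for i in range(num_workers):
        client_table[f"{node_ip_address}:worker{i}"] = node_ip_address


def stop_node(node_ip_address: str, client_table: ClientTable) -> None:
    """Deregister every client recorded for this node so a later start passes
    the check. Calling it again finds nothing to remove, so it is a no-op."""
    stale = [cid for cid, ip in client_table.items() if ip == node_ip_address]
    for cid in stale:
        del client_table[cid]


table: ClientTable = {}
for _ in range(3):
    start_node("10.0.0.2", table)  # repeated start/stop cycles now succeed
    stop_node("10.0.0.2", table)
stop_node("10.0.0.2", table)  # an extra stop is harmless
start_node("10.0.0.2", table)
start_node("10.0.0.3", table)  # a different node is unaffected
```

Cleaning up in the stop path keeps the start-time check useful for catching a genuinely running duplicate. An alternative design, not shown here, would be for the head node to expire records of clients that stop responding, which also covers nodes that crash without running `stop_ray.sh`.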
