Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[xray] Fix heartbeat subscription for autoscaler #2498

Merged
merged 20 commits into from
Jul 28, 2018
Prev Previous commit
Next Next commit
wip
  • Loading branch information
ericl committed Jul 25, 2018
commit 38d9f368a88013802cd7723ecd6f7a2e99a133f4
8 changes: 6 additions & 2 deletions python/ray/autoscaler/autoscaler.py
Original file line number Diff line number Diff line change
Expand Up @@ -480,9 +480,13 @@ def recover_if_needed(self, node_id):
return
last_heartbeat_time = self.load_metrics.last_heartbeat_time_by_ip.get(
self.provider.internal_ip(node_id), 0)
if time.time() - last_heartbeat_time < AUTOSCALER_HEARTBEAT_TIMEOUT_S:
delta = time.time() - last_heartbeat_time
if delta < AUTOSCALER_HEARTBEAT_TIMEOUT_S:
return
print("StandardAutoscaler: Restarting Ray on {}".format(node_id))
print(
"StandardAutoscaler: No heartbeat from node "
"{} in {} seconds, restarting Ray to recover".format(
node_id, delta))
updater = self.node_updater_cls(
node_id,
self.config["provider"],
Expand Down
1 change: 1 addition & 0 deletions python/ray/monitor.py
Original file line number Diff line number Diff line change
Expand Up @@ -507,6 +507,7 @@ def process_messages(self, max_messages=10000):
# Parse the message.
channel = message["channel"]
data = message["data"]
print("MESSAGE", channel, data)

# Determine the appropriate message handler.
message_handler = None
Expand Down