-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update last heartbeat time only if slave is alive #442
Conversation
Verified that @mayurva has signed the CLA. Thanks for the pull request! |
app/master/cluster_master.py
Outdated
""" | ||
Updates last heartbeat time for the slave if it is alive. | ||
:type slave: Slave | ||
:return: A boolean indicating whether last heartbeat time was called or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool
is not needed in the doc since you already have the return type above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I just wanted to explicitly call out what the return value should be interpreted as
app/master/cluster_master.py
Outdated
:type slave: Slave | ||
:return: A boolean indicating whether last heartbeat time was called or not | ||
""" | ||
slave_alive = slave.is_alive() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should set and return a value in this function. Maybe there is some other place to do this that is cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the method implementation was a bit confusing. Changed it to be more clear.
@@ -431,4 +431,5 @@ class _SlavesHeartbeatHandler(_ClusterMasterBaseAPIHandler): | |||
@authenticated | |||
def post(self, slave_id): | |||
slave = SlaveRegistry.singleton().get_slave(slave_id=int(slave_id)) | |||
self._cluster_master.update_slave_last_heartbeat_time(slave) | |||
if not self._cluster_master.update_slave_last_heartbeat_time(slave): | |||
self._write_status(status_code=403) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do this verses setting is_alive
back to true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems cleaner. We don't necessarily know at this point why the slave was marked dead before. Perhaps the intention was in fact to kill the slave process, but the network delays caused the message to get lost. An error response ensures that the slave process will restart and reconnect to the master in a clean state.
466ecbd
to
aa27b06
Compare
app/master/cluster_master.py
Outdated
@@ -196,7 +196,6 @@ def update_slave_last_heartbeat_time(self, slave) -> bool: | |||
""" | |||
Updates last heartbeat time for the slave if it is alive. | |||
:type slave: Slave | |||
:return: A boolean indicating whether last heartbeat time was called or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't mean remove the line, just the duplicated type info. It should like:
:return: True when the slave is alive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I totally misunderstood that.
app/slave/cluster_slave.py
Outdated
@@ -81,6 +81,12 @@ def _run_heartbeat(self): | |||
self._logger.error('Received HTTP {} from master.'.format(response.status_code)) | |||
self._logger.error('The slave process is shutting down.') | |||
self.kill() | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't add space between if/else
app/slave/cluster_slave.py
Outdated
|
||
# is_alive = False indicates slave is marked offline in master | ||
elif not response.json().get('is_alive'): | ||
self._logger.error('The slave is marked dead by master.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is official convention in CR, but I normally use "
strings for sentences and '
for single words or letters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. All the other logger lines in this file have single quotes. So, I guess I'll stick to that
response = { | ||
'is_alive': is_alive | ||
} | ||
self.write(response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No consistent style in this module for response
, but I guess I prefer the compact version being used in some cases:
self.write({'is_alive': is_alive})
If you want to stick with above then add trailing comma after is_alive
app/master/cluster_master.py
Outdated
@@ -196,7 +196,6 @@ def update_slave_last_heartbeat_time(self, slave) -> bool: | |||
""" | |||
Updates last heartbeat time for the slave if it is alive. | |||
:type slave: Slave | |||
:return: A boolean indicating whether last heartbeat time was called or not | |||
""" | |||
if slave.is_alive(): | |||
slave.update_last_heartbeat_time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this code was actually returning true
when the hearbeat time is updated then it could be simplified to:
return slave.is_alive() and slave.update_last_hearbeat_time()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm.. not really. update_last_heartbeat_time
does not return a value.
84fc2d4
to
7c2f460
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
If the slave has been marked dead, but still sends heartbeat, the master does not update the last heartbeat time. At the same time, master responds to the slave indicating that it has been marked dead. The slave considers this as heartbeat failure and dies.
If the slave has been marked alive, but still sends heartbeat, the
master should not update the last heartbeat time. At the same time,
master should respond to the slave indicating that it has been marked
dead.