Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update last heartbeat time only if slave is alive #442

Merged
merged 1 commit into from
May 17, 2018
Merged

Conversation

mayurva
Copy link
Contributor

@mayurva mayurva commented May 14, 2018

If the slave has been marked alive, but still sends heartbeat, the
master should not update the last heartbeat time. At the same time,
master should respond to the slave indicating that it has been marked
dead.

@boxcla
Copy link

boxcla commented May 14, 2018

Verified that @mayurva has signed the CLA. Thanks for the pull request!

"""
Updates last heartbeat time for the slave if it is alive.
:type slave: Slave
:return: A boolean indicating whether last heartbeat time was called or not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool is not needed in the doc since you already have the return type above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I just wanted to explicitly call out what the return value should be interpreted as

:type slave: Slave
:return: A boolean indicating whether last heartbeat time was called or not
"""
slave_alive = slave.is_alive()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should set and return a value in this function. Maybe there is some other place to do this that is cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the method implementation was a bit confusing. Changed it to be more clear.

@@ -431,4 +431,5 @@ class _SlavesHeartbeatHandler(_ClusterMasterBaseAPIHandler):
@authenticated
def post(self, slave_id):
slave = SlaveRegistry.singleton().get_slave(slave_id=int(slave_id))
self._cluster_master.update_slave_last_heartbeat_time(slave)
if not self._cluster_master.update_slave_last_heartbeat_time(slave):
self._write_status(status_code=403)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this verses setting is_alive back to true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems cleaner. We don't necessarily know at this point why the slave was marked dead before. Perhaps the intention was in fact to kill the slave process, but the network delays caused the message to get lost. An error response ensures that the slave process will restart and reconnect to the master in a clean state.

@mayurva mayurva force-pushed the heartbeat-alive branch 3 times, most recently from 466ecbd to aa27b06 Compare May 16, 2018 01:02
@@ -196,7 +196,6 @@ def update_slave_last_heartbeat_time(self, slave) -> bool:
"""
Updates last heartbeat time for the slave if it is alive.
:type slave: Slave
:return: A boolean indicating whether last heartbeat time was called or not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't mean remove the line, just the duplicated type info. It should like:

:return: True when the slave is alive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. I totally misunderstood that.

@@ -81,6 +81,12 @@ def _run_heartbeat(self):
self._logger.error('Received HTTP {} from master.'.format(response.status_code))
self._logger.error('The slave process is shutting down.')
self.kill()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add space between if/else


# is_alive = False indicates slave is marked offline in master
elif not response.json().get('is_alive'):
self._logger.error('The slave is marked dead by master.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is official convention in CR, but I normally use " strings for sentences and ' for single words or letters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. All the other logger lines in this file have single quotes. So, I guess I'll stick to that

response = {
'is_alive': is_alive
}
self.write(response)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No consistent style in this module for response, but I guess I prefer the compact version being used in some cases:

self.write({'is_alive': is_alive})

If you want to stick with above then add trailing comma after is_alive

@@ -196,7 +196,6 @@ def update_slave_last_heartbeat_time(self, slave) -> bool:
"""
Updates last heartbeat time for the slave if it is alive.
:type slave: Slave
:return: A boolean indicating whether last heartbeat time was called or not
"""
if slave.is_alive():
slave.update_last_heartbeat_time()
Copy link
Contributor

@cmcginty cmcginty May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code was actually returning true when the hearbeat time is updated then it could be simplified to:

return slave.is_alive() and slave.update_last_hearbeat_time()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm.. not really. update_last_heartbeat_time does not return a value.

@mayurva mayurva force-pushed the heartbeat-alive branch 2 times, most recently from 84fc2d4 to 7c2f460 Compare May 17, 2018 02:26
Copy link
Contributor

@cmcginty cmcginty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

If the slave has been marked dead, but still sends heartbeat, the
master does not update the last heartbeat time. At the same time, master
responds to the slave indicating that it has been marked dead. The slave
considers this as heartbeat failure and dies.
@mayurva mayurva force-pushed the heartbeat-alive branch from 7c2f460 to 3f6d218 Compare May 17, 2018 19:08
@mayurva mayurva merged commit a22c64a into master May 17, 2018
@mayurva mayurva deleted the heartbeat-alive branch May 17, 2018 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants