Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ringpop: update hashring immediately on ring change #3130

Merged
merged 3 commits into from
Mar 25, 2020

Conversation

venkat1109
Copy link
Contributor

@venkat1109 venkat1109 commented Mar 24, 2020

What changed?
When a cadence host is added/removed/restarted, every other host in the cluster will receive a change notification via ringpop. This is the primary mechanism by which discovery / failure detection works today. When such a change notification is received, every node updates its consistent hash ring to route future requests to the correct owner. Its critical that the hashring update happens as soon as possible during deployments / restarts etc to avoid downtime / availability drops. Currently, there is an optimization to avoid too many updates to hashring within a short span of time. But this is hurting availability.

This patch adds a fix by updating ring as soon as notification is received. In addition, a dedup map is added to resolver to avoid updating the ring when (a) nothing changes on an event (b) the host added or removed is for a different role. This should mitigate the too many updates within a short span of time problem.

Why?
To reduce availability dips during deployments and host restarts.

How did you test it?
Localhost as well as in a staging environment.

Potential risks
In the worst case, discovery / failure detection can be broken. This would mean unavailability or host stealing shards from each other continuously.

@venkat1109 venkat1109 self-assigned this Mar 24, 2020
@coveralls
Copy link

coveralls commented Mar 24, 2020

Coverage Status

Coverage increased (+0.4%) to 67.471% when pulling 36f42ea on venkat1109:v_rp_fixes into 8bcbb4f on uber:master.

@venkat1109 venkat1109 marked this pull request as ready for review March 24, 2020 04:10
@venkat1109 venkat1109 requested review from emrahs, a team and vitarb March 24, 2020 04:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants