ringpop: update hashring immediately on ring change #3130

venkat1109 · 2020-03-24T00:35:58Z

What changed?
When a Cadence host is added, removed, or restarted, every other host in the cluster receives a change notification via ringpop. This is the primary mechanism by which discovery and failure detection work today. On receiving such a notification, each node updates its consistent hash ring so that future requests are routed to the correct owner. It is critical that the hash ring update happens as soon as possible during deployments and restarts to avoid downtime and availability drops. Currently, an optimization suppresses repeated hash ring updates within a short span of time, but that delay is hurting availability.

This patch fixes the issue by updating the ring as soon as a notification is received. In addition, a dedup map is added to the resolver to avoid updating the ring when (a) nothing changes on an event, or (b) the host added or removed belongs to a different role. This should mitigate the problem of too many ring updates within a short span of time.
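
The dedup check described above can be sketched as follows. This is a hypothetical illustration, not the code in `rpServiceResolver.go`: it assumes each notification can be reduced to a snapshot of (address, role) pairs, and `member`, `resolver`, `onMembershipChange`, and `sameMembers` are invented names. The resolver reacts to every notification immediately, but rebuilds only when the member set for its own role actually changed.

```go
package main

import (
	"fmt"
	"sort"
)

// member is a hypothetical, simplified view of one ringpop member: its
// address plus the Cadence role (service) it runs.
type member struct {
	addr string
	role string
}

// resolver is a hypothetical per-role resolver. It keeps a dedup map of the
// addresses it last applied to its hash ring.
type resolver struct {
	role    string
	members map[string]struct{} // dedup map: last applied member set for role
	hosts   []string            // sorted hosts currently on the ring
	updates int                 // number of ring rebuilds performed
}

// onMembershipChange handles one notification immediately (no batching
// delay). It returns true if the ring was rebuilt.
func (r *resolver) onMembershipChange(snapshot []member) bool {
	// Keep only members of this resolver's role, so changes to other
	// roles never trigger a rebuild.
	next := make(map[string]struct{})
	for _, m := range snapshot {
		if m.role == r.role {
			next[m.addr] = struct{}{}
		}
	}
	// Skip the rebuild when nothing relevant changed.
	if sameMembers(r.members, next) {
		return false
	}
	r.members = next
	hosts := make([]string, 0, len(next))
	for addr := range next {
		hosts = append(hosts, addr)
	}
	sort.Strings(hosts)
	r.hosts = hosts // stand-in for rebuilding the consistent hash ring
	r.updates++
	return true
}

// sameMembers reports whether two member sets contain the same addresses.
func sameMembers(a, b map[string]struct{}) bool {
	if len(a) != len(b) {
		return false
	}
	for addr := range a {
		if _, ok := b[addr]; !ok {
			return false
		}
	}
	return true
}

func main() {
	r := &resolver{role: "history"}
	snap := []member{{"h1", "history"}, {"h2", "history"}, {"m1", "matching"}}
	fmt.Println(r.onMembershipChange(snap)) // true: first snapshot
	fmt.Println(r.onMembershipChange(snap)) // false: nothing changed
	snap = append(snap, member{"m2", "matching"})
	fmt.Println(r.onMembershipChange(snap)) // false: change is for another role
	fmt.Println(r.hosts, r.updates)         // [h1 h2] 1
}
```

Comparing sets rather than counting events is what lets the update be immediate without reintroducing the burst of redundant rebuilds the old throttling guarded against.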

Why?
To reduce availability dips during deployments and host restarts.

How did you test it?
On localhost as well as in a staging environment.

Potential risks
In the worst case, discovery and failure detection could break. This would mean unavailability, or hosts continuously stealing shards from each other.

coveralls · 2020-03-24T00:56:51Z

Coverage increased (+0.4%) to 67.471% when pulling 36f42ea on venkat1109:v_rp_fixes into 8bcbb4f on uber:master.

common/membership/rpServiceResolver.go

venkat1109 self-assigned this Mar 24, 2020

venkat1109 marked this pull request as ready for review March 24, 2020 04:10

venkat1109 requested review from emrahs, a team and vitarb March 24, 2020 04:11

wxing1292 reviewed Mar 24, 2020

common/membership/rpServiceResolver.go

venkat1109 force-pushed the v_rp_fixes branch from 7082766 to 74ce153 on March 24, 2020 17:08

vancexu approved these changes Mar 24, 2020

vancexu reviewed Mar 24, 2020

common/membership/rpServiceResolver.go (outdated)

venkat1109 added 3 commits March 24, 2020 18:32

ringpop: update hashring immediately on ring change

b540ddf

add break on compareMembers

5f4bf98

address cr comments

36f42ea

venkat1109 force-pushed the v_rp_fixes branch from 74ce153 to 36f42ea on March 25, 2020 01:32

venkat1109 merged commit 31d2619 into uber:master Mar 25, 2020

venkat1109 deleted the v_rp_fixes branch March 25, 2020 02:05

venkat1109 added a commit that referenced this pull request Mar 27, 2020

ringpop: update hashring immediately on ring change (#3130)

2ecd400

venkat1109 added a commit that referenced this pull request Mar 27, 2020

ringpop: update hashring immediately on ring change (#3130)

fdf39c7

yux0 pushed a commit that referenced this pull request Apr 14, 2020

ringpop: update hashring immediately on ring change (#3130)

81e62c5
