-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RATIS-1411. Alleviate slow follower issue #508
Conversation
This patch is a mixed patch. I will upload a clean patch later. |
120657f
to
7857dd8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a clever idea to limit the slowest follower. Thanks.
This feature should be disabled by default. Note that setting it to 1 does not disable the feature since the leader will wait the slowest follower to commit new requests. In other words. when a server is dead, no more transection can be committed.
@@ -119,6 +119,17 @@ static SizeInBytes byteLimit(RaftProperties properties) { | |||
static void setByteLimit(RaftProperties properties, SizeInBytes byteLimit) { | |||
setSizeInBytes(properties::set, BYTE_LIMIT_KEY, byteLimit, requireMin(1L)); | |||
} | |||
|
|||
String FOLLOWER_MAX_GAP_RATIO_KEY = PREFIX + ".follower-max-gap-ratio"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename it to ".follower.gap.ratio.max".
@@ -119,6 +119,17 @@ static SizeInBytes byteLimit(RaftProperties properties) { | |||
static void setByteLimit(RaftProperties properties, SizeInBytes byteLimit) { | |||
setSizeInBytes(properties::set, BYTE_LIMIT_KEY, byteLimit, requireMin(1L)); | |||
} | |||
|
|||
String FOLLOWER_MAX_GAP_RATIO_KEY = PREFIX + ".follower-max-gap-ratio"; | |||
float FOLLOWER_MAX_GAP_RATIO_DEFAULT = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use double instead of float. We should avoid using float, especially in 64-bit computers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Except for that double is more precise than float, any major drawbacks of using float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precision is the main reason.
Also, double is the default in Java. For example, the code below does not compile.
float a = 1.2;
Using float is easier to make mistakes unawarely.
Unless space is an issue, we should just use double.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szetszwo , thanks for the detail info.
@@ -269,6 +272,13 @@ boolean removeAll(Collection<LogAppender> c) { | |||
this.pendingRequests = new PendingRequests(server.getMemberId(), properties, raftServerMetrics); | |||
this.watchRequests = new WatchRequests(server.getMemberId(), properties); | |||
this.messageStreamRequests = new MessageStreamRequests(server.getMemberId()); | |||
this.maxPendingRequests = RaftServerConfigKeys.Write.elementLimit(properties); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change maxPendingRequests to a local variable.
Thanks @ChenSammi for the patch. This should be disabled by default in ratis as @szetszwo suggested, The only problem i think what we will run into eventually will be : If a slow follower is just not able to catch up with the leader and exceeds the threshold, the leader itself will not be able to apply the transactions whatever have already been appended to its own log . In such cases, probably, it should be able to remove the node and apply the pending transactions which have been appended on the raft log,. Also, a degenerate case will be where we are not updating the majority index to the max but the min of the nodes, but, the leader itself will be accepting accepting transactions till the pending request limit is reached. The other approach can be to not remove the entry from the cache aggressively once applyTransaction is called in Ozone. it should be available on the data cache as long as it is within the threshold of difference between majority and min index The Once it exceeds the threshold , the pipeline can be closed. |
Thanks @szetszwo and @bshashikant for giving the feedback. Add a switch to turn on/off this feature is very good point. For those ratis client who doesn't need WATCH for ALL_COMMITTED, and doesn't have large statemachine data to write/read, there will be no severe slow follower issue. For example, current OM and SCM HA. |
If the slow follower is alive and function well, it will eventually catch up when the leader and another follower are waiting for it. If the slow follower doesn't response, it will eventually trigger the pipeline close after "raft.server.rpc.slowness.timeout" timeout(300s in Ozone).
I opened a Ozone ticket for the cache improvement HDDS-5791. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 the change looks good.
@bshashikant , do you want to take a look at the new change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ChenSammi for the patch and @szetszwo for the review.
This PR would also affect LeaderStateImpl#commitIndexChanged. I feel we do not need the follower gap limitation for updating the watch requests. |
Good catch. Do you think keeping some cache for the slowest follower will work? |
Thanks @szetszwo and @bshashikant for the code review. |
I think there are a couple of problems with having two caches.
|
@lokeshj1703 Yes. I was aware of the same problems. Seems we can't set the priorities just by cache.
+1 on this. |
https://issues.apache.org/jira/browse/RATIS-1411