Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: consolidate all append message sending #134

Closed
wants to merge 3 commits into from

Conversation

pav-kv
Copy link
Contributor

@pav-kv pav-kv commented Jan 25, 2024

This PR consolidates all decision-making about sending append messages into a single maybeSendAppend method. Previously, the behaviour depended on the sendIfEmpty flag which was set/unset depending on the circumstances in which the method is called. This is unnecessary because the Progress struct contains enough information about the leader->follower flow state, so maybeSendAppend can be made stand-alone.

In a follow-up PR, the consolidated maybeSendAppend method will be used to implement a more flexible message flow control.

Part of #130

@pav-kv pav-kv force-pushed the refactor-send-append branch 7 times, most recently from eeefed5 to 9cc9344 Compare January 25, 2024 18:12
@pav-kv pav-kv force-pushed the refactor-send-append branch from 9cc9344 to 580374c Compare February 8, 2024 11:41
@pav-kv pav-kv changed the title raft: refactor maybeSendAppend raft: consolidate all append message sending Feb 8, 2024
@pav-kv pav-kv force-pushed the refactor-send-append branch from 580374c to b4da150 Compare February 8, 2024 11:59
tracker/progress.go Outdated Show resolved Hide resolved
tracker/progress.go Outdated Show resolved Hide resolved
tracker/progress.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
raft.go Outdated
Type: pb.MsgApp,
Index: pr.Next - 1,
LogTerm: prevTerm,
Commit: r.raftLog.committed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm noticing the odd asymmetry between MsgApp sending the latest Commit index and the follower computing min(committed, lastnewi), and MsgHeartbeat carefully sending min(pr.Match, r.raftLog.committed) to avoid a follower-side panic. Do you have thoughts about that?

Copy link
Contributor Author

@pav-kv pav-kv Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The commit index in MsgApp is "optimistic", and can be larger than pr.Match. As a result, a follower can simultaneously append and commit entries. Technically, Commit index can even exceed the last index of the entries in the batch. The way I think about MsgApp is that it's effectively 2 independent messages:

  1. A logSlice of the leader log to append to the follower log.
  2. A (term, commit) is the point in the leader@term log that is committed (see The notion of Term is overloaded #144 explaining the "leader log" coordinate system).

As such, even if the append (1) is redundant/no-op (because the follower already has these entries from elsewhere), the commit index (2) can still communicate valuable information - if the follower log is at the same term, it can advance the commit index safely. We don't take advantage of this currently (the commit index is advanced only within the bounds of the appended entries (1)), but we should.

The commit index in MsgHeartbeat is "pessimistic" and cut at pr.Match because currently the follower has a panic if the index is not in the bounds of the follower log. This pessimistic cut and panic are unnecessary though, see #138 and #139 aiming to relax it.

I would like to clean-up the semantics of the Commit indices in both messages, to both be treated as (2) above - a (term, index) that the leader sends without relying on the knowledge about the follower log. The follower log has enough knowledge to apply (or ignore) it safely.

tracker/progress.go Outdated Show resolved Hide resolved
@pav-kv pav-kv force-pushed the refactor-send-append branch 3 times, most recently from 693ed7b to 2249aff Compare February 23, 2024 13:25
tracker/progress.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
@pav-kv pav-kv force-pushed the refactor-send-append branch 2 times, most recently from 5cc41de to 017bdda Compare February 23, 2024 17:16
@pav-kv pav-kv marked this pull request as ready for review February 23, 2024 18:27
@pav-kv pav-kv force-pushed the refactor-send-append branch from 017bdda to 34d67b7 Compare February 26, 2024 13:16
@pav-kv
Copy link
Contributor Author

pav-kv commented Feb 26, 2024

@ahrtr PTAL. Also asking @bdarnell to review (@nvanbenschoten and @erikgrinaker can't look at at it this week).

@bdarnell
Copy link
Contributor

Looks reasonable to me, although I'm very far removed from this code and I'm not sure I'm qualified to give it a thorough review. In particular I can see from the tests that there are behavior changes here, but I can't say for sure whether they're all safe.

One thing this highlights for me is how the piggybacking of (unacknowledged) commit indexes onto MsgApp is a source of trouble. I've been tempted in the past to introduce something like a new pair of messages MsgCommitIndex/MsgCommitIndexResp so the leader always knows the commit index that the follower has acknowledged, and we can reduce some of the special cases around MsgApp. I don't know if the time is right for that but it's something to consider as we're improving flow control here.

@ahrtr
Copy link
Member

ahrtr commented Feb 26, 2024

Will take a closer look this week.

A generic comment, in order to make the review easier, suggest to breakdown this PR. Such as,

  • get code refactor, method/function renaming, comment updating etc. in one PR, so that they can be reviewed and merged asap.
  • If you change any logic or adding any new field, then it needs super careful review, please try to make such PR as small as possible (e.g. < 100 lines of code change).

@pav-kv
Copy link
Contributor Author

pav-kv commented Feb 26, 2024

@bdarnell We have an issue for the commit index acks you described: #131. And a draft PR for it: #132. It's not necessary for the flow control change here though.

tracker/progress.go Outdated Show resolved Hide resolved
}
pr.PauseMsgAppProbes(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that it changes the original logic. In previous implementation, if pr.State == StateReplicate && !pr.Inflights.Full() is true, then it will not pause the flow. Your new implementation will pause the flow in this case. Is it an intentional change?

Copy link
Contributor Author

@pav-kv pav-kv Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR modifies the meaning of the MsgAppFlowPaused bool, but the visible behaviour of MsgApp sends is not changed.

With this PR, the pr.Inflights.Full() check just moved to a different place: see the CanSendEntries and ShouldSendAppend funcs.

It's not super intentional. I tried a few different ways, and didn't find a way that doesn't modify the meaning of this bool at all. This bool is not used by user code though, and we should unexport it IMO.

Comment on lines +303 to +307
return pr.CanBumpCommit(commit) ||
pr.Match < last && (!pr.MsgAppProbesPaused || pr.CanSendEntries(last))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition is a little over complicated. Is it possible that both pr.MsgAppProbesPaused and pr.CanSendEntries(last) are true? ShouldSendMsgApp may still return true in such case?

Copy link
Contributor Author

@pav-kv pav-kv Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's possible, we set MsgAppProbesPaused after every message send for instance. MsgAppProbesPaused == true means: if we are in the probing/throttled mode, don't send a message. But if we are not in this mode (CanSendEntries is true) then MsgAppProbesPaused is effectively ignored.

We could be keeping MsgAppProbesPaused in sync with CanSendEntries, but it would require some additional invariants.

raft.go Outdated Show resolved Hide resolved
@pav-kv pav-kv force-pushed the refactor-send-append branch 2 times, most recently from 1a2c70f to 3a94256 Compare February 27, 2024 19:42
@pav-kv pav-kv force-pushed the refactor-send-append branch from 3a94256 to 3f9b634 Compare March 6, 2024 14:52
pav-kv added 3 commits March 11, 2024 11:37
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
This commit consolidates all decision-making about sending append
messages into a single maybeSendAppend method. Previously, the behaviour
depended on the sendIfEmpty flag which was set/unset depending on the
circumstances in which the method is called. This is unnecessary because
the Progress struct contains enough information about the
leader->follower flow state, so maybeSendAppend can be made stand-alone.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
@pav-kv pav-kv force-pushed the refactor-send-append branch from 3f9b634 to 91981c3 Compare March 11, 2024 11:37
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request May 29, 2024
124006: raft: consolidate all append message sending r=nvanbenschoten a=pav-kv

This PR consolidates all decision-making about sending append messages into a single `maybeSendAppend` method. Previously, the behaviour depended on the `sendIfEmpty` flag which was set/unset depending on the context in which the method is called. This is unnecessary because the `Progress` struct contains enough information about the leader->follower flow state, so `maybeSendAppend` can be made stand-alone.

In follow-up PRs, the consolidated `maybeSendAppend` method will be used to implement a more flexible message flow control.

Ported from etcd-io/raft#134

Epic: CRDB-37515
Release note: none

Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
@pav-kv
Copy link
Contributor Author

pav-kv commented May 29, 2024

Closing this PR, since it has been superseded by cockroachdb/cockroach#124006 in CRDB.

@pav-kv pav-kv closed this May 29, 2024
@pav-kv pav-kv deleted the refactor-send-append branch May 29, 2024 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants