Skip to content

[Zen2] Introduce PreVoteCollector #32847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Aug 20, 2018

Conversation

DaveCTurner
Copy link
Contributor

An election requires a node to select a term that is higher than all
previously-seen terms. If nodes are too enthusiastic about starting elections
then they can effectively excludes itself from the cluster until the leader can
bump to a still-higher term, and if this process repeats then a single faulty
node can prevent the cluster from making useful progress.

The solution is to start the election with a pre-voting round to ensure that
there is at least a quorum of nodes who believe there to be no leader.

An election requires a node to select a term that is higher than all
previously-seen terms.  If nodes are too enthusiastic about starting elections
then they can effectively excludes itself from the cluster until the leader can
bump to a still-higher term, and if this process repeats then a single faulty
node can prevent the cluster from making useful progress.

The solution is to start the election with a pre-voting round to ensure that
there is at least a quorum of nodes who believe there to be no leader.
@DaveCTurner DaveCTurner added >enhancement v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Aug 14, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch August 14, 2018 14:24
@ywelsch ywelsch mentioned this pull request Aug 14, 2018
61 tasks
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some initial feedback. I'm missing a test that checks that a subsequent election (simulated on coordinationstate) will succeed given a successful prevoting round.


final long currentMaxTermSeen = updateMaxTermSeen(response.getCurrentTerm());

final PreVoteResponse currentPreVoteResponse = preVoteResponse;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should capture this in the PreVotingRound constructor. This should very rarely be changing (similarly to cluster state) if there is an ongoing prevoting round. This makes this instance also internally more consistent. Finally, I would like to see if we can make this inner class static, so that we know exactly about all the dependencies it has.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the current (i.e. last-accepted) cluster state for this, I think.

The inner class uses these from the outer class:

  • logger
  • transportService
  • startElection.run()
  • toString()
  • preVoteResponse.getCurrentTerm() (in the constructor)

I got rid of the getCurrentTerm() call in the constructor, but I would prefer to use the others as-is rather than making it static.


public static final String REQUEST_PRE_VOTE_ACTION_NAME = "internal:cluster/request_pre_vote";

private final AtomicLong maxTermSeen = new AtomicLong(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should leave this out of the PR for now. I'm expecting this "maxTermSeen" thingy to appear in more components, and would like to treat it uniformly (maybe just a callback).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok


void start(final Iterable<DiscoveryNode> broadcastNodes) {

final boolean isRunningChanged = isRunning.compareAndSet(false, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this becomes obsolete by changing to isClosed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

final long currentMaxTermSeen = updateMaxTermSeen(response.getCurrentTerm());

final PreVoteResponse currentPreVoteResponse = preVoteResponse;
if (response.getLastAcceptedTerm() > currentPreVoteResponse.getLastAcceptedTerm()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can do something similar as for the isElectionQuorum trick where we share implementation of these checks with CoordinationState.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to do so. The machinery to share these few lines of code obscures what they do, and the extra abstraction in CoordinationState takes it further away from the formal model.

import static org.elasticsearch.cluster.coordination.CoordinationState.isElectionQuorum;
import static org.elasticsearch.common.util.concurrent.ConcurrentCollections.newConcurrentSet;

public class PreVoteCollectorFactory extends AbstractComponent {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe omit the "Factory" here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@DaveCTurner DaveCTurner requested a review from ywelsch August 15, 2018 08:56
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, looks good o.w.

this.transportService = transportService;
this.startElection = startElection;

transportService.registerRequestHandler(REQUEST_PRE_VOTE_ACTION_NAME, Names.GENERIC, false, false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can later on think whether we even need to go on the generic thread pool if there's no blocking action and only cheap computations here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added 68c4b84 to note this.

* @return the pre-voting round, which can be closed to end the round early.
*/
public Releasable start(final ClusterState clusterState, final Iterable<DiscoveryNode> broadcastNodes) {
logger.debug("{} starting", this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe sufficient to log this in the start method of PreVotingRound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved and expanded in 63b221c

public static final String REQUEST_PRE_VOTE_ACTION_NAME = "internal:cluster/request_pre_vote";

private final TransportService transportService;
private final Runnable startElection;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet if it's sufficient to have a unique runnable for all voting rounds or whether we want specific ones, different for each voting round. We can check these though once we integrate these things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will become clear later...

public void update(final PreVoteResponse preVoteResponse, final DiscoveryNode currentLeader) {
logger.trace("updating with preVoteResponse={}, currentLeader={}", preVoteResponse, currentLeader);
this.preVoteResponse = preVoteResponse;
this.currentLeader = currentLeader;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to do some lightweight synchronization around these two variables so that you always have a consistent snapshot of those? Alternatively, we could store them in a tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really necessary, but might help a bit. I used a Tuple in 232b4e9

// TODO if we are a leader and the max term seen exceeds our term then we need to bump our term

final DiscoveryNode currentLeader = this.currentLeader;
if (currentLeader == null || currentLeader.equals(request.getSourceNode())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment while we treat the currentLeader.equals(request.getSourceNode()) situation specially

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 232b4e9

if (response.getLastAcceptedTerm() > clusterState.term()
|| (response.getLastAcceptedTerm() == clusterState.term()
&& response.getLastAcceptedVersion() > clusterState.version())) {
logger.debug("{} ignoring {} from {} as it is fresher", this, response, sender);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why debug instead of trace logging everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep trace logging for things that are expected to happen in a stable cluster, and anything unexpected to appear at debug or higher. In a stable cluster, any election activity is unexpected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok makes sense


@Override
public void close() {
final boolean isClosedChanged = isClosed.compareAndSet(false, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe name this isNotAlreadyClosed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 4f61eff

public Releasable start(final ClusterState clusterState, final Iterable<DiscoveryNode> broadcastNodes) {
logger.debug("{} starting", this);
PreVotingRound currentRound = new PreVotingRound(clusterState, preVoteResponse.getCurrentTerm());
currentRound.start(broadcastNodes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of the "current"Round.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in 4ec40e1

@DaveCTurner DaveCTurner merged commit cd6326b into elastic:zen2 Aug 20, 2018
@DaveCTurner DaveCTurner deleted the 2018-08-14-prevote-collector branch August 20, 2018 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants