
The HandleAERequest method incorrectly sets the term value in the AEResponse message. #55

Open
@liang636600

Description


Hi @wenweihu86 @loveheaven @guohao @wangwg1, I have found that when a follower whose term is behind the leader's processes an AppendEntriesRequest message (abbreviated as AEReq) and generates an AppendEntriesResponse message (abbreviated as AERes), it puts an outdated term value in the AERes message. The relevant logic is in com.github.wenweihu86.raft.service.impl.RaftConsensusServiceImpl#appendEntries. The sections below explain my findings in detail.

How to trigger this bug

As shown in Figure 1, triggering this bug requires a three-node cluster (n1, n2, n3), where n3 is not explicitly represented in the diagram. During the election phase, n1 first times out, causing its term to increase to 1. It then sends a vote request to n3. Upon receiving the request, n3 grants its vote to n1. Finally, after receiving the vote from n3, n1 becomes the leader. Notably, during this phase, n2 neither receives any vote requests from n1 nor experiences a timeout, so its term remains 0.

Subsequently, n1 receives a client request and appends the value 5 to its log (Action 1 in Figure 1). In Action 2, n1 sends an AEReq to n2; the message contents are listed in the row for Action 2 of the table on the right. Finally, in Action 3, n2 processes the AEReq via HandleAEReq and generates an AERes, whose contents are shown in the row for Action 3.

However, the term value in the AERes message generated by n2 is incorrect (see the red-highlighted section of Figure 1). While processing the AEReq, n2 updates its term to 1, yet the AERes it sends still carries the old term 0. This contradicts the Raft paper's specification of the term field in AppendEntries responses (Figure 2), which is the receiver's currentTerm, "for leader to update itself".
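To make the sequence concrete, here is a minimal Java sketch of the follower-side ordering. It is a simplified, hypothetical model: Follower, ResponseBuilder and AEResponse are illustrative stand-ins, not raft-java's real types, and the sketch assumes (as the behavior above implies) that the response term is filled in before stepDown runs. Log matching is omitted because only the term matters for this bug.

```java
/**
 * Simplified, hypothetical model of a follower handling an AppendEntries
 * request. None of these classes are raft-java's real types; log matching
 * is omitted because only the term field is relevant here.
 */
public class StaleTermDemo {

    /** Stand-in for AppendEntriesResponse; only the term is modeled. */
    static final class AEResponse {
        final long term;
        AEResponse(long term) { this.term = term; }
    }

    /** Stand-in for a protobuf-style response builder. */
    static final class ResponseBuilder {
        private long term;
        ResponseBuilder setTerm(long term) { this.term = term; return this; }
        AEResponse build() { return new AEResponse(term); }
    }

    static final class Follower {
        long currentTerm;

        Follower(long currentTerm) { this.currentTerm = currentTerm; }

        /** Stand-in for stepDown: adopt a newer term and remain a follower. */
        void stepDown(long newTerm) {
            if (newTerm > currentTerm) {
                currentTerm = newTerm;
            }
        }

        /** Buggy ordering: the response term is filled in before stepDown. */
        AEResponse handleAppendEntries(long requestTerm) {
            ResponseBuilder responseBuilder = new ResponseBuilder();
            responseBuilder.setTerm(currentTerm);       // captured too early
            if (requestTerm < currentTerm) {
                return responseBuilder.build();         // stale leader: reject
            }
            stepDown(requestTerm);                      // currentTerm is now requestTerm
            return responseBuilder.build();             // BUG: still carries the old term
        }
    }

    public static void main(String[] args) {
        Follower n2 = new Follower(0);                  // n2 missed the election: term 0
        AEResponse res = n2.handleAppendEntries(1);     // AEReq from leader n1 in term 1
        System.out.println("n2 term = " + n2.currentTerm + ", AERes term = " + res.term);
    }
}
```

Running main prints "n2 term = 1, AERes term = 0", which matches the stale term seen in Action 3.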

Figure 1. Incorrect Term Value in AERes Message Generated by Follower Node.

Figure 2. Term Value Specification for AERes Messages in the Raft Paper.

Suggested fix

The root cause of this bug is that after the follower node updates its own term value, it fails to promptly update the term value in the AERes message, resulting in an outdated term being sent.

Once the root cause is identified, the fix is straightforward: add one line that refreshes the term in the AERes message immediately after the node updates its own term via raftNode.stepDown(request.getTerm());. Specifically, adding responseBuilder.setTerm(raftNode.getCurrentTerm()); ensures that the AERes message carries the correct term value.
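The same simplified model with the one-line fix applied might look like the sketch below. The names are again hypothetical stand-ins rather than raft-java's classes; the only intended change is that setTerm is called a second time after stepDown, mirroring responseBuilder.setTerm(raftNode.getCurrentTerm()).

```java
/**
 * Hypothetical model of the fixed ordering: the response term is refreshed
 * after the follower adopts the leader's term. Not raft-java's real types.
 */
public class FixedTermDemo {

    /** Stand-in for AppendEntriesResponse; only the term is modeled. */
    static final class AEResponse {
        final long term;
        AEResponse(long term) { this.term = term; }
    }

    /** Stand-in for a protobuf-style response builder. */
    static final class ResponseBuilder {
        private long term;
        ResponseBuilder setTerm(long term) { this.term = term; return this; }
        AEResponse build() { return new AEResponse(term); }
    }

    static final class Follower {
        long currentTerm;

        Follower(long currentTerm) { this.currentTerm = currentTerm; }

        /** Stand-in for stepDown: adopt a newer term and remain a follower. */
        void stepDown(long newTerm) {
            if (newTerm > currentTerm) {
                currentTerm = newTerm;
            }
        }

        AEResponse handleAppendEntries(long requestTerm) {
            ResponseBuilder responseBuilder = new ResponseBuilder();
            responseBuilder.setTerm(currentTerm);
            if (requestTerm < currentTerm) {
                return responseBuilder.build();         // stale leader: reply with our term
            }
            stepDown(requestTerm);
            // Fix: refresh the term after stepDown, the analogue of
            // responseBuilder.setTerm(raftNode.getCurrentTerm()).
            responseBuilder.setTerm(currentTerm);
            return responseBuilder.build();
        }
    }

    public static void main(String[] args) {
        Follower n2 = new Follower(0);
        AEResponse res = n2.handleAppendEntries(1);
        System.out.println("n2 term = " + n2.currentTerm + ", AERes term = " + res.term);
    }
}
```

With this ordering, n2 replies with term 1. A request with a stale term (for example, term 2 sent to a follower already in term 3) is still answered with the follower's own term 3, because stepDown is never reached on that path.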

Thank you for taking the time to read this. I'm looking forward to your confirmation, and would be happy to help fix the issue if needed.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
