Fix bug we initiate a backup when the volume has started detaching #2635

PhanLe1010 · 2024-02-23T01:16:41Z

When volume.Spec.NodeID differs from the node ID of the backup VA (VolumeAttachment) ticket, we should not initiate a backup, because the volume is about to detach.

longhorn/longhorn#7937
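A minimal Go sketch of the check described above follows. It assumes simplified, hypothetical Volume and AttachmentTicket types; the real Longhorn CRD types and the actual backup controller code are structured differently. It captures only the decision: proceed when the ticket is satisfied and volume.Spec.NodeID still matches the ticket's node.

```go
package main

import "fmt"

// Volume and AttachmentTicket are hypothetical, simplified stand-ins for the
// Longhorn Volume and VolumeAttachment ticket objects; the real types differ.
type Volume struct {
	Name   string
	NodeID string // mirrors volume.Spec.NodeID
}

type AttachmentTicket struct {
	ID        string
	NodeID    string // node the backup ticket asked the volume to attach to
	Satisfied bool   // set by the VA controller
}

// shouldInitiateBackup reports whether a backup may start: the ticket must be
// satisfied AND the volume must still target the ticket's node. If
// volume.Spec.NodeID has moved away (for example, been cleared because
// detaching has started), the backup is skipped.
func shouldInitiateBackup(v Volume, t AttachmentTicket) bool {
	if !t.Satisfied {
		return false
	}
	// Assumption for illustration: a ticket without a node never matches.
	return t.NodeID != "" && v.NodeID == t.NodeID
}

func main() {
	ticket := AttachmentTicket{ID: "backup-ticket", NodeID: "node-1", Satisfied: true}
	fmt.Println(shouldInitiateBackup(Volume{Name: "vol", NodeID: "node-1"}, ticket))
	fmt.Println(shouldInitiateBackup(Volume{Name: "vol", NodeID: ""}, ticket))
}
```

With a satisfied ticket for node-1, the sketch allows the backup while the volume still targets node-1 and refuses it once the NodeID has been cleared or moved to another node. The empty-NodeID guard on the ticket is an illustrative assumption, not something stated in this PR.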

This PR replaces the approach in PR #2627. After discussing with @ejweber and @james-munson, we think this approach is better because:

- The other approach only touches the logic for the VA ticket of an RWO, non-migratable volume. It leaves the logic for RWX and migratable volumes unchanged, because for those volumes we cannot compare volume.Spec.NodeID with the node ID of their VA tickets; the two are most likely to differ.
- It is fundamentally still correct for the VA ticket to be set to satisfied when the volume has started detaching but has not finished, because at that exact moment the volume is still attached. The VA controller can reset the ticket to satisfied:false once the volume finishes detaching. It is the responsibility of the client (the backup controller in this case) to check that volume.Spec.NodeID is still the desired one.
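To illustrate the second point, here is a hypothetical Go model of the detach sequence. The Phase type and the phase-to-state mapping are assumptions for illustration, not Longhorn code. The ticket stays satisfied while the volume is still attached, so only the client-side NodeID comparison rejects a backup in the window after volume.Spec.NodeID is cleared but before detaching finishes.

```go
package main

import "fmt"

// Phase is a hypothetical, simplified model of where a volume is in the
// detach sequence; it is not a Longhorn type.
type Phase int

const (
	Attached      Phase = iota // volume attached, Spec.NodeID set
	DetachStarted              // Spec.NodeID cleared, detach not finished yet
	Detached                   // detach finished
)

func (p Phase) String() string {
	return [...]string{"Attached", "DetachStarted", "Detached"}[p]
}

// ticketSatisfied models the VA controller's view: the ticket remains
// satisfied as long as the volume is still attached, i.e. until detaching
// finishes.
func ticketSatisfied(p Phase) bool { return p != Detached }

// specNodeID models volume.Spec.NodeID in each phase.
func specNodeID(p Phase) string {
	if p == Attached {
		return "node-1"
	}
	return ""
}

// clientMayBackup is the extra check owned by the client (the backup
// controller): the ticket is satisfied AND Spec.NodeID still equals the
// ticket's node.
func clientMayBackup(p Phase, ticketNode string) bool {
	return ticketSatisfied(p) && specNodeID(p) == ticketNode
}

func main() {
	for _, p := range []Phase{Attached, DetachStarted, Detached} {
		fmt.Printf("%-13s satisfied=%-5v backup=%v\n",
			p, ticketSatisfied(p), clientMayBackup(p, "node-1"))
	}
}
```

Running it prints one row per phase. The DetachStarted row shows satisfied=true alongside backup=false, which is the case the backup controller is expected to skip.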

When the volume.Spec.NodeID is different than the node ID of the backup VA ticket, we should not initiate a backup as the volume is going to detach soon

longhorn-7937

Signed-off-by: Phan Le <phan.le@suse.com>

PhanLe1010 · 2024-02-23T01:54:51Z

Test run:

innobead · 2024-02-23T03:12:59Z

Still need to follow up with the test results later, even though the PR has been merged.

innobead · 2024-02-23T03:13:23Z

@mergify backport v1.6.x v1.5.x

mergify · 2024-02-23T03:13:27Z

backport v1.6.x v1.5.x

✅ Backports have been created

#2637 Fix bug we initiate a backup when the volume has started detaching (backport #2635) has been created for branch v1.6.x
#2638 Fix bug we initiate a backup when the volume has started detaching (backport #2635) has been created for branch v1.5.x

innobead · 2024-02-23T03:15:27Z

The fix is general, so this issue would be expected to occur in both 1.6 and 1.5 ever since VA was introduced. Do you know why it only became reproducible after 1.5.4-RC2? @PhanLe1010 @c3y1huang

PhanLe1010 · 2024-02-23T05:43:29Z

Hi @innobead, my current theory is that we changed the code flow somehow, lengthening the window between volume.Spec.NodeID being cleaned up and the moment the volume starts detaching by setting engine.spec.desirestate to stop. But that is just an idea; I couldn't find proof in the git diff of recent commits. I believe @c3y1huang and @mantissahz have already searched the commit history too, and we couldn't pin down the exact commit that made the race more visible.

innobead · 2024-02-23T05:51:03Z

That's reasonable and should be fine, as we already clarified the fix here.

Fix bug we initiate a backup when the volume has started detaching

08496ae


PhanLe1010 requested a review from a team as a code owner February 23, 2024 01:16

PhanLe1010 mentioned this pull request Feb 23, 2024

Fix a race condition in volumeattachment controller #2627

Closed

c3y1huang approved these changes Feb 23, 2024

innobead self-requested a review February 23, 2024 03:12

innobead approved these changes Feb 23, 2024

innobead merged commit 586331b into longhorn:master Feb 23, 2024
5 checks passed

This was referenced Feb 23, 2024

Fix bug we initiate a backup when the volume has started detaching (backport #2635) #2637

Merged

Fix bug we initiate a backup when the volume has started detaching (backport #2635) #2638

Merged

c3y1huang mentioned this pull request Feb 23, 2024

[BUG][v1.5.x] Recurring job fails to create backup when volume detached longhorn/longhorn#7937

Closed
