Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug we initiate a backup when the volume has started detaching #2635

Merged
merged 1 commit into from
Feb 23, 2024

Conversation

PhanLe1010
Copy link
Contributor

@PhanLe1010 PhanLe1010 commented Feb 23, 2024

When the volume.Spec.NodeID is different than the node ID of the backup VA ticket, we should not initiate a backup as the volume is going to detach soon

longhorn/longhorn#7937

This PR is going to replace the approach at the PR #2627. After discussed with @ejweber and @james-munson , we think that this approach is better because:

  1. The other approach only touches the logic for the VA ticket of RWO non-migratible volume. It still leaves the logic for the RWX and mitigrable volume the same as we cannot compare the volume.Spec.NodeID with the node ID of these VA tickets. They are most likely to be different
  2. It is fundamentally still correct that when the volume start detaching (but not finish detaching yet) the VA ticket can set to satisfied because at this exact moment the volume is still attached. VA controller can set reset the VA ticket to satisfied:false when the volume finish detaching. It is the responsibility of the client (backup controller in this case) to check and make sure that the volume.Spec.NodeID is still the desired one

When the volume.Spec.NodeID is different than the node ID of the backup
VA ticket, we should not initiate a backup as the volume is going to
detach soon

longhorn-7937

Signed-off-by: Phan Le <phan.le@suse.com>
@PhanLe1010
Copy link
Contributor Author

@innobead innobead self-requested a review February 23, 2024 03:12
@innobead innobead merged commit 586331b into longhorn:master Feb 23, 2024
5 checks passed
@innobead
Copy link
Member

Still need to follow up with the test results later, even though the PR has been merged.

@innobead
Copy link
Member

@mergify backport v1.6.x v1.5.x

Copy link

mergify bot commented Feb 23, 2024

backport v1.6.x v1.5.x

✅ Backports have been created

@innobead
Copy link
Member

The fix is general, so it should be expected to happen in 1.6 and 1.5 after VA was introduced. Do you know why this issue can reproduce after 1.5.4-RC2? @PhanLe1010 @c3y1huang

@PhanLe1010
Copy link
Contributor Author

PhanLe1010 commented Feb 23, 2024

Hi @innobead my current theory is that we change the code flow somehow and make it longer between volume.Spec.NodeID being cleanup and the time that volume starts detaching by setting the engine.spec.desirestate to stop. But that is just an idea and I couldn't find proof from the git diff of recent commits. I believe @c3y1huang and @mantissahz already searched the commit history too. We couldn't pin down the exact commit that made the race become more visible

@innobead
Copy link
Member

That's reasonable and should be fine, as we already clarified the fix here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants