Fix NodePortLocal rules being deleted incorrectly due to PodIP recycle #6531

Merged Jul 18, 2024 (1 commit)

@tnqn tnqn commented Jul 17, 2024

The NodePortLocal cache bound a Pod's NodePortLocal rules to its Pod IP. However, once a Pod reaches the Succeeded or Failed phase, its IP can be recycled and allocated to another Pod, so more than one Pod can share the same Pod IP. When the terminated Pod was deleted, the NodePortLocal controller incorrectly deleted the rules belonging to the other Pod because both Pods had the same IP.

This patch fixes the issue by binding a Pod's NodePortLocal rules to its Pod key (namespace + name). The podToIP cache is no longer needed, as rules can be cleaned up by Pod key.
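
The failure mode and the fix can be illustrated with a minimal, self-contained sketch. This is not Antrea's actual code: `Rule`, `Cache`, `NewCache`, and `MakePodKey` are hypothetical names invented for illustration, and the Pod names, IP, and ports are made up. The only thing that changes between the two runs is the index key.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical stand-in for a NodePortLocal rule.
type Rule struct {
	PodKey   string // "namespace/name"
	PodIP    string
	PodPort  int
	NodePort int
}

// MakePodKey builds the "namespace/name" key for a Pod.
func MakePodKey(namespace, name string) string {
	return namespace + "/" + name
}

// Cache indexes rules by whatever key keyFn extracts from a rule.
type Cache struct {
	keyFn func(Rule) string
	rules map[string][]Rule
}

// NewCache returns an empty cache that indexes rules with keyFn.
func NewCache(keyFn func(Rule) string) *Cache {
	return &Cache{keyFn: keyFn, rules: map[string][]Rule{}}
}

// Add stores a rule under its index key.
func (c *Cache) Add(r Rule) {
	k := c.keyFn(r)
	c.rules[k] = append(c.rules[k], r)
}

// Delete removes every rule stored under key and returns how many were removed.
func (c *Cache) Delete(key string) int {
	n := len(c.rules[key])
	delete(c.rules, key)
	return n
}

// Len returns the total number of rules in the cache.
func (c *Cache) Len() int {
	n := 0
	for _, rs := range c.rules {
		n += len(rs)
	}
	return n
}

func main() {
	// job-1 has terminated and its IP was recycled to the running web-1.
	const recycledIP = "10.0.0.5"
	job := MakePodKey("default", "job-1")
	web := MakePodKey("default", "web-1")
	webRule := Rule{PodKey: web, PodIP: recycledIP, PodPort: 8080, NodePort: 61000}

	// Old behavior: deleting job-1 by its IP also removes web-1's rule.
	byIP := NewCache(func(r Rule) string { return r.PodIP })
	byIP.Add(webRule)
	byIP.Delete(recycledIP)
	fmt.Println("keyed by Pod IP, rules left:", byIP.Len())

	// Fixed behavior: deleting job-1 by its Pod key leaves web-1 untouched.
	byKey := NewCache(func(r Rule) string { return r.PodKey })
	byKey.Add(webRule)
	byKey.Delete(job)
	fmt.Println("keyed by Pod key, rules left:", byKey.Len())
}
```

Because a Pod key is unique among the Pods that exist at any one time while a Pod IP is not, keying by Pod key also removes the need for a separate Pod-to-IP lookup when cleaning up.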

Fixes #6527

@tnqn tnqn added the kind/bug, area/proxy/nodeportlocal, action/backport, and action/release-note labels Jul 17, 2024
@tnqn tnqn added this to the Antrea v2.1 release milestone Jul 17, 2024
@tnqn tnqn force-pushed the fix-npl branch 3 times, most recently from 7ed6c74 to ca5de89 Compare July 17, 2024 14:47
jianjuns previously approved these changes Jul 17, 2024
@jianjuns jianjuns left a comment

Thanks for the fix.

luolanzone previously approved these changes Jul 18, 2024
@luolanzone luolanzone left a comment

LGTM overall, one nit.

@@ -16,6 +16,8 @@ package rules

// PodNodePort contains the Node Port, Pod IP, Pod Port and Protocols for NodePortLocal.
Contributor

Suggested change
- // PodNodePort contains the Node Port, Pod IP, Pod Port and Protocols for NodePortLocal.
+ // PodNodePort contains the Pod namespaced name, Node Port, Pod IP, Pod Port and Protocol for NodePortLocal.

Or remove the comment? I feel the struct is self-explanatory and there is no need to add a comment for PodNodePort.

Member Author

Updated. I didn't remove it, as a struct's comment may be useful when generating docs or when an IDE references the struct.
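
As a small illustration of that point: Go tooling such as `go doc` and most IDE hover popups display the comment placed directly above a type declaration, conventionally starting with the type's name. The sketch below reuses the suggested comment text, but the field layout and the `String` method are assumptions made for illustration and are not taken from Antrea's source.

```go
package main

import "fmt"

// PodNodePort contains the Pod namespaced name, Node Port, Pod IP, Pod Port
// and Protocol for NodePortLocal.
//
// NOTE: the fields below are a hypothetical layout, not Antrea's definition.
type PodNodePort struct {
	PodKey   string // "namespace/name"
	NodePort int
	PodIP    string
	PodPort  int
	Protocol string
}

// String renders the mapping in a compact form suitable for logs.
func (p PodNodePort) String() string {
	return fmt.Sprintf("%s: %d -> %s:%d/%s",
		p.PodKey, p.NodePort, p.PodIP, p.PodPort, p.Protocol)
}

func main() {
	p := PodNodePort{
		PodKey:   "default/web-1",
		NodePort: 61000,
		PodIP:    "10.0.0.5",
		PodPort:  8080,
		Protocol: "tcp",
	}
	fmt.Println(p)
}
```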

The NodePortLocal cache bound a Pod's NodePortLocal rules to its Pod IP.
However, once a Pod reaches the Succeeded or Failed phase, its IP can be
recycled and allocated to another Pod, so more than one Pod can share
the same Pod IP. When the terminated Pod was deleted, the NodePortLocal
controller incorrectly deleted the rules belonging to the other Pod
because both Pods had the same IP.

This patch fixes the issue by binding a Pod's NodePortLocal rules to its
Pod key (namespace + name). The podToIP cache is no longer needed, as
rules can be cleaned up by Pod key.

Signed-off-by: Quan Tian <quan.tian@broadcom.com>
@XinShuYang XinShuYang left a comment

LGTM

tnqn commented Jul 18, 2024

/test-all

tnqn commented Jul 18, 2024

/test-e2e
/test-windows-e2e
/skip-conformance
/skip-networkpolicy

@tnqn tnqn merged commit 288ce62 into antrea-io:main Jul 18, 2024
56 of 59 checks passed
@tnqn tnqn deleted the fix-npl branch July 18, 2024 14:48
tnqn added a commit to tnqn/antrea that referenced this pull request Jul 18, 2024
antrea-io#6531)

tnqn added a commit to tnqn/antrea that referenced this pull request Jul 18, 2024
antrea-io#6531)

tnqn added a commit that referenced this pull request Jul 22, 2024
#6531) (#6534)

tnqn added a commit that referenced this pull request Jul 22, 2024
#6531) (#6533)

Successfully merging this pull request may close these issues.

NodePortLocal rules for a particular Pod are missing while the NPL annotation is present
4 participants