Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle iptables-restore failures correctly in NPL Controller #2555

Conversation

antoninbas
Copy link
Contributor

Add a retry mechanism in the Controller initialization, which will keep
trying to sync iptables rules until the operation is successful. On
success, the NPL Controller is notified through a channel and can start
its event handlers.

Fixes #2554

Signed-off-by: Antonin Bas abas@vmware.com

@codecov-commenter
Copy link

codecov-commenter commented Aug 6, 2021

Codecov Report

Merging #2555 (7515136) into main (fe4457f) will increase coverage by 4.92%.
The diff coverage is 73.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2555      +/-   ##
==========================================
+ Coverage   60.09%   65.01%   +4.92%     
==========================================
  Files         281      281              
  Lines       22243    25524    +3281     
==========================================
+ Hits        13366    16594    +3228     
+ Misses       7467     7391      -76     
- Partials     1410     1539     +129     
Flag Coverage Δ
e2e-tests 55.88% <73.33%> (?)
kind-e2e-tests 47.09% <68.42%> (+0.18%) ⬆️
unit-tests 42.48% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/nodeportlocal/k8s/npl_controller.go 64.09% <65.21%> (+5.47%) ⬆️
pkg/agent/nodeportlocal/portcache/port_table.go 66.95% <81.81%> (+5.22%) ⬆️
pkg/controller/egress/ipallocator/allocator.go 67.82% <0.00%> (-15.16%) ⬇️
pkg/controller/networkpolicy/endpoint_querier.go 77.64% <0.00%> (-13.79%) ⬇️
pkg/apis/controlplane/v1beta1/conversion.go 72.44% <0.00%> (-11.89%) ⬇️
pkg/legacyapis/core/v1alpha2/register.go 69.23% <0.00%> (-10.77%) ⬇️
pkg/controller/egress/controller.go 76.76% <0.00%> (-10.44%) ⬇️
pkg/apis/stats/register.go 71.42% <0.00%> (-10.39%) ⬇️
pkg/legacyapis/stats/register.go 71.42% <0.00%> (-10.39%) ⬇️
pkg/ovs/openflow/ofctrl_meter.go 33.84% <0.00%> (-10.16%) ⬇️
... and 267 more

monotosh-avi
monotosh-avi previously approved these changes Aug 9, 2021
@antoninbas
Copy link
Contributor Author

/test-all

pkg/agent/nodeportlocal/portcache/port_table.go Outdated Show resolved Hide resolved
@@ -117,6 +119,14 @@ func podKeyFunc(pod *corev1.Pod) string {
return pod.Namespace + "/" + pod.Name
}

func (c *NPLController) Initialize() error {
klog.InfoS("Will fetch Pods and generate NodePortLocal rules for these Pods")
if err := c.GetPodsAndGenRules(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After it's extracted to a method and called separately, it doesn't wait for cache to sync, I think it may miss Pod data?

Add a retry mechanism in the Controller initialization, which will keep
trying to sync iptables rules until the operation is successful. On
success, the NPL Controller is notified through a channel and can start
its event handlers.

Fixes antrea-io#2554

Signed-off-by: Antonin Bas <abas@vmware.com>
Signed-off-by: Antonin Bas <abas@vmware.com>
@antoninbas
Copy link
Contributor Author

@tnqn thanks for the review. I believe you are right about the cache. I pushed a second commit to address it. Please let me know if you have a more elegant solution to suggest.

@antoninbas
Copy link
Contributor Author

/test-e2e

tnqn
tnqn previously approved these changes Aug 11, 2021
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/agent/nodeportlocal/portcache/port_table.go Outdated Show resolved Hide resolved
Signed-off-by: Antonin Bas <abas@vmware.com>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas
Copy link
Contributor Author

/test-all

@antoninbas antoninbas merged commit 3248515 into antrea-io:main Aug 11, 2021
@antoninbas antoninbas deleted the handle-iptables-restore-failures-in-NPL-controller branch August 11, 2021 16:44
antoninbas added a commit that referenced this pull request Aug 11, 2021
…ectly in NPL (#2575)

Add a retry mechanism in the Controller initialization, which will keep
trying to sync iptables rules until the operation is successful. On
success, the NPL Controller is notified through a channel and can start
its event handlers.

Fixes #2554

Signed-off-by: Antonin Bas <abas@vmware.com>
annakhm pushed a commit to annakhm/antrea that referenced this pull request Aug 16, 2021
…io#2555)

Add a retry mechanism in the Controller initialization, which will keep
trying to sync iptables rules until the operation is successful. On
success, the NPL Controller is notified through a channel and can start
its event handlers.

Fixes antrea-io#2554

Signed-off-by: Antonin Bas <abas@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NPL Controller shuts down if iptables-restore operation fails
4 participants