Make Agent crash if installation of essential flows failed #1787

Closed
wants to merge 1 commit into from

@weiqiangt (Contributor)

Imports the changes from wenyingd/ofnet#16 to fix the issue.

Fixed #1745.

@antoninbas left a comment

LGTM

@codecov-io commented Jan 27, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@ba703c0).
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1787   +/-   ##
=======================================
  Coverage        ?   63.39%           
=======================================
  Files           ?      192           
  Lines           ?    16363           
  Branches        ?        0           
=======================================
  Hits            ?    10373           
  Misses          ?     4932           
  Partials        ?     1058           
Flag Coverage Δ
e2e-tests 43.76% <0.00%> (?)
kind-e2e-tests 50.40% <0.00%> (?)
unit-tests 42.75% <0.00%> (?)

Flags with carried forward coverage won't be shown.
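The totals in this report are consistent with coverage = hits / lines, where lines = hits + misses + partials, so partially covered lines do not count as covered (10373 + 4932 + 1058 = 16363). A small Go sketch of that arithmetic, assuming Codecov's default of two decimals rounded down; the second report's 63.27% (11899 / 18805 ≈ 63.276%) is consistent with truncation rather than round-to-nearest. `coveragePercent` is a hypothetical helper, not part of any Codecov API.

```go
package main

import "fmt"

// coveragePercent returns hits / (hits + misses + partials) as a percentage
// with two decimals, truncated (rounded down). Integer math avoids
// floating-point rounding surprises.
func coveragePercent(hits, misses, partials int) string {
	lines := hits + misses + partials
	if lines == 0 {
		return "n/a"
	}
	basisPoints := hits * 10000 / lines // hundredths of a percent, truncated
	return fmt.Sprintf("%d.%02d", basisPoints/100, basisPoints%100)
}

func main() {
	// Figures from the two Codecov reports in this thread.
	fmt.Println(coveragePercent(10373, 4932, 1058)) // 63.39
	fmt.Println(coveragePercent(11899, 5795, 1111)) // 63.27
}
```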

@weiqiangt (Contributor, Author)

/test-all

@tnqn (Member) left a comment

In my testing, this PR did not work as expected.

@tnqn (Member) commented Jan 27, 2021

I0127 04:00:26.524038       1 agent.go:65] Starting Antrea agent (version v0.13.0-dev-6f532fa2)
...
I0127 04:00:27.542100       1 entry.go:359] Openflow Connection for new switch: 00:00:86:df:cf:cf:c5:4a
I0127 04:00:27.542471       1 ofctrl_bridge.go:220] OFSwitch is connected: 00:00:86:df:cf:cf:c5:4a
E0127 04:00:27.550213       1 entry.go:314] Received Vendor error: NXBAC_CT_DATAPATH_SUPPORT on ONFT_BUNDLE_ADD_MESSAGE message
E0127 04:00:27.550244       1 entry.go:314] Received Vendor error: NXBAC_CT_DATAPATH_SUPPORT on ONFT_BUNDLE_ADD_MESSAGE message
I0127 04:00:27.554713       1 agent.go:242] Agent initialized NodeConfig=NodeName: k8s-02, OVSBridge: br-int, PodIPv4CIDR: 172.50.1.0/24, PodIPv6CIDR: <nil>, NodeIP: 192.168.10.17/24, Gateway: Name antrea-gw0: IPv4 172.50.1.1, IPv6 <nil>, MAC 22:6b:92:61:34:c8, NetworkConfig=&{Encap geneve false }
I0127 04:00:27.554932       1 metrics.go:124] Registering Antrea Proxy prometheus metrics
I0127 04:00:27.555014       1 proxier.go:431] Creating proxier with IPv6 enabled=false

@weiqiangt (Contributor, Author)

@tnqn, I tested on Linux dev 4.4.36-nn3-server #nn3 SMP PREEMPT Thu Apr 13 08:23:18 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux, and the agent failed with this error:

I0127 04:26:32.867536       1 log_file.go:99] Set log file max size to 104857600
I0127 04:26:32.868136       1 agent.go:65] Starting Antrea agent (version v0.13.0-dev-6f532fa2)
I0127 04:26:32.868156       1 client.go:34] No kubeconfig file was specified. Falling back to in-cluster config
I0127 04:26:32.869753       1 prometheus.go:151] Initializing prometheus metrics
I0127 04:26:32.869837       1 ovs_client.go:67] Connecting to OVSDB at address /var/run/openvswitch/db.sock
I0127 04:26:32.870213       1 agent.go:205] Setting up node network
I0127 04:26:32.916956       1 agent.go:646] Setting Node MTU=1450
I0127 04:26:32.917241       1 ovs_client.go:110] Bridge exists: 42811226-0648-4ff2-b4a5-a7420101db13
I0127 04:26:32.920743       1 agent.go:789] Using round number 22
I0127 04:26:32.920803       1 ofctrl.go:170] Initialize connection or re-connect to /var/run/openvswitch/br-int.mgmt.
I0127 04:26:32.940392       1 route_linux.go:124] Initialized iptables
I0127 04:26:33.921030       1 ofctrl.go:185] Connected to socket /var/run/openvswitch/br-int.mgmt
I0127 04:26:33.921139       1 ofctrl.go:247] New connection..
I0127 04:26:33.921154       1 ofctrl.go:254] Send hello with OF version: 4
I0127 04:26:33.921167       1 ofctrl.go:268] Received Openflow 1.3 Hello message
I0127 04:26:33.921432       1 ofctrl.go:285] Received ofp1.3 Switch feature response: {Header:{Version:4 Type:6 Length:32 Xid:3} DPID:00:00:26:12:81:42:f2:4f Buffers:0 NumTables:254 AuxilaryId:0 pad:[0 0] Capabilities:79 Actions:0 Ports:[]}
I0127 04:26:33.921459       1 ofSwitch.go:85] Openflow Connection for new switch: 00:00:26:12:81:42:f2:4f
I0127 04:26:33.921641       1 ofctrl_bridge.go:220] OFSwitch is connected: 00:00:26:12:81:42:f2:4f
E0127 04:26:33.927315       1 ofSwitch.go:357] Received Vendor error: NXBAC_CT_DATAPATH_SUPPORT on ONFT_BUNDLE_ADD_MESSAGE message
E0127 04:26:33.927344       1 ofSwitch.go:357] Received Vendor error: NXBAC_CT_DATAPATH_SUPPORT on ONFT_BUNDLE_ADD_MESSAGE message
E0127 04:26:33.927428       1 agent.go:287] Failed to initialize openflow client: failed to install connection track flows: failed to add all Openflow entries in one transaction, cancelling it
F0127 04:26:33.927698       1 main.go:58] Error running agent: error initializing agent: failed to install connection track flows: failed to add all Openflow entries in one transaction, cancelling it
goroutine 1 [running]:
k8s.io/klog.stacks(0xc000253900, 0xc00075a000, 0xc8, 0x1d8)
	/root/go/pkg/mod/k8s.io/klog@v1.0.0/klog.go:875 +0xb8
k8s.io/klog.(*loggingT).output(0x325eb00, 0xc000000003, 0xc0003205b0, 0x3194b5f, 0x7, 0x3a, 0x0)
	/root/go/pkg/mod/k8s.io/klog@v1.0.0/klog.go:826 +0x330
k8s.io/klog.(*loggingT).printf(0x325eb00, 0x3, 0x1f27be5, 0x17, 0xc0006f3d08, 0x1, 0x1)
	/root/go/pkg/mod/k8s.io/klog@v1.0.0/klog.go:707 +0x14b
k8s.io/klog.Fatalf(...)
	/root/go/pkg/mod/k8s.io/klog@v1.0.0/klog.go:1276
main.newAgentCommand.func1(0xc000295680, 0xc000481400, 0x0, 0x8)
	/root/sources/antrea/cmd/antrea-agent/main.go:58 +0x211
github.com/spf13/cobra.(*Command).execute(0xc000295680, 0xc000116010, 0x8, 0x8, 0xc000295680, 0xc000116010)
	/root/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0xc000295680, 0x31d5fa0, 0xc000000180, 0xc000163f50)
	/root/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
	/root/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
main.main()
	/root/sources/antrea/cmd/antrea-agent/main.go:37 +0x56

Is your testbed set up differently from this one?

@github-actions (Contributor)

This PR is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days

@github-actions bot added the lifecycle/stale label Jul 27, 2021
@codecov-commenter commented Jul 27, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@ba703c0).
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1787   +/-   ##
=======================================
  Coverage        ?   63.27%           
=======================================
  Files           ?      192           
  Lines           ?    18805           
  Branches        ?        0           
=======================================
  Hits            ?    11899           
  Misses          ?     5795           
  Partials        ?     1111           
Flag Coverage Δ
e2e-tests 47.56% <0.00%> (?)
kind-e2e-tests 50.40% <0.00%> (?)
unit-tests 42.75% <0.00%> (?)

Flags with carried forward coverage won't be shown.

@github-actions bot removed the lifecycle/stale label Jul 28, 2021
@jianjuns requested a review from wenyingd August 9, 2021 20:15
@jianjuns (Contributor) commented Aug 9, 2021

@tnqn, @wenyingd: do we still need this change?

@tnqn (Member) commented Aug 10, 2021

@jianjuns No, this change doesn't resolve the original issue. A datapath (DP) feature can still be missing without the installation of the essential flows failing. We figured out the reason, but there is no simple way to detect whether a mod-flow message in a bundle is accepted by the DP.
I have created #2571 to fix the original issue in a different way, so I will close this one.

@tnqn closed this Aug 10, 2021
Development

Successfully merging this pull request may close these issues.

Agent should go crash if installation of essential flows failed
7 participants