-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix that Service routes may get lost when starting on Windows #4470
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4470 +/- ##
==========================================
+ Coverage 65.89% 66.84% +0.95%
==========================================
Files 402 378 -24
Lines 57247 54978 -2269
==========================================
- Hits 37723 36751 -972
+ Misses 16822 15519 -1303
- Partials 2702 2708 +6
|
for _, ingressIP := range svc.Status.LoadBalancer.Ingress { | ||
ingressIPNet := util.NewIPNet(net.ParseIP(ingressIP.IP)).String() | ||
desiredClusterIPSvcIPs.Insert(ingressIPNet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would cause potential issue by appending the ingress IPs to a set for cluster IPs. For example, it could cause the calculated ClusterIP CIDR much bigger than actual value.
If the needed set is supposed to contain both IPs, please rename the variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only for Windows, Linux ClusterIP CIDR is not calculated from this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By potential issue, I meant it could cause a bug in future if others think the set contains pure cluster IPs, which is mislead by the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got that. Then how about change desiredClusterIPSvcIPs to desiredSvcIPNets?
186d6f5
to
19a1254
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @hongliangl ! It looks to me that addServiceRoute() still has the issue. The key at [1] is different than the key at [2]. antrea/pkg/agent/route/route_windows.go Line 263 in 19a1254
[2]: antrea/pkg/agent/route/route_windows.go Line 291 in 19a1254
|
Also, it looked to me that Reconcile() and Add addServiceRoute() can be called at the same time. So, addServiceRoute() can add a route and at the same time, Reconcile() can delete that added route. So there is a race condition. |
Good catch! I'll correct it. |
Yes, they can be called at the same time, but all routes can be installed correctly finally. For example: For a Service, if Reconcile is called first, corresponding route will not be delete, and a mapping SvcIPNet -> route will be stored to sync.Map hostRoutes; then addServiceRoute is called, and the mapping SvcIPNet -> route is stored in hostRoutes, then do nothing. For a Service, if addServiceRoute is called first and no mapping ipNetString -> route is found, then install the route and update SvcIPNet -> route to hostRoutes; then Reconcile is called, the route may be included or not in |
19a1254
to
cf9e7cc
Compare
@hongliangl Consider the following scenario
|
You are right for this case, but if no new Service is created (without step 2), current code is ok. To cover the case you mentioned above, I'll change the code which is like in route_linux.go.
|
cf9e7cc
to
38309b2
Compare
Thanks. So with the new code, stale routes remain as-is, correct? |
Yes, in this way, at least there will be no ClusterIP that is unavailable. |
2be6ee2
to
e8582e2
Compare
e8582e2
to
e1093e0
Compare
svcIPNet := util.NewIPNet(svcIP) | ||
obj, found := c.hostRoutes.Load(svcIPNet.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L309 also needs update, can we add unit tests to cover them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L309 also needs update
Done
can we add unit tests to cover them?
Add some test code to verify the cached route in c.hostRoutes
4e35382
to
2cac702
Compare
Fix antrea-io#4467 Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
2cac702
to
93dbdde
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
…-io#4470) Fix antrea-io#4467 Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
Fix #4467
Signed-off-by: Hongliang Liu lhongliang@vmware.com