-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Multicast] Clone remoteMembers when creating an updated GroupMemberStatus #4903
[Multicast] Clone remoteMembers when creating an updated GroupMemberStatus #4903
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4903 +/- ##
==========================================
- Coverage 72.13% 71.36% -0.78%
==========================================
Files 409 409
Lines 61161 59956 -1205
==========================================
- Hits 44121 42788 -1333
- Misses 14127 14286 +159
+ Partials 2913 2882 -31
|
@@ -82,12 +82,14 @@ type GroupMemberStatus struct { | |||
group net.IP | |||
// localMembers is a map for the local Pod member and its last update time, key is the Pod's interface name, | |||
// and value is its last update time. | |||
localMembers map[string]time.Time | |||
localMembers map[string]time.Time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is necessary to introduce Mutexes for localMembers and remoteMembers. The root cause of the issue is we didn't create a new object for remoteMembers before updating the cache. The change in line 137 is enough.
63d63d1
to
5354f38
Compare
@@ -131,7 +131,7 @@ func (c *Controller) updateGroupMemberStatus(obj interface{}, e *mcastGroupEvent | |||
newStatus := &GroupMemberStatus{ | |||
group: status.group, | |||
localMembers: make(map[string]time.Time), | |||
remoteMembers: status.remoteMembers, | |||
remoteMembers: sets.NewString(status.remoteMembers.List()...), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wenyingd could you provide a few background for this issue in the PR summary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would create an issue for it.
/test-multicast-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -131,7 +131,7 @@ func (c *Controller) updateGroupMemberStatus(obj interface{}, e *mcastGroupEvent | |||
newStatus := &GroupMemberStatus{ | |||
group: status.group, | |||
localMembers: make(map[string]time.Time), | |||
remoteMembers: status.remoteMembers, | |||
remoteMembers: sets.NewString(status.remoteMembers.List()...), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remoteMembers: sets.NewString(status.remoteMembers.List()...), | |
remoteMembers: status.remoteMembers.Union(), |
List
performs unnecessary sorting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
wg.Add(4) | ||
// Below func adds local group join events. | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below code is too redundant, can we reduce redundancy?
And there seems no point to generate the events concurrently as they are eventually consumed by a single event handler. The race is not between the event generator and others, but the eventHandler and worker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
for i := 0; i < numEvents; i++ { | ||
mockIfaceStore.EXPECT().GetInterfaceByName(fmt.Sprintf("local-interfaceName-%d", i)).Return(&interfacestore.InterfaceConfig{ | ||
Type: interfacestore.ContainerInterface, | ||
InterfaceName: fmt.Sprintf("local-interfaceName-%d", i), | ||
OVSPortConfig: &interfacestore.OVSPortConfig{ | ||
OFPort: int32(i), | ||
}, | ||
}, true).AnyTimes() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's contradictory to assert a function is called "anytimes" many times.
If you want accurate assertion, it shouldn't use Anytimes.
If you want it to be called anytimes, no need to have a for loop here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a valid interfaceStore to avoid such mock.
for i := 0; i < 2; i++ { | ||
go wait.Until(c.worker, time.Second, stopCh) | ||
} | ||
wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no guaratee that when this call finishes, eventHander and worker have processed the events. Could we use some determinstic input and decide when the goroutines finish their job by polling the result, which could be more determinstic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check on the size of eventCh and queue after all events have been sent.
5354f38
to
044084a
Compare
@@ -131,7 +131,7 @@ func (c *Controller) updateGroupMemberStatus(obj interface{}, e *mcastGroupEvent | |||
newStatus := &GroupMemberStatus{ | |||
group: status.group, | |||
localMembers: make(map[string]time.Time), | |||
remoteMembers: status.remoteMembers, | |||
remoteMembers: sets.NewString().Union(status.remoteMembers), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This writing initializes an extra set. Why don't just use status.remoteMembers.Union(nil)
which is quite common in our code base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
wg.Add(4) | ||
|
||
eventFunc := func(eType eventType, isLocal bool) { | ||
LeastSignificantByteArr := rand.Perm(numEvents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LeastSignificantByteArr := rand.Perm(numEvents) | |
leastSignificantByteArr := rand.Perm(numEvents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
stopCh := make(chan struct{}) | ||
defer close(stopCh) | ||
groupIP := net.ParseIP("224.3.4.5") | ||
numEvents := 250 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this number, the test's logging could flood the output and even make the race stack being cut, making it hard to figure out what causes the failure. You can try removing the fix and runing the test alone. I tested a few times setting it to 10 could always trigger the race and only produced reasonable logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
044084a
to
60559d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
/test-multicast-e2e |
@tnqn can you change the commit title when you merge it? the commit title doesn't match the change. Thanks. |
60559d0
to
2ddfcb8
Compare
When updating GroupMemberStatus, a clone of remoteMembers should assign to the updated GroupMemberStatus. Simply setting it with original remoteMembers will cause race conditions. Fixes antrea-io#4904 Signed-off-by: ceclinux <src655@gmail.com>
2ddfcb8
to
039f0c8
Compare
I have updated the title and commit message. |
/skip-all |
…trea-io#4903) When updating GroupMemberStatus, a clone of remoteMembers should assign to the updated GroupMemberStatus. Simply setting it with original remoteMembers will cause race conditions. Fixes antrea-io#4904 Signed-off-by: ceclinux <src655@gmail.com>
…trea-io#4903) When updating GroupMemberStatus, a clone of remoteMembers should assign to the updated GroupMemberStatus. Simply setting it with original remoteMembers will cause race conditions. Fixes antrea-io#4904 Signed-off-by: ceclinux <src655@gmail.com>
…trea-io#4903) When updating GroupMemberStatus, a clone of remoteMembers should assign to the updated GroupMemberStatus. Simply setting it with original remoteMembers will cause race conditions. Fixes antrea-io#4904 Signed-off-by: ceclinux <src655@gmail.com>
This patch is to resolve a concurrency issue in Multicast feature with encap mode.
The root cause is the existing code doesn't create a new object for field "remoteMembers" before updating it into cache.
Fixed: #4904