-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Fix race introduced in membership #5731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
875c72e
to
8ef889c
Compare
src/Orleans.Runtime/MembershipService/MembershipTableManager.cs
Outdated
Show resolved
Hide resolved
8ef889c
to
4437b17
Compare
4437b17
to
6020aa0
Compare
Ready for review |
@@ -463,7 +479,7 @@ public async Task<AddressAndTag> RegisterAsync(ActivationAddress address, bool s | |||
// on all silos other than first, we insert a retry delay and recheck owner before forwarding | |||
if (hopCount > 0 && forwardAddress != null) | |||
{ | |||
await Task.Delay(RETRY_DELAY); | |||
await this.RefreshMembership(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of an arbitrary wait, refresh membership once and wait for that refresh to propagate to the directory
default: | ||
return false; | ||
} | ||
return status != SiloStatus.None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could have caused incorrect "forwarding to remote cluster" type messages beforehand. It still can (eg before a silo is Joining
or after it's been cleaned up, but much less likely)
|
||
await Task.Delay(TimeSpan.FromMilliseconds(10)); | ||
} while (this.snapshot.Version < v || this.snapshot.Version < this.membershipTableManager.MembershipTableSnapshot.Version); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refresh membership (at least) once & wait for the version to propagate through
if (status.IsTerminating()) | ||
var timeoutTask = Task.Delay(GossipTimeout); | ||
var task = await Task.WhenAny(gossipTask, timeoutTask); | ||
if (ReferenceEquals(task, timeoutTask)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always wait up to 3 seconds for gossip to complete. May slow shutdown/startup, but also should give us generally more predictable behavior
} | ||
else if (this.log.IsEnabled(LogLevel.Debug)) | ||
{ | ||
this.log.LogDebug("Timed out while gossiping status to other silos after {Timeout}", GossipTimeout); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... but only log timeouts if we're shutting down or if debug logging is enabled
} | ||
|
||
await pending; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want at most 1 concurrent refresh
…embershipTableManger (dotnet#5731)" This reverts commit 5d9d082.
No description provided.