-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics: Use dedicated thread instead of the thread pool for the defa… #1028
Metrics: Use dedicated thread instead of the thread pool for the defa… #1028
Conversation
…ult aggregation cycle (code only, does not build not NetStandard).
@@ -139,11 +145,13 @@ private void Run() | |||
DateTimeOffset now = DateTimeOffset.Now; | |||
TimeSpan waitPeriod = GetNextCycleTargetTime(now) - now; | |||
|
|||
Task.Delay(waitPeriod).ConfigureAwait(continueOnCapturedContext: false).GetAwaiter().GetResult(); | |||
Thread.Sleep(waitPeriod); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want a cancelable wait. One way is to create a CancellationToken, get its WaitHandle and call WaitOne with the TimeSpan. The CancellationToken takes the place of the shouldBeRunning
bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we want a cancelable wait at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fast and graceful shutdown. Maybe I misunderstood the point of StopAsync, but I assume it's to be called at shutdown and the expectation is that shutdown is fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Right now we do not do another data dump on a cancelled cycle, but you are right: it would be a good improvement. On the other hand, dealing with WaitHandles here would mean dealing with Disposables. If we go there, I would prefer to do it as a separate change from addressing the issue at hand. Before the proposed change, this wait was not cancelable either.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -139,11 +145,13 @@ private void Run() | |||
DateTimeOffset now = DateTimeOffset.Now; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not part of this change, I know, but please use UtcNow
instead of Now
(also in GetNextCycleTargetTime
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. If I recall correctly, there may have been purpose to it - take a local TS so that the serializer can choose to UTC it or not. But thinking about it today - I agree that using UTC would be better - and if the serializer wanted to really do something local, it would do that consistently for all telemetry. Please feel free to file an issue.
Perhaps a Timer (System.Threading.Timer) would work? It doesn't burn a thread. Callbacks happen on threadpool threads. |
Another idea. Go back to the original code (using private async Task RunAsync(CancellationToken cancellationToken)
{
while (true)
{
DateTimeOffset now = DateTimeOffset.Now;
TimeSpan waitPeriod = GetNextCycleTargetTime(now) - now;
await Task.Delay(waitPeriod, cancellationToken).ConfigureAwait(false);
this.FetchAndTrackMetrics();
}
} Start becomes: public bool Start()
{
if (this.workerTask != null)
{
return false; // Was already running or stopped.
}
this.workerTask = Task.Run(() => RunAsync(this.cancellationTokenSource.Token));
return true;
} StopAsync becomes: public Task StopAsync()
{
this.cancellationTokenSource.Cancel();
return this.workerTask;
} You don't need the runningState (by my reading it's impossible to re-start once StopAsync has been called) although you might want to protect Start against races with a sync object. The CancellationTokenSource should be created in the constructor. |
@pharring , Thanks for the feedback. Code comments added with this change describe how and why this is specifically targeted to not use the thread pool and thus the most obvious Task.Delay(..) approach. Do you have concerns with that reasoning? In that context, if we wanted to make this wait cancelable, we would probably create a ManualResetEventSlim object in a try-finally block right before we actually perform the wait, and we would dispose the object in the same finally block after the wait. However, as mentioned, this change aims to address a customer issue in production. As such, I do not want not introduce any additional logic, however useful it might be in a separate change. This wait was not cancellable before and there is no need to make it so in the context of this fix. However the signal logic described above would be a good improvement. Let's do it as a separate change. Thanks! :) |
private int runningState; | ||
private Task workerTask; | ||
private Thread aggregationThread; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any value in making this a private static Thread?
I think there is a scenario where a customer is creating multiple TelemetryConfigurations, which would create multiple of these internal classes each with a private thread. Please consider if there is any value in making this thread globally unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you could end up with several instances of DefaultAggregationPeriodCycle. Not only if you have several TelemetryConfigurations, but also if you have explicitly opted into aggregating at the scope of a telemetry client.
However, a shared thread would require shared cancellation and waiting logic, which is non-trivial. For a bug fix we should always err on the side of a minimal change. Here such change is to get off the thread pool.
But yes, now that we create threads explicitly, we can consider using one shared thread for all instances of DefaultAggregationPeriodCycle in the future. It would lighten the resource load.
I have no concerns if the fix addresses the customer issue. |
// pool had run user code before. Such user code may be doing an asynchronous wait scheduled to | ||
// continue on the same thread(e.g. this can occur when using a custom synchronization context or a | ||
// custom task scheduler). If such case the waiting user code will never continue. | ||
// By creating our own thread, we guarantee no interactions with potentially incorrectly written async user code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I suggest we just say:
// We create a dedicated thread for aggregation. This is to prevent aggregation
// from being affected by potential thread pool starvation.
The rest is about the old implementation that is (IMO) best left unsaid because that implementation contributed to the very thread pool starvation we're trying to avoid! In addition, I think the "had run user code before" and "may be doing an asynchronous wait" parts are misleading and irrelevant to the crux of the problem which was the .GetAwaiter().GetResult()
piece.
@macrogreg I've added missing reference. Please update changelog and we'll be good to go. |
I believe several improvements were discussed for this code. @macrogreg could you please create issues describing these improvements? We'll prioritize them accordingly. I'd like to merge this PR and ship the fix in beta3. |
#1028) * Metrics: Use dedicated thread instead of the thread pool for the default aggregation cycle (code only, does not build not NetStandard). * Add missing reference to System.Threading.Thread on netstandard1.3 * changelog
@lmolkova : The improvements discussed here are good suggestions for the future, but we should approach them separately from this fix. I fee you merged it today, so i am assuming we are on the same page. :) |
@macrogreg, please create a separate issues for these improvements so we don't forget about them |
This is in context of #1026.
Work in progress for now.