Add new workspace event that gives handlers the opportunity to be processed immediately #76932
Conversation
…e processed immediately

This is in response to allocations seen when the OOP process attempts to operate on a source text version that has not been synchronized over. In those cases, the OOP process requests a full deserialization of the source text, causing full buffer allocations in both VS and the CA process.

There are several points of asynchronicity in the current system around sending over buffer changes; reducing any of them would reduce the likelihood of needing this full serialization.

This PR removes one of those points of asynchronicity, specifically in the workspace eventing layer. Previously, all notifications were delayed, with each notification wrapped in the Workspace.ScheduleTask call from the RaiseWorkspaceChangedEventAsync method. This change allows callers to indicate they need to be called immediately upon the workspace change. As noted by a doc comment in the PR, these handlers should be very fast.

Going to mark this as draft and get speedometer numbers off this to see if it helps enough, or if instead the other points of asynchronicity should be investigated (specifically, the usage of the _textChangeQueue ABWQ). There are other alternatives within the text syncing system, such as allowing OOP to request a text change instead of the full buffer contents, but previous investigations into that ended up complicated and incomplete.

PR validation insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/605941
src/Workspaces/SharedUtilitiesAndExtensions/Compiler/Core/Log/FunctionId.cs
Moving out of draft mode as the numbers look pretty good. There is still some serialization happening during typing, but it's a lot less than before, and this change was pretty trivial. If code reviewers think adding this at the workspace level is not desirable, I can look into other approaches.
…r reduce the amount of serialization and deserialization
commit 5 PR validation insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/606025
…ositions to allow composition to include an IThreadingContext
…ecksumUpdater.DispatchSynchronizeTextChanges and RemoteAssetSynchronizationService.SynchronizeTextChangesAsync
...kspaces/Remote/ServiceHub/Services/AssetSynchronization/RemoteAssetSynchronizationService.cs
using var _ = ArrayBuilder<(DocumentId id, Checksum textChecksum, ImmutableArray<TextChange> changes, Checksum newTextChecksum)>.GetInstance(out var builder);

foreach (var (oldDocument, newDocument) in values)
    _ = _threadingContext.JoinableTaskFactory.RunAsync(async () =>
we should def doc the design here. make it clear why we've written it this way.
note: you can also use IAsyncToken here to track the work, which would also let you properly unit test this.
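For reference, a rough sketch of what that tracking could look like, assuming the usual Roslyn IAsynchronousOperationListener/IAsyncToken shapes; the `_asyncListener` field, the argument list passed to SynchronizeTextChangesAsync, and the wiring are illustrative, not the PR's actual code:

```csharp
// Hypothetical sketch: track the fire-and-forget JTF work with an IAsyncToken so
// tests can deterministically wait for it to finish.
var token = _asyncListener.BeginAsyncOperation(nameof(SynchronizeTextChangesAsync));

var joinableTask = _threadingContext.JoinableTaskFactory.RunAsync(async () =>
{
    await SynchronizeTextChangesAsync(oldDocument, newDocument, cancellationToken).ConfigureAwait(false);
});

// Disposing the token tells the listener the tracked operation completed,
// whether the underlying task finished, faulted, or was canceled.
_ = joinableTask.Task.ContinueWith(
    _ => token.Dispose(),
    CancellationToken.None,
    TaskContinuationOptions.ExecuteSynchronously,
    TaskScheduler.Default);
```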
Is the intent here that this is fire-and-forget (and will it still always yield the thread or not)? Because there's absolutely a chance in VS this might be on the UI thread, and we're calling those GetChangeRanges methods and such, which might (?) be expensive. Or maybe not.
This indeed might be worth documenting, since it's otherwise unclear to me why we're wrapping this in a JTF.RunAsync().
The intent is to fire-and-forget, but only yield the current thread if an async operation necessitates it.
Is GetChangeRanges expensive? It doesn't look to be to me, but maybe I'm missing an override? If it is expensive, the two conditions that use it could probably be removed, as they are very minor and specialized optimizations.
Will add some doc here
I had thought GetChangeRanges could potentially end up in a text diff, but maybe not.
using (Logger.LogBlock(FunctionId.Workspace_EventsImmediate, (s, p, d, k) => $"{s.Id} - {p} - {d} {kind.ToString()}", newSolution, projectId, documentId, kind, CancellationToken.None))
{
    args = new WorkspaceChangeEventArgs(kind, oldSolution, newSolution, projectId, documentId);
    ev.RaiseEvent(static (handler, arg) => handler(arg.self, arg.args), (self: this, args));
aside, are these exception safe if the attached handler throws an exception? i ask because the old system might have been ok if the Task went into a faulted state. i'm not sure about the new system.
Oops, asked this same question here: #76932 (comment). I think this might cause an issue now if this were to throw, so probably best to handle that better.
Any concern with me just changing RaiseEvent from using FatalError.ReportAndPropagate to instead use FatalError.ReportAndCatch? (Also, would it make sense to have the try/catch inside the foreach in that method?)
@jasonmalinowski -- any thoughts on changing RaiseEvent to instead use FatalError.ReportAndCatch (and on moving the try/catch inside the loop)?
I went ahead and just made the change to switch to ReportAndCatch
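For reference, a minimal sketch of the shape being discussed (the `handlers` collection stands in for the registered subscribers; this is not the actual Roslyn EventMap code): with the try/catch inside the loop and FatalError.ReportAndCatch, one throwing subscriber is reported as non-fatal and the remaining handlers still get notified.

```csharp
// Illustrative sketch only -- not the actual implementation in the PR.
foreach (var handler in handlers)
{
    try
    {
        handler(this, args);
    }
    catch (Exception ex) when (FatalError.ReportAndCatch(ex))
    {
        // The exception has been reported non-fatally; keep notifying the
        // remaining handlers instead of tearing down the whole notification.
    }
}
```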
VSCode_Project_Load_Started = 861,
VSCode_Projects_Load_Completed = 862,

// 900-999 for items that don't fit into other categories.
Suggested change:
- // 900-999 for items that don't fit into other categories.
+ // 900-999 things related to workspace and OOP solution sync
We're not restricted to three digits, so no reason to make this the final catch-all group.
I wanted a catch-all group as I was tired of scanning through trying to find the next available slot with all the other groups. Is the request here to change the starting range / size of the catch-all group, or just not to have it?
Just don't have it. And if we think groups don't make sense in the first place, just get rid of all the groups. 😄
i really like groups. just give things around 10-20 entries. it's def nicer when you're adding a bunch of new telemetry markers to a specific area.
using (Logger.LogBlock(FunctionId.Workspace_EventsImmediate, (s, p, d, k) => $"{s.Id} - {p} - {d} {kind.ToString()}", newSolution, projectId, documentId, kind, CancellationToken.None))
{
    args = new WorkspaceChangeEventArgs(kind, oldSolution, newSolution, projectId, documentId);
    ev.RaiseEvent(static (handler, arg) => handler(arg.self, arg.args), (self: this, args));
It appears RaiseEvent will report any exception being thrown but will still propagate it out. That might be dangerous here, because it means any subscriber's mistake will now completely break the workspace and put the user in a really broken state. I admit I'm not sure why we aren't catching exceptions in RaiseEvent, so maybe that should be changed (or we could have a variant that does catch).
use some local functions for cleanup
Update some comments
CyrusNajmabadi left a comment
Mostly signing off. I do want Jason to approve as well, and there look to be a few places we can clean up. I would also like exception resilience to be thought about. Thanks!
Workspace workspace,
IAsynchronousOperationListenerProvider listenerProvider,
IThreadingContext threadingContext,
CancellationToken shutdownToken)
I'm curious if the shutdown token is needed, instead of just using the disposal token on the threading context.
I think they are different. VisualStudioWorkspaceServiceHubConnector has StartListening/StopListening methods, which drive the shutdownToken. However, I think the threadingContext's DisposalToken is driven by MEF and isn't disposed until VS shutdown.
Just saw your other comment about VisualStudioWorkspaceServiceHubConnector. Let me think about that.
It looks like right now, VisualStudioWorkspaceServiceHubConnector.StopListening is called on Workspace dispose. If I just send over the threadingContext and use its cancellation token, are we ok with essentially just using a cancellation token from MEF dispose and not workspace dispose?
yeah. let's keep this as is. i've looked at the code and i'm not sure how to reconcile the different Disposes/CTs. So let's not change that now.
  // only push solution snapshot from primary (VS) workspace:
- _checksumUpdater = new SolutionChecksumUpdater(workspace, _listenerProvider, _disposalCancellationSource.Token);
+ _checksumUpdater = new SolutionChecksumUpdater(workspace, _listenerProvider, _threadingContext, _disposalCancellationSource.Token);
If we have the threading context, we might be able to get rid of the disposal logic here and stop passing that CT along.
...kspaces/Remote/ServiceHub/Services/AssetSynchronization/RemoteAssetSynchronizationService.cs
Update comments
Use string constants instead of allocating
@jasonmalinowski -- would love to get your feedback on the changes and on any remaining concerns of yours that haven't been addressed as desired.
TelemetryLogging.LogAggregatedCounter(FunctionId.ChecksumUpdater_SynchronizeTextChangesStatus, KeyValueLogMessage.Create(m =>
{
    m[TelemetryLogging.KeyName] = nameof(SolutionChecksumUpdater) + "." + metricName;
    m[TelemetryLogging.KeyValue] = 1L;
is the idea that this will add 1 each time this happens?
Yes, the LogAggregatedCounter will add the specified amount to the counter
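In other words (hedged illustration, where `LogSyncStatus` is a hypothetical wrapper around the call quoted above): each call contributes its KeyValue to the aggregate for that KeyName, and the summed total is what ultimately gets reported as the telemetry metric.

```csharp
// Two successful syncs during a session...
LogSyncStatus("SucceededCount");   // aggregate for "SolutionChecksumUpdater.SucceededCount" becomes 1
LogSyncStatus("SucceededCount");   // ...and now 2; a single aggregated value is posted later.

// Hypothetical wrapper around the LogAggregatedCounter call shown in the diff above.
void LogSyncStatus(string metricName)
    => TelemetryLogging.LogAggregatedCounter(FunctionId.ChecksumUpdater_SynchronizeTextChangesStatus,
        KeyValueLogMessage.Create(m =>
        {
            m[TelemetryLogging.KeyName] = nameof(SolutionChecksumUpdater) + "." + metricName;
            m[TelemetryLogging.KeyValue] = 1L;
            m[TelemetryLogging.KeyMetricName] = metricName;
        }));
```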
{
    m[TelemetryLogging.KeyName] = nameof(SolutionChecksumUpdater) + "." + metricName;
    m[TelemetryLogging.KeyValue] = 1L;
    m[TelemetryLogging.KeyMetricName] = metricName;
why do we need this here, if it is encoded in KeyName above? is one of these redundant?
This ends up down in AbstractAggregatingLog.Log. That code requires KeyName/KeyValue to be specified, and if KeyMetricName isn't specified, it falls back to using KeyName as the metric name.
I can look into whether there are callers that depend on that behavior, and if not, remove that parameter. However, I'd prefer not to: even if it could be removed, it would add an allocation to strip the metric name back out of the key name.
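A rough sketch of the fallback described above (paraphrased, not the actual AbstractAggregatingLog source; `properties` stands in for the logged key/value map):

```csharp
// KeyName and KeyValue are required; KeyMetricName is optional and, when absent,
// the key name itself doubles as the metric name.
var keyName = (string)properties[TelemetryLogging.KeyName];
var metricName = properties.TryGetValue(TelemetryLogging.KeyMetricName, out var value)
    ? (string)value
    : keyName;
```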
{
    var client = await RemoteHostClient.TryGetClientAsync(_workspace, _shutdownToken).ConfigureAwait(false);
    if (client == null)
        return false;
slightly odd that this would show up as 'failure'. this means a customer who has OOP off will show as massively failing this op. i think this should maybe count as 'success' or perhaps N/A so we don't get the wrong idea about being in a bad state.
{
    m[TelemetryLogging.KeyName] = keyName;
    m[TelemetryLogging.KeyValue] = 1L;
    m[TelemetryLogging.KeyMetricName] = metricName;
same comment on whether we need both the keyname and metric name, esp. if the keyname contains the metric name.
with your latest changes, i feel fine with this going in (even without Jason signing off). he will eventually get around to looking at this.
…every invocation and to not log telemetry in the case the RemoteHostClient.TryGetClientAsync call doesn't return a client.
@sharwell has an alternative approach which he's going to try out. I'll get perf numbers on his approach once it's ready and then we can evaluate how to proceed. In reply to: 2628571096
Latest test insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/607891 (based on commit 14)
…mplify the change
Alternative proposed implementation by Sam here: #77059. Running speedometer tests against that and will update once those are available.
WorkspaceChangeEventArgs args = null;
if (ev.HasHandlers)
{
    args = new WorkspaceChangeEventArgs(kind, oldSolution, newSolution, projectId, documentId);
You could just pull this assignment out of the if() block here; practically any workspace where we would care about allocations will have at least one handler. 😄
Done!
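The resulting shape after that change looks roughly like this (sketch, not the exact final code):

```csharp
// Construct the event args unconditionally; any realistic workspace has at least
// one handler, so the conditional allocation wasn't buying anything.
var args = new WorkspaceChangeEventArgs(kind, oldSolution, newSolution, projectId, documentId);

if (ev.HasHandlers)
    ev.RaiseEvent(static (handler, arg) => handler(arg.self, arg.args), (self: this, args));
```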
// we won't notify all of the callers. The expectation is such an exception would be thrown to the SafeStartNew in the workspace's event queue that
// will raise that as a fatal exception, but OperationCancelledExceptions might be able to propagate through and fault the task we are using in the
// chain. I'm choosing to use ReportWithoutCrashAndPropagate, because if our theory here is correct, it seems the first exception isn't actually
// causing crashes, and so if it turns out this is a very common situation I don't want to make a often-benign situation fatal.
This comment is stale now. This looks like it was written by me, but since the comment has a link to a query (rather than the actual bug), it's quite useless.
Killed the comment; things are different enough now that it's really not applicable.
var oldDocument = e.OldSolution.GetDocument(e.DocumentId);
var newDocument = e.NewSolution.GetDocument(e.DocumentId);

Debug.Assert(oldDocument != null && newDocument != null);
Suggested change:
- var oldDocument = e.OldSolution.GetDocument(e.DocumentId);
- var newDocument = e.NewSolution.GetDocument(e.DocumentId);
- Debug.Assert(oldDocument != null && newDocument != null);
+ var oldDocument = e.OldSolution.GetRequiredDocument(e.DocumentId);
+ var newDocument = e.NewSolution.GetRequiredDocument(e.DocumentId);
If we're wrong, telemetry will tell us since you're adding that catch-and-report. 😄
Done
var state = await oldDocument.State.GetStateChecksumsAsync(_shutdownToken).ConfigureAwait(false);
var newState = await newDocument.State.GetStateChecksumsAsync(_shutdownToken).ConfigureAwait(false);
I'm confused a bit by the semantics we're trying to have here for OnWorkspaceChangedImmediate. The statement was "it must be very fast", but now we're computing checksums? I imagine the async-ness here is that we use an AsyncLazy<> under the covers, so if somebody else is already asking we won't block our thread. But this newDocument is very new (indeed, we're still raising events for it!), so would this be the first place the checksums get computed? And would that then be consuming CPU right here, when we wanted this to be cheap?
This is a very valid concern. I debugged into the hash computation for the newDocument, and this is forcing it on the calling (UI) thread. Need to think on this.
Added the Task.Yield as Cyrus suggested, and verified that all the calculations have moved to a bg thread, but still saw pretty good allocation numbers, as outlined in the updated PR description.
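A rough sketch of the idea (this may not match the final code in the PR exactly; the method and field names are taken from the snippets quoted above): yield off the calling thread at the start of the fire-and-forget body so the checksum computation runs on the thread pool rather than the UI thread.

```csharp
_ = _threadingContext.JoinableTaskFactory.RunAsync(async () =>
{
    // Get off the caller's (potentially UI) thread before doing any CPU-bound work.
    // 'await TaskScheduler.Default' (via Microsoft.VisualStudio.Threading's awaiter
    // extension) switches the continuation to the thread pool; a bare Task.Yield()
    // is the variant discussed above.
    await TaskScheduler.Default;

    // These checksum computations can force hashing of brand-new document state,
    // so they now happen on a background thread.
    var state = await oldDocument.State.GetStateChecksumsAsync(_shutdownToken).ConfigureAwait(false);
    var newState = await newDocument.State.GetStateChecksumsAsync(_shutdownToken).ConfigureAwait(false);

    // ... compute the text changes and dispatch them to the remote process ...
});
```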
// about the text changes between oldDocument and newDocument. By doing this, we can
// reduce the likelihood of the remote side encountering an unknown checksum and
// requiring a synchronization of the full document contents.
// This method uses JTF.RunAsync to create a fire-and-forget task. JTF.RunAsync will
Comment now stale? RunAsync not being used anymore?
Removed the stale bit.
could we just do the simple thing of moving off the UI thread (use a TPL Task.Run or a Yield), and just see if we still get the good results of not needing to sync anything?
Yeah, maybe a Task.Yield would be good enough. Can try that out and see what the numbers say. In reply to: 2638638591
…ount of work on the calling (UI) thread
And here's the commit 17 test insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/608770. Numbers looked pretty good; some serialization still happens, but it's a lot less. Will update the PR description.
@jasonmalinowski, @sharwell -- Will go ahead and merge on Monday unless either of you feels otherwise.
This is in response to allocations seen when the CA process attempts to operate on a source text version that has not been synchronized over. In those cases, the OOP process requests a full deserialization of the source text, causing full buffer allocations in both VS and the CA process.
There are several points of asynchronicity in the current system around sending over buffer changes; reducing any of them would reduce the likelihood of needing this full serialization.
This PR removes one of those points of asynchronicity, specifically in the workspace eventing layer. Previously, all notifications were delayed, with each notification wrapped in the Workspace.ScheduleTask call from the RaiseWorkspaceChangedEventAsync method. This change allows callers to indicate they need to be called immediately upon the workspace change. As noted by a doc comment in the PR, these handlers should be very fast. A hedged sketch of the two notification paths follows.
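To make the mechanism concrete, here is an illustrative sketch (names and the registration shape are assumptions, not the PR's exact API surface): regular handlers are still queued through the workspace's serial task scheduler, while "immediate" handlers run inline on the thread that applied the workspace change and must therefore do minimal work.

```csharp
// Illustrative sketch only -- not the exact Roslyn implementation.
private void RaiseWorkspaceChangedEvent(WorkspaceChangeEventArgs args)
{
    // New path: immediate handlers are invoked synchronously, before the change
    // notification is queued. They must be very fast (e.g. just kick off a
    // fire-and-forget text-synchronization request).
    foreach (var handler in _immediateHandlers)
        handler(this, args);

    // Existing path: regular handlers are scheduled onto the workspace's serial
    // event queue, so they observe the change some time after it was applied.
    this.ScheduleTask(() =>
    {
        foreach (var handler in _delayedHandlers)
            handler(this, args);
    });
}
```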
The numbers from speedometer came back looking pretty good (below). Another point of asynchronicity could be investigated around the usage of the _textChangeQueue ABWQ, but it might not be worth it, as it looks like this change removes the majority of the serialization during typing. Other alternatives within the text syncing system, such as allowing OOP to request a text change instead of the full buffer contents, are also possible, but previous investigations into that ended up complicated and incomplete.
*** before allocations during typing in VS process ***

*** commit 3 allocations during typing in VS process ***

*** commit 17 allocations during typing in VS process ***

*** before allocations during typing in CA process ***

*** commit 3 allocations during typing in CA process ***

*** commit 17 allocations during typing in CA process ***
