Conversation

@sharwell (Contributor) commented Jun 12, 2017

Fixes #19761

Ask Mode

Customer scenario

The specific steps to encounter this issue are not completely clear. It appears to involve working in a file which has a deeply nested structure. When the bug arises, Visual Studio may read from a pointer which has been freed, resulting in corruption of internal data structures or an access violation.

Bugs this fixes:

Fixes #19761
Fixes #20170

Workarounds, if any

None.

Risk

The change made here is theoretically capable of producing either a deadlock or a performance regression. Neither of these seems likely from the particular change, and the situations we are avoiding are certainly worse (stack overflow or heap corruption). @CyrusNajmabadi Can you provide an evaluation of the risk for this change?

Performance impact

This change disables a cancellation point in the code, which could in some cases cause a regression related to serialization performance. However, this cancellation point was implemented as a recursion guard rather than a proper cancellation point, so it is unlikely that we were relying on it as the latter for performance.

Is this a regression from a previous update?

Yes, this bug was introduced in #16798 (specifically 6427a2b) for release 15.1.

Root cause analysis

The code accessing unmanaged memory is extremely performance sensitive, so most checks capable of detecting this concurrency bug early have too much overhead for deployment.

How was the bug found?

Internal customer report.

Member

Can we explicitly mention that waiting with cancellation would be a problem if the underlying task doesn't support the cancellation?
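For illustration only (not code from this PR), a minimal sketch of the hazard being raised here, assuming a worker that ignores the token: cancelling the wait merely abandons the blocking call, while the task keeps running and races with whatever the caller does next.

using System;
using System.Threading;
using System.Threading.Tasks;

class CancelledWaitRace
{
    static void Main()
    {
        var sharedBuffer = new byte[256];
        using (var cts = new CancellationTokenSource())
        {
            // This worker never observes the token; it keeps writing to shared state.
            Task worker = Task.Run(() =>
            {
                for (int i = 0; i < sharedBuffer.Length; i++)
                {
                    sharedBuffer[i] = 0xFF;
                    Thread.Sleep(10);
                }
            });

            cts.CancelAfter(50);

            try
            {
                // Cancelling the *wait* only abandons this blocking call.
                worker.Wait(cts.Token);
            }
            catch (OperationCanceledException)
            {
                // Control returns here while the worker may still be mutating sharedBuffer;
                // freeing or reusing that buffer now would race with the worker.
                Console.WriteLine($"Wait abandoned; worker completed: {worker.IsCompleted}");
            }
        }
    }
}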

Contributor Author

➡️ Updated the comment

@Pilchie (Member) commented Jun 12, 2017

Thanks for the comment.

// CANCELLATION: If cancellation is required, DO NOT attempt to cancel the operation by cancelling this
// wait. Cancellation must only be implemented by modifying 'task' to cancel itself in a timely manner
// so the wait can complete.
value = task.GetAwaiter().GetResult();
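For context, a hedged sketch of the pattern this hunk sits in; ReadValueWorker, _cancellationToken, the LongRunning StartNew call, and GetAwaiter().GetResult() are named in the discussion below, while the surrounding class shape is assumed and is not the actual Roslyn source.

using System;
using System.Threading;
using System.Threading.Tasks;

// Hedged sketch only; the real ObjectReader code differs.
internal sealed class ObjectReaderSketch
{
    private readonly CancellationToken _cancellationToken;

    public ObjectReaderSketch(CancellationToken cancellationToken)
    {
        _cancellationToken = cancellationToken;
    }

    public object ReadValueOnNewThread()
    {
        // The token is observed only by StartNew: if cancellation is requested before
        // the delegate runs, the task becomes Canceled instead of executing at all.
        Task<object> task = Task.Factory.StartNew(
            ReadValueWorker,
            _cancellationToken,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);

        // CANCELLATION: do NOT cancel this wait (the old task.Wait(_cancellationToken));
        // cancellation must come from ReadValueWorker itself so the wait always
        // observes completion.
        return task.GetAwaiter().GetResult();
    }

    private object ReadValueWorker()
    {
        // Placeholder for the real deserialization work.
        return new object();
    }
}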
Member

This looks OK to me, though I would like @stephentoub to weigh in.

Member

@stephentoub is this the appropriately idiomatic way to say "I would like to wait on this task, and cancel early if it hasn't started running, but block on it until its IsCompleted becomes true"?

@stephentoub (Member) Jun 12, 2017

This looks right to me, in that it's effectively the same as the else clause below (well, slightly closer would be if the _cancellationToken passed to StartNew were instead changed to be CancellationToken.None, but what you have looks fine). As it's now written, either the operation will be canceled before it's ever started (due to the token having cancellation requested prior to the StartNew delegate getting invoked, although because it's LongRunning, a new thread is going to be spawned and ReadValueWorker will likely be invoked very quickly as it won't be waiting in any thread pool queues), or any cancellation will be up to the ReadValueWorker implementation itself, which is what you want. The previous version would have allowed code to continue running even as ReadValueWorker is still executing (since the wait was being canceled rather than ReadValueWorker itself), and thus if what comes after this might attempt to access the same state as ReadValueWorker potentially touches, you end up with race conditions. Even if ReadValueWorker itself pays attention to / polls for cancellation, there are still potentially races, depending on the implementation.

There is one difference functionally, though: if ReadValueWorker throws an exception, Wait will wrap it in an AggregateException, whereas GetAwaiter().GetResult() will propagate it directly. If you wanted this change to simply remove the cancellation of the wait, you could just remove the whole task.Wait(_cancellationToken) line, as task.Result itself will block until the task completes and will similarly wrap the exception.
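A small standalone demo (not from the PR) of the wrapping difference described above:

using System;
using System.Threading.Tasks;

class ExceptionPropagationDemo
{
    static void Main()
    {
        Task<int> failing = Task.Run<int>(() => throw new InvalidOperationException("boom"));

        try
        {
            failing.Wait(); // accessing failing.Result behaves the same way
        }
        catch (AggregateException ex)
        {
            // Wait and Result wrap the failure in an AggregateException.
            Console.WriteLine(ex.InnerException?.Message);
        }

        try
        {
            failing.GetAwaiter().GetResult();
        }
        catch (InvalidOperationException ex)
        {
            // GetAwaiter().GetResult() rethrows the original exception directly.
            Console.WriteLine(ex.Message);
        }
    }
}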

Member

Thanks, Stephen. Sam's code looks to be exactly what we want then (i.e. throw the exception directly, not an aggregate exception). Preemptively cancel, but only have it respected on the next potential thread launch. Thanks!

@CyrusNajmabadi (Member) left a comment

I'm good if Stephen is good.

@CyrusNajmabadi (Member)

Hrmm... the failure of all tests is concerning though. What's up with that?

@CyrusNajmabadi (Member)

Also, we have this thread spawning in two other places (in ObjectWriter). We probably want to have this fix applied there as well, right?

@sharwell (Contributor Author)

@MattGertz this issue fixes a possible access violation (memory corruption) when editing files with deeply nested structures (but not so deep that such files don't exist; both @Pilchie and I have hit this in practice).

@MattGertz (Contributor)

@sharwell Is there any telemetry we can add to help pin down the repro steps, which will help us evaluate if there might be other effects lurking behind this one?

@Pilchie (Member) commented Jun 13, 2017

Just hit this again in my service hub process :-/

@sharwell (Contributor Author)

@MattGertz We talked about this and we're not quite sure how to reproduce it specifically. We could probably trigger a deterministic failure by wrapping the stream passed to the ObjectReader and forcing it to trigger a cancellation when it starts getting read requests from a different thread than the one it started on (see the sketch below), but that could still be quite a bit of effort. My hope is that the "obnoxious comment" provides a sufficient deterrent against regressions in the future.

Between the two of us, @Pilchie and I have seen this specific crash a total of three times over the past several weeks.
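For illustration, a hedged sketch of the wrapping-stream idea above; the class and member names are hypothetical and no such test exists in the repo.

using System;
using System.IO;
using System.Threading;

// Hypothetical wrapper: remembers the thread that created it and requests cancellation
// the first time a read arrives from a different thread, mimicking the suspected repro.
internal sealed class CrossThreadCancellingStream : Stream
{
    private readonly Stream _inner;
    private readonly CancellationTokenSource _cts;
    private readonly int _creationThreadId = Environment.CurrentManagedThreadId;

    public CrossThreadCancellingStream(Stream inner, CancellationTokenSource cts)
    {
        _inner = inner;
        _cts = cts;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (Environment.CurrentManagedThreadId != _creationThreadId)
        {
            // Simulate the customer scenario: cancellation requested while another
            // thread is in the middle of deserializing from this stream.
            _cts.Cancel();
        }

        return _inner.Read(buffer, offset, count);
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }
    public override void Flush() => _inner.Flush();
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}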

@dpoeschl (Contributor) commented Jun 16, 2017

retest windows_debug_vs-integration_prtest please
Some issues fixed by #20260

@dpoeschl (Contributor) commented Jun 16, 2017

retest windows_release_vs-integration_prtest please
Some issues fixed by #20260

@sharwell sharwell merged commit bcfaa14 into dotnet:master Jun 16, 2017
@sharwell sharwell deleted the wait-for-read branch June 16, 2017 11:24
@sharwell sharwell added this to the 15.3 milestone Jun 27, 2017


Development

Successfully merging this pull request may close these issues.

Dogfooding crash in memory mapped files
Access violation in RoslynCodeAnalysisService32 reading from memory mapped file

7 participants