GCStress: Fix a race-condition #5356

swaroop-sridhar · 2016-06-01T04:36:11Z

This change ensures that calls to CORINFO_HELP_STOP_FOR_GC() themselves
are not converted to GC-Stress traps, thus preventing the race between
the GC thread and a managed thread under GCStress.

Identification of calls to CORINFO_HELP_STOP_FOR_GC():
Since this is a GCStress-only requirement, it's not worth adding special identification to the GcInfo.
Since the JIT realizes CORINFO_HELP_STOP_FOR_GC() calls as indirect calls, we cannot identify
them by address at the time of SprinkleBreakpoints().
So, we let SprinkleBreakpoints() replace the call to CORINFO_HELP_STOP_FOR_GC()
with a trap, and revert it to the original instruction the first time we hit the trap in
OnGcCoverageInterrupt().
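A minimal C++ sketch of the trap-and-revert idea described above. Everything in it is hypothetical: `GcCoverModel`, `OnTrap`, the `CallSite` table and the helper address are illustrative stand-ins, not the gccover.cpp code. The decoded call target is precomputed rather than read from the instruction stream, and 0xFA (`cli`) is assumed as the trap byte only because that is what the disassembly later in this thread shows for INTERRUPT_INSTR_CALL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 0xFA ("cli") stands in for INTERRUPT_INSTR_CALL, as in the
// x64 disassembly in this thread; the real value is architecture-specific.
constexpr std::uint8_t kTrapCall = 0xFA;

// Hypothetical address standing in for the CORINFO_HELP_STOP_FOR_GC helper.
constexpr std::uintptr_t kStopForGcHelper = 0x1000;

struct CallSite {
    std::size_t offset;     // offset of the call instruction in the code buffer
    std::uintptr_t target;  // target that decoding the call would yield at run time
};

struct GcCoverModel {
    std::vector<std::uint8_t> code;   // live code, patched with traps
    std::vector<std::uint8_t> saved;  // copy of the original bytes
    std::vector<CallSite> calls;      // every call site in the method
    int stressGcs = 0;                // number of stress GCs performed

    // Like SprinkleBreakpoints(): the helper's target is not known yet,
    // so every call site, including the helper call, gets a trap.
    void SprinkleBreakpoints() {
        saved = code;
        for (const CallSite& c : calls) code[c.offset] = kTrapCall;
    }

    // Like OnGcCoverageInterrupt(): restore the original instruction, then
    // skip the stress GC if the decoded target is the stop-for-GC helper.
    void OnTrap(const CallSite& c) {
        assert(code[c.offset] == kTrapCall);
        code[c.offset] = saved[c.offset];
        if (c.target == kStopForGcHelper) return;
        ++stressGcs;  // stand-in for triggering a stress GC here
    }
};
```

The design point the sketch tries to capture: SprinkleBreakpoints() stays ignorant of which call is the helper, and the decision is deferred to the interrupt, where the actual call target can be decoded.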

Upside: No change to the GcInfo or the JIT is necessary.
Downside: We need to decode a few bytes on every GCStress interrupt.
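The "decode a few bytes" cost can be illustrated with two common x64 call encodings: `E8 rel32` (direct call) and `FF 15 disp32` (`call qword ptr [rip+disp32]`, an indirect call through a memory slot). This is only a sketch: `DecodeCallTarget` is a hypothetical name, it assumes a 64-bit little-endian host, and the real getTargetOfCall() in gccover.cpp also takes the register context (`regs`) so that it can resolve register-indirect calls, which this sketch does not attempt.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sketch: decode the target of a call instruction at `instr`.
// Handles only two x64 encodings and assumes a 64-bit little-endian host.
// Returns 0 (and sets *nextInstr to nullptr) for anything else.
std::uintptr_t DecodeCallTarget(const std::uint8_t* instr,
                                const std::uint8_t** nextInstr) {
    std::int32_t disp;
    if (instr[0] == 0xE8) {
        // call rel32: target = address of next instruction + rel32
        std::memcpy(&disp, instr + 1, sizeof disp);
        *nextInstr = instr + 5;
        return reinterpret_cast<std::uintptr_t>(*nextInstr) + disp;
    }
    if (instr[0] == 0xFF && instr[1] == 0x15) {
        // call qword ptr [rip+disp32]: the target is loaded from the
        // 8-byte slot at (address of next instruction + disp32)
        std::memcpy(&disp, instr + 2, sizeof disp);
        *nextInstr = instr + 6;
        std::uintptr_t target;
        std::memcpy(&target, *nextInstr + disp, sizeof target);
        return target;
    }
    *nextInstr = nullptr;
    return 0;
}
```

In the actual change, the decoded target is then compared against the helper's address, as in the `target == (SLOT)JIT_RareDisableHelper` check quoted in the review later in this thread.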

Local testing with COMPlus_GCStress=0xC and COMPlus_GCConcurrent=1 did not encounter the race condition from #4794.

Fixes #4794

swaroop-sridhar · 2016-06-01T04:37:57Z

@sergiy-k / @jkotas please review.
+CC: @RussKeldorph

jkotas · 2016-06-01T09:19:56Z

src/vm/gccover.cpp

+    SLOT nextInstr;
+    SLOT target = getTargetOfCall(savedInstrPtr, regs, &nextInstr);
+
+    if (target == (SLOT)JIT_RareDisableHelper) {


Instead of doing all this call target decoding, would it be better to just check whether we are in preemptive mode?

The call target decoding will need ongoing tweaks. For example, I hope to add support for interop transitions for ReadyToRun (based on CORJIT_FLG2_USE_PINVOKE_HELPERS), which will use a different helper.

I just spoke with @jkotas.
The preemptive-mode check will not work, because when returning from a PInvoke epilog, the thread is already in cooperative mode.
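To make the preemptive-mode objection concrete, here is a small model of the inline PInvoke sequence that appears in the disassembly later in this thread. All names (`Step`, `InlinePInvokeSequence`, `StressesWithModeCheck`, `StressesWithTargetCheck`) are hypothetical, and the per-step modes are read off that listing rather than taken from the runtime.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the inline PInvoke sequence discussed in this PR.
enum class Mode { Cooperative, Preemptive };

struct Step {
    std::string what;
    Mode mode;             // thread mode when execution reaches this instruction
    bool isTrapSite;       // call that SprinkleBreakpoints() would patch
    bool isStopForGcCall;  // call to CORINFO_HELP_STOP_FOR_GC()
};

std::vector<Step> InlinePInvokeSequence() {
    return {
        {"mov byte ptr [rsi+12], 0 (to preemptive)",  Mode::Cooperative, false, false},
        {"call <native target>",                      Mode::Preemptive,  true,  false},
        {"mov byte ptr [rsi+12], 1 (to cooperative)", Mode::Preemptive,  false, false},
        {"cmp g_TrapReturningThreads / je",           Mode::Cooperative, false, false},
        {"call CORINFO_HELP_STOP_FOR_GC",             Mode::Cooperative, true,  true},
    };
}

// Rejected idea: skip stress only when the thread is in preemptive mode.
bool StressesWithModeCheck(const Step& s) {
    return s.isTrapSite && s.mode != Mode::Preemptive;
}

// Approach in this PR: skip stress when the decoded target is the helper.
bool StressesWithTargetCheck(const Step& s) {
    return s.isTrapSite && !s.isStopForGcCall;
}
```

Under this model, the mode check skips the native call but still stresses the helper call, because the epilog has already switched back to cooperative mode. Filtering on the decoded target does the opposite: it keeps the (harmless) native-call stress and skips the helper call.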

jkotas · 2016-06-01T09:21:07Z

Can we also delete m_fPreemptiveGCDisabledForGCStress now? It solves the same problem, so it should be unnecessary with this fix.

swaroop-sridhar · 2016-06-01T19:34:39Z

Yes, we can remove the m_fPreemptiveGCDisabledForGCStress functionality, separately from this fix.

In the case of PInvoke:

mov      byte  ptr[rsi + 12], 0   // Switch to preemptive mode [thread->preemptiveGcDisabled = 0]
cli          // INTERRUPT_INSTR_CALL in place of the actual native call
mov      byte  ptr[rsi + 12], 1   // Switch the thread back to cooperative mode
cmp      dword ptr[(reloc 0x7ffd1bb77148)], 0  // if (g_TrapReturningThreads)
je       SHORT G_M40565_IG05
cli       // INTERRUPT_INSTR_CALL in place of CORINFO_HELP_STOP_FOR_GC()

We currently do two GCStress collections:

(1) At the native call
(2) At the call to CORINFO_HELP_STOP_FOR_GC()

(1) is not wrong, but it is unnecessary coverage, because that code will never trigger a GC during normal execution.
(2) is definitely wrong, and will be fixed by this change.

So, I agree that it's a good idea to remove m_fPreemptiveGCDisabledForGCStress and simplify the GC coverage implementation, but I think it should be done as a separate check-in.

swaroop-sridhar · 2016-06-01T21:50:20Z

@dotnet-bot test Windows_NT x64 Checked gcstress0xc

swaroop-sridhar · 2016-06-01T21:50:38Z

@dotnet-bot test Windows_NT Checked arm64


swaroop-sridhar · 2016-06-01T22:26:25Z

The latest commit fixes an x86 build break by correcting an extern "C" declaration here: https://github.com/dotnet/coreclr/pull/5356/files#diff-4cbc7cebee869048dc921baad12c9240R420

swaroop-sridhar · 2016-06-01T22:26:33Z

@dotnet-bot test Windows_NT x64 Checked gcstress0xc

swaroop-sridhar · 2016-06-01T22:26:42Z

@dotnet-bot test Windows_NT Checked arm64

swaroop-sridhar · 2016-06-01T22:26:49Z

@dotnet-bot test Windows_NT Checked arm

swaroop-sridhar · 2016-06-01T22:32:47Z

@dotnet-bot test Windows_NT Checked x64 gcstress0xc

swaroop-sridhar · 2016-06-01T23:08:01Z

@dotnet-bot test Windows_NT gcstress0xc

swaroop-sridhar · 2016-06-01T23:10:53Z

@dotnet-bot test Windows_NT arm64 Checked

swaroop-sridhar · 2016-06-01T23:39:12Z

@dotnet-bot test Windows_NT arm Checked

swaroop-sridhar · 2016-06-02T01:09:08Z

@jkotas: Checking whether you have any more corrections.
Most of the testing has passed. The GCStress leg is still running and hasn't hit the race.

swaroop-sridhar · 2016-06-02T06:11:50Z

@dotnet-bot test Windows_NT gcstress0xc

swaroop-sridhar · 2016-06-02T16:05:41Z

The failures in GCStress are pre-existing failures or timeouts.

jkotas · 2016-06-02T18:41:24Z

LGTM

swaroop-sridhar · 2016-06-02T18:54:59Z

Looks like this change also fixes https://github.com/dotnet/coreclr/issues/5310 and https://github.com/dotnet/coreclr/issues/2785

GCStress: Fix a race-condition
Commit migrated from dotnet/coreclr@2fc2547

dnfclas added the cla-already-signed label Jun 1, 2016

jkotas reviewed Jun 1, 2016

swaroop-sridhar force-pushed the Trap branch from 1ce5f9f to bd9712d on June 1, 2016 22:24

swaroop-sridhar merged commit 2fc2547 into dotnet:master Jun 2, 2016

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Merge pull request dotnet/coreclr#5356 from swaroop-sridhar/Trap

24bda7b

GCStress: Fix a race-condition
Commit migrated from dotnet/coreclr@2fc2547
