Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

GCStress: Fix a race-condition #5356

Merged
merged 1 commit into from
Jun 2, 2016
Merged

Conversation

swaroop-sridhar
Copy link

This change ensures that calls to CORINFO_HELP_STOP_FOR_GC() themselves
are not converted to GC-Stress traps -- thus preventing the race between
the GC thread, and a managed thread under GCStress.

Identification of calls to CORINFO_HELP_STOP_FOR_GC():
Since this is a GCStress only requirement, its not worth special identification in the GcInfo
Since CORINFO_HELP_STOP_FOR_GC() calls are realized as indirect calls by the JIT, we cannot identify
them by address at the time of SprinkleBreakpoints().
So, we actually let the SprinkleBreakpoints() replace the call to CORINFO_HELP_STOP_FOR_GC()
with a trap, and revert it back to the original instruction the first time we hit the trap in
OnGcCoverageInterrupt().

Upside: No change to GCInfo or JIT is necessary
Downside: Need to decode a few bytes on every GCStress interrupt.

Local testing with COMPlus_GCStress=0xC and COMPlus_GCConcurrent=1 did not encounter the race condition in #4794

Fixes #4794

@swaroop-sridhar
Copy link
Author

@sergiy-k / @jkotas please review.
+CC: @RussKeldorph

SLOT nextInstr;
SLOT target = getTargetOfCall(savedInstrPtr, regs, &nextInstr);

if (target == (SLOT)JIT_RareDisableHelper) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing all this call target decoding, would it be better to just check whether we are in preemptive mode?

The call target decoding will need ongoing tweaks, e.g. I hope to add support for interop transitions for ReadyToRun (based on CORJIT_FLG2_USE_PINVOKE_HELPERS) - that will use a different helper.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just spoke with @jkotas.
The check for PremptiveMode will not work, because when returning from a PInvoke epilog, the thread is already in cooperative mode.

@jkotas
Copy link
Member

jkotas commented Jun 1, 2016

Can we also delete m_fPreemptiveGCDisabledForGCStress now? It solving the same problem - it should be unnecessary now with this fix.

@swaroop-sridhar
Copy link
Author

Yes, we can remove the m_fPreemptiveGCDisabledForGCStress functionality, separate from this fix.

In the case of PInvoke,

mov      byte  ptr[rsi + 12], 0   // Switch to preemptive mode [thread->premptiveGcDisabled = 0]
cli          // INTERRUPT_INSTR_CALL in place of the actual native call 
mov      byte  ptr[rsi + 12], 1   // Switch the thread to Cooperative mode
cmp      dword ptr[(reloc 0x7ffd1bb77148)], 0  // if(g_TrapReturningThreads)
je       SHORT G_M40565_IG05
cli       // INTERRUPT_INSTR_CALL in place of COREINFO_STOP_FOR_GC()

We currently do two GCStress collections:

  1. At the native call
  2. At the call to COREINFO_STOP_FOR_GC()

(1) is not wrong, but is unnecessary coverage, because that code will never trigger a GC during normal execution.
(2) is definitely wrong and will be fixed by this change,

So, I agree that its a good idea to remove this m_fPreemptiveGCDisabledForGCStress and simplify the gccoverage implemetation. I think it is a good idea to do it as a separate checkin.

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT x64 Checked gcstress0xc

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT Checked arm64

This change ensures that calls to CORINFO_HELP_STOP_FOR_GC() themselves
are not converted to GC-Stress traps -- thus preventing the race between
the GC thread, and a managed thread under GCStress.

Identification of calls to CORINFO_HELP_STOP_FOR_GC():
Since this is a GCStress only requirement, its not worth special identification in the GcInfo
Since CORINFO_HELP_STOP_FOR_GC() calls are realized as indirect calls by the JIT, we cannot identify
them by address at the time of SprinkleBreakpoints().
So, we actually let the SprinkleBreakpoints() replace the call to CORINFO_HELP_STOP_FOR_GC()
with a trap, and revert it back to the original instruction the first time we hit the trap in
OnGcCoverageInterrupt().

Upside: No change to GCInfo or JIT is necessary
Downside: Need to decode a few bytes on every GCStress interrupt.

Fixes #4794
@swaroop-sridhar
Copy link
Author

The latest commit is to fix an X86 build break, by fixing an extern "C" declaration here: https://github.com/dotnet/coreclr/pull/5356/files#diff-4cbc7cebee869048dc921baad12c9240R420

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT x64 Checked gcstress0xc

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT Checked arm64

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT Checked arm

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT Checked x64 gcstress0xc

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT gcstress0xc

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT arm64 Checked

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT arm Checked

@swaroop-sridhar
Copy link
Author

@jkotas: Checking if you had any more corrections.
Most of the testing has passed. The GCStress leg is still running, and hasn't hit the race.

@swaroop-sridhar
Copy link
Author

@dotnet-bot test Windows_NT gcstress0xc

@swaroop-sridhar
Copy link
Author

The failures in GCStress are pres-existing / timeouts.

@jkotas
Copy link
Member

jkotas commented Jun 2, 2016

LGTM

@swaroop-sridhar
Copy link
Author

@swaroop-sridhar swaroop-sridhar merged commit 2fc2547 into dotnet:master Jun 2, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants