openjdk8_j9_extended.system_x86-64_mac DaaLoadTest_all_ConcurrentScavenge_0_FAILED Segmentation error vmState=0x00020003 #8020

Closed
JasonFengJ9 opened this issue Dec 8, 2019 · 5 comments · Fixed by eclipse-omr/omr#4716 or #8291
Labels
comp:gc segfault Issues that describe segfaults / JVM crashes test failure

Comments

@JasonFengJ9
Member

Failure link

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/213/tapResults/

Optional info

Failure output (captured from console output)

DLT 23:07:53.545 - Starting thread. Suite=0 thread=0
DLT 23:07:53.547 - Starting thread. Suite=0 thread=1
DLT 23:08:13.604 - Completed 6.4%. Number of tests started=459
DLT stderr Unhandled exception
DLT stderr Type=Segmentation error vmState=0x00020003
DLT stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
DLT stderr Handler1=000000000ECA1E40 Handler2=000000000EE97B60 InaccessibleAddress=0000000000630018
DLT stderr RDI=00007FDA19E0BAF0 RSI=0000000000630000 RAX=0000000000010000 RBX=00007FDA1B031820
DLT stderr RCX=00000000E0A90000 RDX=000000001DD0D1F0 R8=0000000000005870 R9=00007FDA1B0EC268
DLT stderr R10=00000000FFFFFF00 R11=0F0F0F0F0F0F0F0F R12=00000000E0A8A790 R13=0000000000000000
DLT stderr R14=00000000E0A8A790 R15=00000000E0A8A788
DLT stderr RIP=00000000106E94BC GS=0000 FS=0000 RSP=00007000007DEB50
DLT stderr RFlags=0000000000010206 CS=002B RBP=00007000007DEBA0 ERR=0063001800000004
DLT stderr TRAPNO=000000040000000E CPU=0018000000040000 FAULTVADDR=0000000000630018
DLT stderr XMM0 404d205b7259579a (f: 1918457728.000000, d: 5.825279e+01)
DLT stderr XMM1 43e0000000000000 (f: 0.000000, d: 9.223372e+18)
DLT stderr XMM2 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
DLT stderr XMM3 3f399d9a00000000 (f: 0.000000, d: 3.908635e-04)
DLT stderr XMM4 be747feac1cd04f7 (f: 3251438848.000000, d: -7.636724e-08)
DLT stderr XMM5 3fcc8ff600000000 (f: 0.000000, d: 2.231433e-01)
DLT stderr XMM6 000fffffffffffff (f: 4294967296.000000, d: 2.225074e-308)
DLT stderr XMM7 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
DLT stderr XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr Module=/Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdkbinary/j2sdk-image/jre/lib/compressedrefs/libj9gc29.dylib
DLT stderr Module_base_address=00000000105B7000 Symbol=_ZN41MM_SweepPoolManagerAddressOrderedListBase13addFreeMemoryEP18MM_EnvironmentBaseP21MM_ParallelSweepChunkPmm
DLT stderr Symbol_address=00000000106E9460
DLT stderr Target=2_90_20191207_219 (Mac OS X 10.11.6)
DLT stderr CPU=amd64 (4 logical CPUs) (0x200000000 RAM)
DLT stderr ----------- Stack Backtrace -----------
DLT stderr ---------------------------------------
DLT stderr JVMDUMP039I Processing dump event "gpf", detail "" at 2019/12/07 23:08:19 - please wait.
DLT stderr JVMDUMP032I JVM requested System dump using '/Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/core.20191207.230819.77090.0001.dmp' in response to an event
DLT stderr JVMDUMP010I System dump written to /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/core.20191207.230819.77090.0001.dmp
DLT stderr JVMDUMP032I JVM requested Java dump using '/Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/javacore.20191207.230819.77090.0002.txt' in response to an event
DLT stderr JVMDUMP010I Java dump written to /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/javacore.20191207.230819.77090.0002.txt
DLT stderr JVMDUMP032I JVM requested Snap dump using '/Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/Snap.20191207.230819.77090.0003.trc' in response to an event
DLT stderr JVMDUMP010I Snap dump written to /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/Snap.20191207.230819.77090.0003.trc
DLT stderr JVMDUMP007I JVM Requesting JIT dump using '/Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/jitdump.20191207.230819.77090.0004.dmp'
DLT stderr JVMDUMP010I JIT dump written to /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/jitdump.20191207.230819.77090.0004.dmp
DLT stderr JVMDUMP013I Processed dump event "gpf", detail "".
STF 23:08:23.507 - Found dump at: /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/javacore.20191207.230819.77090.0002.txt
STF 23:08:23.508 - Found dump at: /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/core.20191207.230819.77090.0001.dmp
STF 23:08:23.509 - Found dump at: /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/Snap.20191207.230819.77090.0003.trc
STF 23:08:23.509 - **FAILED** Process DLT ended with exit code (255) and not the expected exit code/s (0)
DLT stderr javacore file generated - /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/javacore.20191207.230819.77090.0002.txt
DLT stderr core file generated - /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/core.20191207.230819.77090.0001.dmp
DLT stderr Snap file generated - /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/results/Snap.20191207.230819.77090.0003.trc
STF 23:08:23.509 - Monitoring Report Summary:
STF 23:08:23.509 -   o Process DLT has crashed unexpectedly
STF 23:08:23.509 - Killing processes: DLT
STF 23:08:23.509 -   o Process DLT is not running
**FAILED** at step 1 (Run daa load test). Expected return value=0 Actual=1 at /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/../TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/execute.pl line 95.
STF 23:08:23.706 - **FAILED** execute script failed. Expected return value=0 Actual=1
STF 23:08:23.706 - 
STF 23:08:23.706 - ====================   T E A R D O W N   ====================
STF 23:08:23.706 - Running teardown: perl /Users/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_mac_Nightly/openjdk-tests/TKG/../TKG/test_output_15757779336279/DaaLoadTest_all_ConcurrentScavenge_0/20191207-230750-DaaLoadTest/tearDown.pl
STF 23:08:23.760 - TEARDOWN stage completed
STF 23:08:23.766 - 
STF 23:08:23.766 - =====================   R E S U L T S   =====================
STF 23:08:23.766 - Stage results:
STF 23:08:23.766 -   setUp:     pass
STF 23:08:23.766 -   execute:  *fail*
STF 23:08:23.766 -   teardown:  pass
STF 23:08:23.766 - 
STF 23:08:23.766 - Overall result: **FAILED**

DaaLoadTest_all_ConcurrentScavenge_0_FAILED

This appears different from #8019.

@pshipton
Member

pshipton commented Dec 9, 2019

@dmitripivkine can you pls take a look.

@dmitripivkine
Contributor

@amicic FYI

@dmitripivkine
Contributor

dmitripivkine commented Dec 10, 2019

This crash occurs because the Mark Map for the heap range above 0xe0a80000 in Tenure had not been initialized at Concurrent Kickoff time. As a result, garbage in the Mark Map triggered the failure (an attempt to read an object at 0xE0A8A788).

@dmitripivkine
Copy link
Contributor

@RSalman Would you please take a look?

@pshipton pshipton added this to the Release 0.19 (Java 14) milestone Dec 17, 2019
RSalman added a commit to RSalman/omr that referenced this issue Jan 13, 2020
- MarkMap init consolidated

MarkMap init logic has been merged to resolve the timing hole outlined
in eclipse-openj9/openj9#8020. MarkMap init depends on the Concurrent
state of the global collector. With the init logic split between
`heapAddRange` and `heapReconfigured`, any change in state between the
two may leave the expanded mark map uninitialized. Hence, the decision
to init (`setMarkBitsInRange`) the mark map has been moved from
`heapAddRange` to `heapReconfigured`, which is where we determine
whether an update to the init table is required
(`tuneToHeap`/`determineInitWork`). As a result, init is no longer
affected by a change in Concurrent state and the timing window is
eliminated. This guarantees that the MarkMap will be inited, either on
the spot or afterwards by updating the init table.

- Introduced HeapReconfigReason and reworked heapReconfigured API

`heapReconfigured` now distinguishes different reasons for
reconfiguration (expand, contract, etc.). This functionality is required
for the Mark Map init changes, specifically for Gencon, and it is used
when Scavenger resizes tenure _(PhysicalSubArenaVirtualMemorySemiSpace)_
or the global collector performs a resize
_(PhysicalSubArenaVirtualMemoryFlat)_. At this time, the new information
is only consumed by the Concurrent Global Collector (ConcurrentGC.cpp);
for policies making use of other collectors the HeapReconfigReason param
defaults to `NONE`.

The reconfig reasons are dealt with in the following ways by the
Concurrent Global Collector:

- We should never end up in `heapReconfigured` with the `RECONFIG_NONE`
reason
- `RECONFIG_CONTRACT` signifies that `heapRemoveRange` has taken place,
in which case we just need to update the init table (`tuneToHeap`) when
Concurrent is off, otherwise just `adjustTraceTargets`
- If `heapAddRange` takes place then `heapReconfigured` is called with
`RECONFIG_EXPAND` and we have two different cases:
      1) `heapAddRange` was successful (returned true): heap reconfig
should be provided with `lowAddress` & `highAddress`, which will be used
to init the mark map
      2) `heapAddRange` returned false, signifying a failed
`heapAddRange`: address params are expected to be NULL, in which case
the mark map won't be inited but we'll still either `tuneToHeap` or
`adjustTraceTarget`.

Signed-off-by: Salman Rana <salman.rana@ibm.com>
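
The reason handling described in the commit message above can be sketched roughly as follows. This is a simplified model, not the OMR code: `reconfigure` is a hypothetical helper, the reason names are taken from the commit message (their exact spelling in OMR may differ), and the strings stand in for the real calls. It assumes, as the messages referencing this issue suggest, that mark bits are set in place only when Concurrent is already running and that the init table is updated otherwise.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Reason names follow the commit message; exact spellings in OMR may differ.
enum HeapReconfigReason { RECONFIG_NONE, RECONFIG_EXPAND, RECONFIG_CONTRACT };

// Hypothetical helper: returns the names of the actions the concurrent
// collector would take for one heapReconfigured call, in order.
std::vector<std::string> reconfigure(bool concurrentOff,
                                     HeapReconfigReason reason,
                                     bool rangeProvided) {
    // The commit message states this reason should never reach heapReconfigured.
    assert(reason != RECONFIG_NONE);

    std::vector<std::string> actions;

    // Successful expand while concurrent is running: init the new range in place.
    // A failed expand (no address range) or a contract skips this step.
    if (reason == RECONFIG_EXPAND && rangeProvided && !concurrentOff) {
        actions.push_back("setMarkBitsInRange");
    }

    // Either rebuild the init table (concurrent off) or only adjust targets.
    if (concurrentOff) {
        actions.push_back("tuneToHeap");
    } else {
        actions.push_back("adjustTraceTarget");
    }
    return actions;
}
```

For example, `reconfigure(false, RECONFIG_EXPAND, true)` yields `setMarkBitsInRange` followed by `adjustTraceTarget`, while a failed expand (`rangeProvided == false`) yields only `adjustTraceTarget`.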
@RSalman
Contributor

RSalman commented Jan 13, 2020

The issue always seems to occur when Concurrent kickoff happens in the middle of heap expansion (which is fine in itself, but we have a very specific timing hole).

At a minimum, when expanding, the mark map must be either set/cleared or added to the init ranges table (which will eventually init it). However, there is a very specific timing window where neither is done, which I believe is causing the crash. The logic to init the mark map depends on Concurrent state; as a result, concurrent kickoff in the middle of expansion can have some unintended side effects.

The timing hole is a result of the following:

If expansion starts before Concurrent KO and we commit the mark map while concurrent is off, we won’t clear it; we would expect the init ranges table to be updated, given Concurrent is off. However, this is not the case if Concurrent starts and init is completed by the time we go to check Concurrent state to rebuild the init ranges table. This is possible because heapAddRange does the first check (to clear the mark map) and heapReconfigured does the second check (to rebuild the table). Hence we have an issue in the following sequence:

  • Expand Start (Concurrent off)
  • [Expand Routine] heapAddRange called
    • Commit Mark Map
    • Should we clear the mark map? No, Concurrent is OFF -> DON’T CLEAR
  • Concurrent kickoff starts: kickoff -> get init ranges -> Concurrent init complete -> Concurrent ON
  • [Expand Routine] heapReconfigured called (Concurrent on)
    • Should we update the init ranges table? No, Concurrent is ON -> DON’T REBUILD

The mark map is never initialized. If the expansion had completed before KO init, we would have updated the init ranges table, and it would have been visible to KO init and eventually gotten inited/cleared. If the expansion had started right after KO init, we would have simply cleared and delayed updating the mark map until the next safe point. Since neither is done and this range was expanded after a contract, we are dealing with a corrupted piece of the mark map.
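
The sequence above can be replayed deterministically with a small model. This is only a sketch of the pre-fix behaviour as described in this comment: `MarkMapModel`, `KickoffTiming` and `expandWithKickoff` are hypothetical names, and the real collector has more states than the two modelled here.

```cpp
#include <cassert>

// Simplified concurrent state: only "off" and "init complete" are modelled.
enum class ConcurrentState { Off, InitComplete };

// Hypothetical model of the pre-fix checks split across two calls.
struct MarkMapModel {
    ConcurrentState state = ConcurrentState::Off;
    bool rangeCleared = false;      // mark bits of the new range cleared in place
    bool rangeInInitTable = false;  // new range recorded in the init ranges table

    // First check: clear in place only if concurrent is already running.
    void heapAddRange() {
        if (state != ConcurrentState::Off) {
            rangeCleared = true;
        }
    }

    // Kickoff inits whatever is in the init ranges table at this moment.
    void kickoff() {
        if (rangeInInitTable) {
            rangeCleared = true;
        }
        state = ConcurrentState::InitComplete;
    }

    // Second check: rebuild the init table only if concurrent is still off.
    void heapReconfigured() {
        if (state == ConcurrentState::Off) {
            rangeInInitTable = true;
        }
    }
};

// Where kickoff lands relative to the two expansion steps.
enum class KickoffTiming { BeforeExpand, BetweenChecks, AfterExpand };

// Returns true if the expanded range is cleared or queued for init.
bool expandWithKickoff(KickoffTiming timing) {
    MarkMapModel m;
    if (timing == KickoffTiming::BeforeExpand) m.kickoff();
    m.heapAddRange();
    if (timing == KickoffTiming::BetweenChecks) m.kickoff();
    m.heapReconfigured();
    if (timing == KickoffTiming::AfterExpand) m.kickoff();
    return m.rangeCleared || m.rangeInInitTable;
}
```

In this model only `KickoffTiming::BetweenChecks` returns false: both checks observe a state that tells them the other path will handle the range, so neither does.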

RSalman added a commit to RSalman/omr that referenced this issue Jan 14, 2020
RSalman added a commit to RSalman/omr that referenced this issue Jan 14, 2020
RSalman added a commit to RSalman/omr that referenced this issue Jan 14, 2020
@pshipton pshipton added the segfault Issues that describe segfaults / JVM crashes label Jan 15, 2020
RSalman added a commit to RSalman/omr that referenced this issue Jan 15, 2020
RSalman added a commit to RSalman/omr that referenced this issue Jan 15, 2020
RSalman added a commit to RSalman/openj9 that referenced this issue Jan 16, 2020
`heapReconfigured` API changes resulting from
eclipse-omr/omr#4716

Signed-off-by: Salman Rana <salman.rana@ibm.com>
RSalman added a commit to RSalman/omr that referenced this issue Jan 16, 2020
RSalman added a commit to RSalman/openj9 that referenced this issue Jan 17, 2020
RSalman added a commit to RSalman/omr that referenced this issue Aug 4, 2021
Fixes eclipse-openj9/openj9#13206

This change switches the order between updating the INIT table
(`tuneToHeap` call) and setting mark bits in place. This way, we
guarantee that MM bits are set in place when we fail to update the INIT
table (or it's not updated in time) given an overlap of KO and Tenure
Expand.

The issue being resolved is similar to what has been outlined in a
previous issue [eclipse-openj9/openj9#8020 (comment)],
where Mark Map bits get missed for init. Here, we have another similar
timing hole: one thread kicks off concurrent, Concurrent_OFF ->
(Concurrent_INIT or INIT Complete), while the expanding thread is in
the middle of heapReconfigured.

With the original ordering of heapReconfigured, we first attempt to set
the mark bits in place when Concurrent is ON. With Concurrent_OFF, we
delay setting the bits until Concurrent_INIT, which requires an update
to the init table. Hence, for Concurrent_OFF we forgo setting bits and
update the init table instead, expecting `tuneToHeap` to do
`determineInitRanges` to update the init ranges table. The issue occurs
when we don't set mark bits in place (given Concurrent_OFF) but
concurrent starts after the check and prior to updating the init ranges.
Here, we either don't update the init table (since init is in progress)
or fail to update it in time.

With heapReconfigured reordered, we try to set mark bits in place only
after the check to update the init ranges table. With this, any state
transition that results in the init table not being updated (or being
updated too late) will be caught and accounted for, as the bits will be
set afterwards.

Signed-off-by: Salman Rana <salman.rana@ibm.com>
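
The effect of the reordering described in the commit message above can be illustrated with a small deterministic model. It is a sketch under simplifying assumptions, not the OMR implementation: `ExpandModel`, `Kickoff`, `originalOrder` and `reordered` are hypothetical names, kickoff is reduced to a single flag flip, and a range counts as covered if its bits were set in place or it made it into the init table.

```cpp
#include <cassert>

// Where concurrent kickoff lands relative to the two steps of heapReconfigured.
enum class Kickoff { Never, BeforeFirstStep, BetweenSteps };

// Hypothetical model of one expanding thread inside heapReconfigured.
struct ExpandModel {
    bool concurrentOn = false;
    bool bitsSetInPlace = false;
    bool inInitTable = false;

    void kickoffIf(bool now) { if (now) concurrentOn = true; }
    // Set mark bits in place only when concurrent is already running.
    void setBitsIfOn() { if (concurrentOn) bitsSetInPlace = true; }
    // Update the init table only when concurrent is still off.
    void updateTableIfOff() { if (!concurrentOn) inInitTable = true; }
    bool covered() const { return bitsSetInPlace || inInitTable; }
};

// Original ordering: set bits in place first, then update the init table.
bool originalOrder(Kickoff k) {
    ExpandModel m;
    m.kickoffIf(k == Kickoff::BeforeFirstStep);
    m.setBitsIfOn();
    m.kickoffIf(k == Kickoff::BetweenSteps);
    m.updateTableIfOff();
    return m.covered();
}

// Reordered: update the init table first, then set bits in place.
bool reordered(Kickoff k) {
    ExpandModel m;
    m.kickoffIf(k == Kickoff::BeforeFirstStep);
    m.updateTableIfOff();
    m.kickoffIf(k == Kickoff::BetweenSteps);
    m.setBitsIfOn();
    return m.covered();
}
```

In this model `originalOrder(Kickoff::BetweenSteps)` leaves the range uncovered, while `reordered` covers it for every kickoff position, because the in-place step runs last and observes the state change.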
RSalman added a commit to RSalman/omr that referenced this issue Aug 4, 2021
Fixes eclipse-openj9/openj9#13206

This change switches the order between updating INIT table (tuneToHeap
call) and setting mark bits in place. This way, we guarantee that MM
bits are set in place when we miss to update INIT table (or it's not
updated in time) given overlap of KO and Tenure Expand.

The issue being resolved is similar to what has been outlined in a
previous issue
[eclipse-openj9/openj9#8020 (comment)],
where Mark Map Bits get missed for init. Here, we have another similar
timing hole, when one thread kicks off concurrent, Concurrent_OFF ->
(Concurrent_INIT or INIT Complete), while the expanding thread is in
middle of heapReconfigured.

With the original ordering of heapReconfigured, we first attempt to set
the mark bits in place when Concurrent is ON. With Concurrent_OFF, we
delay setting the bits until Concurrent_INIT. This requires an update to
the init table. Hence, for Concurrent_OFF we forgo setting the bits and
update the init table instead; we expect `tuneToHeap` to call
`determineInitRanges` to update the init ranges table. The issue occurs
when we don't set mark bits in place (given Concurrent_OFF) but
concurrent starts after the check and prior to updating the init ranges.
Here, we either don't update the init table (since init is in progress)
or fail to update it in time.

With this reordering of heapReconfigured, we try to set mark bits in
place only after the check to update the init ranges table. This way,
any state transition that results in the init table not being updated
(or being updated too late) is caught and accounted for, because the
bits are set afterwards.

Signed-off-by: Salman Rana <salman.rana@ibm.com>
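
The timing hole described in the commit message above can be illustrated with a small deterministic model. This is only a sketch under stated assumptions, not the OMR code: `ExpandModel`, `Order`, `expandedRangeCovered` and the fixed kickoff points are hypothetical names invented for illustration, the two steps stand in for "set mark bits in place" and the `tuneToHeap`/`determineInitRanges` table update, and init is assumed to consume the init table exactly as it is at kickoff.

```cpp
// Hypothetical, heavily simplified model of the race; not the OMR sources.
enum class ConcurrentState { Off, Init };
enum class Order { Original, Reordered };

struct ExpandModel {
    ConcurrentState state = ConcurrentState::Off;
    bool rangeInInitTable = false;  // init ranges table covers the expanded range
    bool markBitsHandled = false;   // mark bits for the expanded range are accounted for

    // Stand-in for tuneToHeap -> determineInitRanges: the table is refreshed
    // only while concurrent is off (assumption of this model).
    void updateInitTable() {
        if (state == ConcurrentState::Off) rangeInInitTable = true;
    }

    // Stand-in for setting mark bits in place: done only once concurrent is on.
    void setMarkBitsInPlace() {
        if (state != ConcurrentState::Off) markBitsHandled = true;
    }

    // Another thread kicks off concurrent. Simplification: init consumes the
    // table exactly as it is at this instant.
    void kickoff() {
        state = ConcurrentState::Init;
        if (rangeInInitTable) markBitsHandled = true;
    }
};

// Runs one heap expansion with the kickoff injected at a fixed point:
// 0 = before both steps, 1 = between them, 2 = after both.
// Returns true if the expanded range's mark bits end up accounted for.
bool expandedRangeCovered(Order order, int kickoffPoint) {
    ExpandModel m;
    if (kickoffPoint == 0) m.kickoff();
    if (order == Order::Original) m.setMarkBitsInPlace(); else m.updateInitTable();
    if (kickoffPoint == 1) m.kickoff();
    if (order == Order::Original) m.updateInitTable(); else m.setMarkBitsInPlace();
    if (kickoffPoint == 2) m.kickoff();
    return m.markBitsHandled;
}
```

In this model, `expandedRangeCovered(Order::Original, 1)` returns `false`: the bits are skipped because concurrent is off, then the table update is skipped because concurrent has started. With `Order::Reordered`, all three kickoff points return `true`, since a missed table update is always followed by the in-place check.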
RSalman added a commit to RSalman/omr that referenced this issue Aug 10, 2021
RSalman added a commit to RSalman/omr that referenced this issue Aug 10, 2021
RSalman added a commit to RSalman/omr that referenced this issue Aug 11, 2021
RSalman added a commit to RSalman/omr that referenced this issue Aug 11, 2021