8381382: Shenandoah: assert(capacity > 0) failed: free regions must have allocation capacity#30723
8381382: Shenandoah: assert(capacity > 0) failed: free regions must have allocation capacity#30723pengxiaolong wants to merge 9 commits intoopenjdk:masterfrom
Conversation
|
👋 Welcome back xpeng! A progress list of the required criteria for merging this PR into |
|
@pengxiaolong This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 34 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
@pengxiaolong The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
|
Do we see measurable impact on our CI performance tests? |
CI performance test for this solution is still running, I'll share the result once is done(Not really expect big difference) |
earthling-amzn
left a comment
There was a problem hiding this comment.
LGTM, let's wait for pipeline testing to complete.
CI performance test is done for aarch64, result looks good. I'll integrate the PR tomorrow. |
The improvement I made in ticket https://bugs.openjdk.org/browse/JDK-8345423 makes trashed region be recycled by GC workers w/o holding heap lock, this may cause assert from mutator memory allocation path in which mutator calls
ShenandoahFreeSet::assert_bounds()perform verifications which traverse all regions, given GC workers recycle trashed region w/o heap lock, it is possible that one mutator can see inconsistent heap stats while a GC worker thread is recycling the same trashed region. It is a memory order issue, inserting fence before the final update of region state should fix it.Also revisited the code of
ShenandoahHeapRegion::try_recycle_under_lock(), this bug could also happen for the region a thread is trying to allocate in as well, but chance is much lower since it only check the state of one region, in this case allocation can be dong in a region which is not ready for use yet and resulting corrupted heap leading to crash. I have been trying to reproduce crash from release build, but wasn't lucky yet(We did see customer reporting crash from marking phase after days of running on thousands of hosts, and we were not able to reproduce the crash).The PR I withdrew yesterday is a much heavier solution which introduce lock for each region, it works well in my test(stress test is still running after 1600 iterations), comparing to that one, this is a lightweight solution.
Additional tests:
applications/jcstress/volatiles.java(Linux aach64)Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30723/head:pull/30723$ git checkout pull/30723Update a local copy of the PR:
$ git checkout pull/30723$ git pull https://git.openjdk.org/jdk.git pull/30723/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 30723View PR using the GUI difftool:
$ git pr show -t 30723Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30723.diff
Using Webrev
Link to Webrev Comment