Skip to content

8381382: Shenandoah: assert(capacity > 0) failed: free regions must have allocation capacity#30723

Open
pengxiaolong wants to merge 9 commits intoopenjdk:masterfrom
pengxiaolong:JDK-8381382-lightweight-fix
Open

8381382: Shenandoah: assert(capacity > 0) failed: free regions must have allocation capacity#30723
pengxiaolong wants to merge 9 commits intoopenjdk:masterfrom
pengxiaolong:JDK-8381382-lightweight-fix

Conversation

@pengxiaolong
Copy link
Copy Markdown

@pengxiaolong pengxiaolong commented Apr 14, 2026

The improvement I made in ticket https://bugs.openjdk.org/browse/JDK-8345423 makes trashed region be recycled by GC workers w/o holding heap lock, this may cause assert from mutator memory allocation path in which mutator calls ShenandoahFreeSet::assert_bounds() perform verifications which traverse all regions, given GC workers recycle trashed region w/o heap lock, it is possible that one mutator can see inconsistent heap stats while a GC worker thread is recycling the same trashed region. It is a memory order issue, inserting fence before the final update of region state should fix it.

Also revisited the code of ShenandoahHeapRegion::try_recycle_under_lock(), this bug could also happen for the region a thread is trying to allocate in as well, but chance is much lower since it only check the state of one region, in this case allocation can be dong in a region which is not ready for use yet and resulting corrupted heap leading to crash. I have been trying to reproduce crash from release build, but wasn't lucky yet(We did see customer reporting crash from marking phase after days of running on thousands of hosts, and we were not able to reproduce the crash).

The PR I withdrew yesterday is a much heavier solution which introduce lock for each region, it works well in my test(stress test is still running after 1600 iterations), comparing to that one, this is a lightweight solution.

Additional tests:

  • all tests in test suite hotspot_gc_shenandoah (MacOS M3)
  • 2000 iterations of stress test applications/jcstress/volatiles.java (Linux aach64)


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8381382: Shenandoah: assert(capacity > 0) failed: free regions must have allocation capacity (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30723/head:pull/30723
$ git checkout pull/30723

Update a local copy of the PR:
$ git checkout pull/30723
$ git pull https://git.openjdk.org/jdk.git pull/30723/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30723

View PR using the GUI difftool:
$ git pr show -t 30723

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30723.diff

Using Webrev

Link to Webrev Comment

@pengxiaolong pengxiaolong changed the title Jdk 8381382 lightweight fix 8381382: Shenandoah: assert(capacity > 0) failed: free regions must have allocation capacity Apr 14, 2026
@bridgekeeper
Copy link
Copy Markdown

bridgekeeper bot commented Apr 14, 2026

👋 Welcome back xpeng! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link
Copy Markdown

openjdk bot commented Apr 14, 2026

@pengxiaolong This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8381382: Shenandoah: assert(capacity > 0) failed: free regions must have allocation capacity

Reviewed-by: kdnilsen, wkemper

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 34 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added hotspot-gc hotspot-gc-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Apr 14, 2026
@openjdk
Copy link
Copy Markdown

openjdk bot commented Apr 14, 2026

@pengxiaolong The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@pengxiaolong pengxiaolong marked this pull request as ready for review April 14, 2026 15:24
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 14, 2026
@mlbridge
Copy link
Copy Markdown

mlbridge bot commented Apr 14, 2026

Webrevs

@kdnilsen
Copy link
Copy Markdown
Contributor

Do we see measurable impact on our CI performance tests?

@pengxiaolong
Copy link
Copy Markdown
Author

Do we see measurable impact on our CI performance tests?

CI performance test for this solution is still running, I'll share the result once is done(Not really expect big difference)

Copy link
Copy Markdown
Contributor

@earthling-amzn earthling-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's wait for pipeline testing to complete.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 14, 2026
@pengxiaolong
Copy link
Copy Markdown
Author

Do we see measurable impact on our CI performance tests?

CI performance test for this solution is still running, I'll share the result once is done(Not really expect big difference)

CI performance test is done for aarch64, result looks good.
There is one test being improved quite a lot, but I don't really know whether it is related to this change or not.

Genshen/diluvian_large diluvian | 13327.500 ±324.5 | 10194.500 ±46.0 | -23.5% | 0.031
Genshen/diluvian_medium diluvian | 5922.000 ±52.0 | 5085.000 ±18.5 | -14.1% | 0.031

I'll integrate the PR tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-gc hotspot-gc-dev@openjdk.org ready Pull request is ready to be integrated rfr Pull request is ready for review shenandoah shenandoah-dev@openjdk.org

Development

Successfully merging this pull request may close these issues.

3 participants