Skip to content

Conversation

@stefank
Copy link
Member

@stefank stefank commented May 3, 2021

A deadlock can happen when a relocating thread holds the lock and tries to log information about the current thread, which can trigger a load barrier and a secondary relocation. The first relocation is holding the _ref_lock and the second relocation hangs when trying to reacquiring it. This is the stack trace:

#1 0x00007ff375c1fc90 in os::PlatformMonitor::wait
#2 0x00007ff3760cbf92 in ZForwarding::wait_page_released
#3 0x00007ff376118065 in ZRelocate::relocate_object
#4 0x00007ff374f7097b in AccessInternal::PostRuntimeDispatch<ZBarrierSet::AccessBarrier<286790ul, ZBarrierSet>,
#5 0x00007ff375176134 in oopDesc::obj_field
#6 0x00007ff37551c5cb in java_lang_Thread::name
#7 0x00007ff375f2e7ee in JavaThread::get_thread_name_string
#8 0x00007ff376126323 in ZStatPhase::log_end
#9 0x00007ff3761272e8 in ZStatCriticalPhase::register_end
#10 0x00007ff3760cc0b0 in ZForwarding::wait_page_released
#11 0x00007ff376118065 in ZRelocate::relocate_object
#12 0x00007ff3760989c5 in ZLoadBarrierOopClosure::do_oop
#13 0x00007ff375408ca8 in HandleArea::oops_do
#14 0x00007ff375f2e0e9 in JavaThread::oops_do_no_frames

This started to happen after:
8261759: ZGC: ZWorker Threads Continue Marking After System.exit() called

The proposed patch moves the scope of the logging to outside the lock scope. The first _ref_lock check isn't really necessary. It was introduced to limit the allocation stall logging, but if a thread entered wait_page_released it really was on its way to stall.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8266432: ZGC: GC allocation stalls can trigger deadlocks

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3839/head:pull/3839
$ git checkout pull/3839

Update a local copy of the PR:
$ git checkout pull/3839
$ git pull https://git.openjdk.java.net/jdk pull/3839/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3839

View PR using the GUI difftool:
$ git pr show -t 3839

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3839.diff

@bridgekeeper
Copy link

bridgekeeper bot commented May 3, 2021

👋 Welcome back stefank! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 3, 2021
@openjdk
Copy link

openjdk bot commented May 3, 2021

@stefank The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label May 3, 2021
@mlbridge
Copy link

mlbridge bot commented May 3, 2021

Webrevs

Copy link
Contributor

@pliden pliden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk
Copy link

openjdk bot commented May 3, 2021

@stefank This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8266432: ZGC: GC allocation stalls can trigger deadlocks

Reviewed-by: pliden, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 9 new commits pushed to the master branch:

  • ff65920: 8265491: Math Signum optimization for x86
  • 55cc0af: 8266185: Shenandoah: Fix incorrect comment/assertion messages
  • 880c138: 8265349: vmTestbase/../stress/compiler/deoptimize/Test.java fails with OOME due to CodeCache exhaustion.
  • 001c514: 8265322: C2: Simplify control inputs for BarrierSetC2::obj_allocate
  • 194bcec: 8265984: Concurrent GC: Some tests fail "assert(is_frame_safe(f)) failed: Frame must be safe"
  • 1d9ea3a: 8266083: Shenandoah: Consolidate dedup/no dedup oop closures
  • 80941f4: 8234446: Post-CMS workgroup hierarchy cleanup
  • ac760c7: 8266295: Remove unused _concurrent_iteration_safe_limit
  • b42d496: 8266388: C2: Improve constant ShiftCntV on x86

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 3, 2021
@stefank
Copy link
Member Author

stefank commented May 4, 2021

Thanks for the reviews!
/integrate

@openjdk openjdk bot closed this May 4, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 4, 2021
@openjdk
Copy link

openjdk bot commented May 4, 2021

@stefank Since your change was applied there have been 14 commits pushed to the master branch:

  • 30ccd80: 8264950: Set opaque for JTooltip in config file of NimbusLookAndFeel
  • cfdf4a7: 8266449: cleanup jtreg tags in compiler/intrinsics/sha/cli tests
  • 3544a9d: 8266391: Replace use of reflection in jdk.internal.platform.Metrics
  • 020236c: 8264786: [macos] All Swing/AWT apps cause Allow Notifications prompt to appear when app is launched
  • 45760d4: 8266320: (bf) ReadOnlyBufferException in heap buffer put(String,int,int) should not be conditional
  • ff65920: 8265491: Math Signum optimization for x86
  • 55cc0af: 8266185: Shenandoah: Fix incorrect comment/assertion messages
  • 880c138: 8265349: vmTestbase/../stress/compiler/deoptimize/Test.java fails with OOME due to CodeCache exhaustion.
  • 001c514: 8265322: C2: Simplify control inputs for BarrierSetC2::obj_allocate
  • 194bcec: 8265984: Concurrent GC: Some tests fail "assert(is_frame_safe(f)) failed: Frame must be safe"
  • ... and 4 more: https://git.openjdk.java.net/jdk/compare/05cfac9f5bf07c3c4422f797a61b6e1b8410ce1b...master

Your commit was automatically rebased without conflicts.

Pushed as commit ce1bc9d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@stefank stefank deleted the 8266432 branch January 12, 2022 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

4 participants