Skip to content

8356228: NMT does not record reserved memory base address correctly #25719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

afshin-zafari
Copy link
Contributor

@afshin-zafari afshin-zafari commented Jun 10, 2025

The problem here stems from the following facts:

  1. When requesting an aligned virtual memory, the size may be extended to meet the alignment requirements. The extra size from the two ends are released afterward.
  2. NMT ignores tracking of releasing CDS regions that are contained in larger regions, because they should be in a specific order.
  3. In linux-aarch64 environment, the alignment size is 64K, reserve operations may fall into cases as 1) above.

So, if ArchiveBuffer memory is reserved with CDS tag, there might be some released regions that NMT ignored and then later complains their intersection with other regions (e.g., thread stack regions observed in the issue).

Solution is to reserve the memory with no tag (mtNone) and if it succeeds change the tag of the region to CDS for further trackings.

Tests:
linux-x64: runtime/NMT jtreg tests passed
linux-aarch64: tier1 to tier3 passed


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8356228: NMT does not record reserved memory base address correctly (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25719/head:pull/25719
$ git checkout pull/25719

Update a local copy of the PR:
$ git checkout pull/25719
$ git pull https://git.openjdk.org/jdk.git pull/25719/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25719

View PR using the GUI difftool:
$ git pr show -t 25719

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25719.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 10, 2025

👋 Welcome back azafari! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 10, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 10, 2025
@openjdk
Copy link

openjdk bot commented Jun 10, 2025

@afshin-zafari The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jun 10, 2025
@afshin-zafari
Copy link
Contributor Author

/label hotspot-runtime
/label hotspot-cds

@openjdk
Copy link

openjdk bot commented Jun 10, 2025

@afshin-zafari The hotspot-runtime label was already applied.

@openjdk
Copy link

openjdk bot commented Jun 10, 2025

@afshin-zafari
The label hotspot-cds is not a valid label.
These labels are valid:

  • graal
  • serviceability
  • hotspot
  • hotspot-compiler
  • ide-support
  • kulla
  • i18n
  • shenandoah
  • jdk
  • javadoc
  • security
  • hotspot-runtime
  • jmx
  • build
  • nio
  • client
  • core-libs
  • compiler
  • net
  • hotspot-gc
  • hotspot-jfr

@mlbridge
Copy link

mlbridge bot commented Jun 10, 2025

Webrevs

Comment on lines +343 to +346
// Once the memory is reserved, NMT can track it as mtClassShared.
// During an aligned reserve, some extra sub-regions of the memory region may be released due to alignment requirements.
// NMT will ignore releasing sub-regions (i.e., contained in other regions) that have mtClassShared tags.
MemTracker::record_virtual_memory_tag(rs, mtClassShared);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to capture this logic in a test, so we can always check for this behavior?

Where in the code do we release and check for the special case of mtClassShared?

I'm not 100% sure I understand why we have to first reserve with mtNone and only then change it to mtClassShared. Why can't we reserve with mtClassShared right from the start?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to capture this logic in a test, so we can always check for this behavior?

Work on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the code do we release and check for the special case of mtClassShared?

In virtualMemoryTracker.cpp, remove_released_region(adddress, size) in an if block as this:

if (reserved_rgn->mem_tag() == mtClassShared) {
    if (reserved_rgn->contain_region(addr, size)) {
      // This is an unmapped CDS region, which is part of the reserved shared
      // memory region.
      // See special handling in VirtualMemoryTracker::add_reserved_region also.
      return true;
    }

    if (size > reserved_rgn->size()) {
      // This is from release the whole region spanning from archive space to class space,
      // so we release them altogether.
      ReservedMemoryRegion class_rgn(addr + reserved_rgn->size(),
                                     (size - reserved_rgn->size()));
      ReservedMemoryRegion* cls_rgn = _reserved_regions->find(class_rgn);
      assert(cls_rgn != nullptr, "Class space region  not recorded?");
      assert(cls_rgn->mem_tag() == mtClass, "Must be class mem tag");
      remove_released_region(reserved_rgn);
      remove_released_region(cls_rgn);
      return true;
    }
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure I understand why we have to first reserve with mtNone and only then change it to mtClassShared. Why can't we reserve with mtClassShared right from the start?

The sequence is as follows:
All the reserve/release operations here are from os:: namespace and therefore will call NMT tracking API
1- CDS requests for reserving S bytes at address Base with alignment A and memTag M.
2- Due to OS alignment limitations regarding A, new base XBase to be used with new size XS. Read X as extra here.
3- Do instead, reserve XS bytes at XBase.
4- If succeeds, memory would be like <--s1---><----S----><----s2--->, where XS = s1 + S + s2 and XBase + s1 == Base
5- Release extra parts s1 and s2.
6- request at step 1 above is fulfilled now.

If at step 1, the memTag M is mtClassShared , the releases at step 5 will be ignored by NMT. Which means that there are regions in NMT lists that are actually released (at os level) but NMT takes them still as reserved. Therefore, further reservations by os at those regions (e.g., for stack) will intersect with existing NMT regions and the NMT assertions fail. So the memTag is to be mtNone to bypass/avoid the NMT ignore branch.

If we want to track the new memory region as belongs to CDS, we need to set the memTag of the region to mtClassShared with the corresponding MemTracker API.(as is done here in this PR)

@stefank
Copy link
Member

stefank commented Jun 11, 2025

NMT ignores tracking of releasing CDS regions that are contained in larger regions, because they should be in a specific order.

The special-case handling of CDS in NMT breaks abstractions and has caused a number of bugs. I'm worried that trying to workaround this special-case code makes the code even more susceptible for future bugs. So, therefore I'm interested to here about the planes you have to replace the NMT implementation with a tree. Isn't that going to remove the CDS special-case code and maybe be even solve this issue without the need for this fix? If not, feel free to ignore my comment.

@afshin-zafari
Copy link
Contributor Author

NMT ignores tracking of releasing CDS regions that are contained in larger regions, because they should be in a specific order.

The special-case handling of CDS in NMT breaks abstractions and has caused a number of bugs. I'm worried that trying to workaround this special-case code makes the code even more susceptible for future bugs. So, therefore I'm interested to here about the planes you have to replace the NMT implementation with a tree. Isn't that going to remove the CDS special-case code and maybe be even solve this issue without the need for this fix? If not, feel free to ignore my comment.

In the new implementation of NMT with tree, there will be no special-case handling neither for CDS nor for any other memory types. Thanks for bringing this up.

@gerard-ziemski
Copy link

Do we need to bother with this fix then? Should we wait for the VMATree and just verify?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

3 participants