
8341491: Reserve and commit memory operations should be protected by NMT lock #24084


Closed
wants to merge 9 commits

Conversation

roberttoyonaga
Contributor

@roberttoyonaga roberttoyonaga commented Mar 17, 2025

Update:

After some discussion it was decided that it's not necessary to expand the lock scope for reserve/commit. Instead, we are opting to add comments explaining the reasons for locking and the conditions that could lead to races and should be avoided. Some of the new tests can be kept because they are general enough to be useful outside of this context.

Summary:

This PR makes memory operations atomic with NMT accounting.

The problem:

In memory-related functions like os::commit_memory and os::reserve_memory, the OS memory operations are currently done before acquiring the NMT mutex, and the virtual memory accounting is done later in MemTracker, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races.

1.1 Thread_1 releases range_A.
1.2 Thread_1 tells NMT "range_A has been released".

2.1 Thread_2 reserves (the now free) range_A.
2.2 Thread_2 tells NMT "range_A is reserved".

Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS.

Solution:

Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of NmtVirtualMemoryLocker to protect both the NMT accounting and the memory operation itself.

Other notes:

I also simplified this pattern found in many places:

if (MemTracker::enabled()) {
  MemTracker::NmtVirtualMemoryLocker nvml;
  result = pd_some_operation(addr, bytes);
  if (result != nullptr) {
    MemTracker::record_some_operation(addr, bytes);
  }
} else {
  result = pd_some_operation(addr, bytes);
}

To:

MemTracker::NmtVirtualMemoryLocker nvml;
result = pd_some_operation(addr, bytes);
MemTracker::record_some_operation(addr, bytes);

This is possible because NmtVirtualMemoryLocker now checks MemTracker::enabled(). MemTracker::record_some_operation already checks MemTracker::enabled() and checks against nullptr. This refactoring previously wasn't possible because ThreadCritical was used before #22745 introduced NmtVirtualMemoryLocker.

I considered moving the locking and NMT accounting down into platform-specific code, e.g. locking around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to the point, so doing this wasn't reducing the lock scope very much. Instead, it just made the code messier by requiring the locking and NMT accounting to be maintained in each platform-specific implementation.

In many places I've done minor refactoring by relocating calls to MemTracker in order to tighten the locking scope.

In some OS specific code (such as os::map_memory_to_file), I've replaced calls to os::release_memory with os::pd_release_memory. This is to avoid NmtVirtualMemoryLocker reentrancy.

In a few places (such as VirtualMemoryTracker::add_reserved_region) I have replaced tty with defaultStream::output_stream(). Otherwise NmtVirtualMemory_lock would be acquired out of rank order with tty_lock.

Testing:

One concern, due to the expanded critical section, is reentrancy. NmtVirtualMemoryLocker is a HotSpot mutex and is not reentrant. I've added new tests in test_os.cpp and test_virtualMemoryTracker.cpp that try to exercise any usages of NMT that weren't already exercised by existing tests.

tier1 passes on Linux and Windows. I do not have an AIX machine to test on. Can someone please help run the tests on AIX?


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8341491: Reserve and commit memory operations should be protected by NMT lock (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24084/head:pull/24084
$ git checkout pull/24084

Update a local copy of the PR:
$ git checkout pull/24084
$ git pull https://git.openjdk.org/jdk.git pull/24084/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24084

View PR using the GUI difftool:
$ git pr show -t 24084

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24084.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Mar 17, 2025

👋 Welcome back roberttoyonaga! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Mar 17, 2025

@roberttoyonaga This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8341491: Reserve and commit memory operations should be protected by NMT lock

Reviewed-by: stuefe, stefank

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 831 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@stefank, @tstuefe) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot changed the title 8341491 8341491: Reserve and commit memory operations should be protected by NMT lock Mar 17, 2025
@openjdk

openjdk bot commented Mar 17, 2025

@roberttoyonaga The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Mar 17, 2025
@roberttoyonaga
Contributor Author

/label serviceability

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Mar 21, 2025
@openjdk

openjdk bot commented Mar 21, 2025

@roberttoyonaga
The serviceability label was successfully added.

@roberttoyonaga roberttoyonaga marked this pull request as ready for review March 24, 2025 15:03
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 24, 2025
@mlbridge

mlbridge bot commented Mar 24, 2025

Webrevs

@tstuefe
Member

tstuefe commented Mar 24, 2025

Ping @JoKern65 for AIX

Member

@stefank stefank left a comment


I don't see why we need to extend the lock to be held over the reserve/commit/alloc part.

If we only extend the lock on the release side, then it looks like we get the required exclusion:

lock
1.1 Thread_1 releases range_A.
1.2 Thread_1 tells NMT "range_A has been released".
unlock

2.1 Thread_2 reserves (the now free) range_A.
lock
2.2 Thread_2 tells NMT "range_A is reserved".
unlock

We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) (2.2) (1.2).

And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations.

@tstuefe
Member

tstuefe commented Mar 24, 2025

@stefank

And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations.

What about the case where one thread reserves a range and another thread releases it?

1 Thread A reserves range
2 Thread B releases range
3 Thread B tells NMT "range released"
4 Thread A tells NMT "range reserved"

This would either result in an assert in NMT at step 3 when releasing a range NMT does not know. Or in an incorrectly booked range in step 4 without asserts. Am I making a thinking error somewhere?

@roberttoyonaga
Contributor Author

Hi @stefank, I think you're right about (1.1) (2.1) (2.2) (1.2) being prevented by the current implementation. Is there a reason that the current implementation only does the wider locking for release/uncommit? Maybe (2.1) (1.1) (1.2) (2.2) isn't much of an issue since it's unlikely that another thread would uncommit/release the same base address shortly after it's committed/reserved?

Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations.

I haven't seen that race in the wild, I just noticed that the memory operations weren't protected and thought that it could be a problem.

@stefank
Member

stefank commented Mar 25, 2025

And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations.

What about the case where one thread reserves a range and another thread releases it?

1 Thread A reserves range
2 Thread B releases range
3 Thread B tells NMT "range released"
4 Thread A tells NMT "range reserved"

This would either result in an assert in NMT at step 3 when releasing a range NMT does not know. Or in an incorrectly booked range in step 4 without asserts. Am I making a thinking error somewhere?

In a scenario like that, doesn't Thread A have to communicate somehow that Thread B can now start using and/or releasing that reservation? It sounds like you would have a race condition if you don't have synchronization to hand over the reservation from one thread to another. I would expect that such communication would be placed after the NMT booking in Thread A.

Thread A:
reserve
lock
NMT booking
unlock
<synchronizing operation (maybe a release_store into a data structure)>

Thread B:
<synchronizing operation (maybe a load_acquire from the same data structure)>
lock
release
NMT booking
unlock

Is there any code that we know of that doesn't fit into a synchronization pattern similar to the above?

I can think of some contrived example where Thread B asks the OS for memory mappings and uses that to ascertain that a pre-determined address has been reserved, and how that could lead to an incorrect booking as you described, but do we really have code like that? If we do, should we really have code like that? Are there some other patterns that I'm missing?

Related to this, I talked to @xmas92 and he mused about swapping the order of the release and NMT release booking as a way to shrink the lock scope for the release operation:

Thread A:
reserve
lock
NMT booking
unlock
<synchronizing operation (maybe a release_store into a data structure)>

Thread B:
<synchronizing operation (maybe a load_acquire from the same data structure)>
lock
NMT booking
unlock
release

As long as we hold the reservation, no other thread can re-reserve the reservation, so Thread B can take its time to first perform the NMT release booking under the lock, and then perform the release without the lock. If another thread (say Thread C) manages to re-reserve the memory, it sounds reasonable that the NMT release booking should have already completed. If we were to try this out, we would have to verify/ensure that the release/reserve pairs perform enough synchronization to make this work. Axel can probably correct me if I mischaracterized what he wrote.

@stefank
Member

stefank commented Mar 25, 2025

Hi stefank, I think you're right about (1.1) (2.1) (2.2) (1.2) being prevented by the current implementation. Is there a reason that the current implementation only does the wider locking for release/uncommit? Maybe (2.1) (1.1) (1.2) (2.2) isn't much of an issue since it's unlikely that another thread would uncommit/release the same base address shortly after it's committed/reserved?

I'm very curious to find out if anyone knows how this could happen without a race condition hand-over from one thread to another. (See my answer to Stüfe).

Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations.

I haven't seen that race in the wild, I just noticed that the memory operations weren't protected and thought that it could be a problem.

OK. Let's see if anyone finds a hole in my arguments given above.

@roberttoyonaga
Contributor Author

... swapping the order of the release and NMT release booking as a way to shrink the lock scope for the release operation:
...
As long as we hold the reservation, no other thread can re-reserve the reservation, so Thread B can take its time to first perform the NMT release booking under the lock, and then perform the release without the lock.

Hi @stefank, I think that's true. But if the release/uncommit does not complete successfully we would need to readjust the accounting afterward. To do that we would need to retrieve the original memtag (at least for reserved regions) and potentially need to retrieve the original callsite data (if we're in detailed mode).

@stefank
Member

stefank commented Mar 25, 2025

... swapping the order of the release and NMT release booking as a way to shrink the lock scope for the release operation:
...
As long as we hold the reservation, no other thread can re-reserve the reservation, so Thread B can take its time to first perform the NMT release booking under the lock, and then perform the release without the lock.

Hi @stefank, I think that's true. But if the release/uncommit does not complete successfully we would need to readjust the accounting afterward. To do that we would need to retrieve the original memtag (at least for reserved regions) and potentially need to retrieve the original callsite data (if we're in detailed mode).

When does a release/uncommit fail? Would that be a JVM bug? What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails?

If I look at the man page for munmap, it only fails if you pass in incorrect values, which sounds like a JVM bug to me. I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handle that error, which they don't do today.

@roberttoyonaga
Contributor Author

roberttoyonaga commented Mar 26, 2025

Is there any code that we know of that doesn't fit into a synchronization pattern similar to the above?
I can think of some contrived example where Thread B asks the OS for memory mappings and uses that to ascertain that a pre-determined address has been reserved, and how that could lead to an incorrect booking as you described, but do we really have code like that?

From what I can tell, it doesn't look like that's happening anywhere; someone else may know better, though. Similarly, for uncommit, the base address must be passed over from somewhere else in the JVM, so relying on some external synchronization seems reasonable here too. If this problem scenario is not present in the current code and it's not expected to become a possibility in the future, then I suppose there's no reason to guard against it. Maybe just a comment explaining the reasoning is good enough (and a warning not to use such patterns).


When does a release/uncommit fail? Would that be a JVM bug?

On Windows, VirtualFree also looks like it only fails if an invalid set of arguments is passed. So if os::pd_release fails, it's probably a JVM bug. Uncommit uses mmap, which could fail for a larger variety of reasons, some of which are out of the JVM's control. For example: "The number of mapped regions would exceed an implementation-defined limit (per process or per system)." See here

What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails?

If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward.

I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handle that error, which they don't do today.

I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.

@stefank
Member

stefank commented Mar 26, 2025

Is there any code that we know of that doesn't fit into a synchronization pattern similar to the above?
I can think of some contrived example where Thread B asks the OS for memory mappings and uses that to ascertain that a pre-determined address has been reserved, and how that could lead to an incorrect booking as you described, but do we really have code like that?

From what I can tell, it doesn't look like that's happening anywhere; someone else may know better, though. Similarly, for uncommit, the base address must be passed over from somewhere else in the JVM, so relying on some external synchronization seems reasonable here too. If this problem scenario is not present in the current code and it's not expected to become a possibility in the future, then I suppose there's no reason to guard against it. Maybe just a comment explaining the reasoning is good enough (and a warning not to use such patterns).

When does a release/uncommit fail? Would that be a JVM bug?

On Windows, VirtualFree also looks like it only fails if an invalid set of arguments is passed. So if os::pd_release fails, it's probably a JVM bug. Uncommit uses mmap, which could fail for a larger variety of reasons, some of which are out of the JVM's control. For example: "The number of mapped regions would exceed an implementation-defined limit (per process or per system)." See here

Right. And that failure is fatal, so there should be no need to fix any NMT bookkeeping for that.

What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails?

If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward.

Agreed. And this would be a pre-existing problem already. If a release/uncommit fails, then we have the similar issues for that as well.

I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handle that error, which they don't do today.

I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.

I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error?

@tstuefe
Member

tstuefe commented Mar 26, 2025

Hi stefank, I think you're right about (1.1) (2.1) (2.2) (1.2) being prevented by the current implementation. Is there a reason that the current implementation only does the wider locking for release/uncommit? Maybe (2.1) (1.1) (1.2) (2.2) isn't much of an issue since it's unlikely that another thread would uncommit/release the same base address shortly after it's committed/reserved?

I'm very curious to find out if anyone knows how this could happen without a race condition hand-over from one thread to another. (See my answer to Stüfe).

Stefan, your analysis sounds reasonable. Don't see a hole. The original issue was from me I think, but I've never seen that variant in real life.

So, I am fine with leaving that scenario out.

@tstuefe
Member

tstuefe commented Mar 26, 2025

I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.

I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error?

I second @stefank here.

Uncommit can fail, ironically, with an ENOMEM: if the uncommit punches a hole into a committed region, this would cause a new VMA on the kernel side. This may fail if we run against the limit for VMAs. Forgot what it was, some sysconf setting. All of this is Linux specific, though.

I don't think this should be unconditionally a fatal error. Since the allocator (whatever it is) can decide to re-commit the region later, and thus "self-heal" itself.

@stefank
Member

stefank commented Mar 26, 2025

I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.

I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error?

I second @stefank here.

Uncommit can fail, ironically, with an ENOMEM: if the uncommit punches a hole into a committed region, this would cause a new VMA on the kernel side. This may fail if we run against the limit for VMAs. Forgot what it was, some sysconf setting. All of this is Linux specific, though.

This happens when we hit the /proc/sys/vm/max_map_count limit, and this immediately crashes the JVM.

I don't think this should be unconditionally a fatal error. Since the allocator (whatever it is) can decide to re-commit the region later, and thus "self-heal" itself.

Is this referring to failures when we hit the max_map_count limit? I'm not convinced that you can recover from that without immediately hitting the same issue somewhere else in the code.

Or maybe you are thinking about some of the other reasons for the uncommit to fail?

@roberttoyonaga
Contributor Author

What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails?

If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward.

Agreed. And this would be a pre-existing problem already. If a release/uncommit fails, then we have the similar issues for that as well.

Hi @stefank, Are you referring to the difficulty in determining the original allocation as being the pre-existing problem? I think that only becomes an issue if we decide to swap the order of NMT booking and the memory release/uncommit (assuming we don't just fail fatally). Since we don't need to readjust currently, if there's a failure we can just leave everything as it is.

I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handle that error, which they don't do today.

I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.

I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error?

VirtualSpace::shrink_by allows uncommit to fail without crashing. I'm not certain of the intention behind that. But it seems like it's because shrinking is an optimization and not always critical that it be done immediately. [1]

@stefank
Member

stefank commented Mar 26, 2025

What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails?

If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward.

Agreed. And this would be a pre-existing problem already. If a release/uncommit fails, then we have the similar issues for that as well.

Hi @stefank, Are you referring to the difficulty in determining the original allocation as being the pre-existing problem? I think that only becomes an issue if we decide to swap the order of NMT booking and the memory release/uncommit (assuming we don't just fail fatally). Since we don't need to readjust currently, if there's a failure we can just leave everything as it is.

My thinking is that if there is a failure you don't know what state the OS left the memory in. So, you don't know whether the memory was in fact unmapped as requested, or if it was left intact, or even something in-between. So, if you don't do the matching NMT bookkeeping there will be a mismatch between the state of the memory and what has been booked in NMT.

I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handle that error, which they don't do today.

I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.

I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error?

VirtualSpace::shrink_by allows uncommit to fail without crashing. I'm not certain of the intention behind that. But it seems like it's because shrinking is an optimization and not always critical that it be done immediately. [1]

The above example shows code that assumes that it is OK to fail uncommitting and continue. I'm trying to figure out if that assumption is true. So, what I meant was that I was looking for a concrete example of a failure mode of uncommit that would be an acceptable (safe) failure to continue executing from. That is, a valid failure that doesn't mess up the memory in an unpredictable/unknowable way.

@JoKern65
Contributor

JoKern65 commented Apr 3, 2025

Tonight we tested this PR on AIX and it failed in the gtest with

Internal Error (os_aix.cpp:1917), pid=26476938, tid=258
Error: guarantee((vmi)) failed

This will happen if a
os::pd_commit_memory() or os::pd_release_memory() or os::pd_uncommit_memory()
is called on memory not allocated with
os::pd_reserve_memory() or os::pd_attempt_map_memory_to_file_at() or os::pd_attempt_reserve_memory_at()

@roberttoyonaga
Contributor Author

Internal Error (os_aix.cpp:1917), pid=26476938, tid=258 Error: guarantee((vmi)) failed

This will happen if a os::pd_commit_memory() or os::pd_release_memory() or os::pd_uncommit_memory() is called on memory not allocated with os::pd_reserve_memory() or os::pd_attempt_map_memory_to_file_at() or os::pd_attempt_reserve_memory_at()

Thank you for running the tests on AIX. I've excluded the file mapping tests that don't meet that criteria on AIX.

@JoKern65
Contributor

JoKern65 commented Apr 7, 2025

I ran the tests over the weekend again and now they passed.

@roberttoyonaga roberttoyonaga requested a review from stefank April 8, 2025 13:10
Member

@stefank stefank left a comment


I think this looks good to me, but please seek feedback from others as well.

I've added a couple of suggestions. None of them are required, but I think they would be nice to do.

Comment on lines 2200 to 2206
// The scope of NmtVirtualMemoryLocker covers both pd_uncommit_memory and record_virtual_memory_uncommit because
// these operations must happen atomically to avoid races. For example, if Thread_1 were to uncommit a region, but
before Thread_1 can update NMT accounting, Thread_2 commits overlapping the same region and updates NMT,
then Thread_1 finishes updating NMT. This would result in NMT perceiving part of the region being uncommitted,
// when it is actually committed. The opposite scenario is not guarded against. pd_commit_memory and
// record_virtual_memory_commit do not happen atomically. We assume that there is some external synchronization
// that prevents a region from being uncommitted before it is finished being committed.
Member


It's not a requirement, but you get kudos from me if you keep comment lines below 80 characters. I typically don't like limiting code to 80 characters, but comments tend to be nicer if they are.

Contributor Author


Ok I'll try to shorten these comments a bit

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 8, 2025
roberttoyonaga and others added 4 commits April 9, 2025 09:18
Co-authored-by: Stefan Karlsson <stefan.karlsson@oracle.com>
Co-authored-by: Stefan Karlsson <stefan.karlsson@oracle.com>
Co-authored-by: Stefan Karlsson <stefan.karlsson@oracle.com>
Co-authored-by: Stefan Karlsson <stefan.karlsson@oracle.com>
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Apr 9, 2025
@roberttoyonaga
Contributor Author

Thank you @stefank for the feedback. I've applied your suggestions.

@tstuefe, when you have time, can you please have another look at this? Based on the discussion above, I've reverted the changes to the locking scope in favor of new comments explaining the asymmetrical locking and warning against patterns that lead to races. The new tests that are still relevant are kept.

Member

@tstuefe tstuefe left a comment


Okay, thank you for doing this. I am fine with this version.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 16, 2025
@roberttoyonaga
Contributor Author

Thank you everyone for the review!

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Apr 22, 2025
@openjdk

openjdk bot commented Apr 22, 2025

@roberttoyonaga
Your change (at version 7b7263b) is now ready to be sponsored by a Committer.

@tstuefe
Member

tstuefe commented Apr 23, 2025

/sponsor


@openjdk

openjdk bot commented Apr 23, 2025

Going to push as commit 44c5aca.
Since your change was applied there have been 860 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 23, 2025
@openjdk openjdk bot closed this Apr 23, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Apr 23, 2025
@openjdk

openjdk bot commented Apr 23, 2025

@tstuefe @roberttoyonaga Pushed as commit 44c5aca.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@openjdk

openjdk bot commented Apr 23, 2025

@tstuefe The command sponsor can only be used in open pull requests.

Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
4 participants