8344332: (bf) Migrate DirectByteBuffer to use java.lang.ref.Cleaner #22165
Conversation
👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request.
@shipilev This change is no longer ready for integration - check the PR body for details.
Force-pushed 81f3edc to c546199
Force-pushed 773ebdb to 80f64d0
Force-pushed 80f64d0 to 1326e5e
@shipilev this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request, run the following commands in your local repository:
git checkout JDK-8344332-dbb-cleaner
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
Force-pushed 1326e5e to 9528a01
@shipilev This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Force-pushed 9528a01 to dd25e5a
Webrevs
FTR, wider testing shows two tests that install a custom system classloader fail with the following circularity: […]
This can be avoided if we use the lazy holder pattern to delay […].
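For illustration, a minimal sketch of the lazy holder idea, with hypothetical names (not the PR's actual code):

```java
// Hedged sketch: create the Cleaner lazily via the holder idiom, so class
// initialization does not pull in classloader machinery at a point where
// that would cause the circularity described above.
import java.lang.ref.Cleaner;

final class DirectBufferCleanerHolder {            // hypothetical class name
    private static final class Holder {
        // Runs only when Holder is first referenced, i.e. on first use.
        static final Cleaner CLEANER = Cleaner.create();
    }

    static Cleaner cleaner() {
        return Holder.CLEANER;                      // triggers lazy initialization
    }
}
```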
No objection to moving this to use j.l.ref.Cleaner. Note that the long-standing recommendation has always been to cache/reuse direct buffers rather than discard them as the reproducer does. Now that the FFM API is permanent, we should redirect developers looking for deterministic release to the new API; they can get a BB view if needed.
I assume you'll update the copyright header before this change is integrated.
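As an aside for readers, a hedged sketch of the FFM route mentioned here: deterministic release via an Arena, with a ByteBuffer view of the segment if needed. The class and method names are illustrative; the java.lang.foreign calls are the standard API.

```java
// Sketch only: deterministic release of off-heap memory with the FFM API,
// plus a ByteBuffer view that stays valid while the arena is open.
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;

class FfmBufferExample {
    static void useAndRelease() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(1024);   // off-heap allocation
            ByteBuffer view = segment.asByteBuffer();       // BB view of the segment
            view.putInt(0, 42);
        }                                                   // memory freed deterministically here
    }
}
```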
This looks fine for me. Older Lucene versions won't break, as they use […]. Code that deep-reflected and made […] … I'd remove the […].
Did the change to […]?
It was a test bug; I fixed it separately: JDK-8348301. I merged current master into this PR branch now, so it should work well.
I ran it through our CI and we seem to have a few tests that aren't in the jdk repo but have a dependency on jdk.internal.ref.Cleaner, so they will need to be fixed and/or excluded. Would you mind holding off until Tuesday to give time to get that done? Otherwise I didn't see any issues. I see there is a referenced Graal issue, as it looks like they have to do an update to align with the changes here.
Sure, there is no rush with this PR. I'll wait.
How is it going, @AlanBateman? Any progress on the closed tests?
Nearly done - in review now. |
Thanks for restoring waitForReferenceProcessing().
Closed test changes are in, so you should be good to go.
Thanks for giving us time to get this sorted out.
There are several changes here that I think are of interest.
(1) Put the cleanables for DirectByteBuffer in a separate j.l.r.Cleaner. I
don't think that was done in previous experiments with eliminating the use of
j.i.r.Cleaner. That seems like a good thing to do, to avoid interleaving the
processing of these short and potentially more performance-critical cleanables
with others.
(2) Do a GC for each iteration of the slow path (i.e. after each exponentially
increasing sleep period). Neither the existing code (which I had a hand in)
nor its predecessor did that. I'm currently doubtful of this being a good
idea. The exponential backoff in the current code was just retained from the
prior code, and I think doesn't actually do much other than delay OOME to
allow other threads to happen to close any direct buffers they might be using.
In the current code (using j.i.r.Cleaner), once waitForReferenceProcessing returns false
all Cleaners discovered by a prior GC will have been processed. I wonder,
with the current code, under what circumstances do we actually get into those
sleeps, but don't ultimately OOME?
To provide something similar with j.l.r.Cleaner would require an operation
on that class similar to waitForReferenceProcessing. If there is pending
cleanup work then wait for some and then return true. If there isn't any
pending cleanup work then return false. I've spent some time thinking about
how this could be done, and think it's feasible (working on a prototype that
seems like it should work, but hasn't been tested at all yet), though not as
simple as one might wish.
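To make that contract concrete, here is a hedged sketch of what such an operation could look like; the field names and helper methods below are assumptions for illustration, not the actual prototype.

```java
// Sketch of the contract only: return false if there is no pending cleanup
// work; otherwise wait until the cleaning thread completes at least one more
// batch and return true. All names here are hypothetical.
final class CleaningProgress {
    private final Object lock = new Object();
    private long batchesCompleted;          // bumped by the cleaning thread
    private int pendingWork;                // tracked by the enqueue/dequeue paths

    boolean waitForCleaning() throws InterruptedException {
        synchronized (lock) {
            if (pendingWork == 0) {
                return false;               // nothing to wait for
            }
            long observed = batchesCompleted;
            while (batchesCompleted == observed && pendingWork > 0) {
                lock.wait();                // cleaning thread calls notifyAll()
            }
            return true;
        }
    }

    // Called by the cleaning thread after it has processed a batch.
    void batchDone(int processed) {
        synchronized (lock) {
            batchesCompleted++;
            pendingWork -= processed;
            lock.notifyAll();
        }
    }

    // Called when new cleanup work becomes pending.
    void workEnqueued(int count) {
        synchronized (lock) {
            pendingWork += count;
        }
    }
}
```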
(3) Put the slow path under a lock. That seems clearly essential with the
change to iterate GCs. It might be that it's a mistake in the old code to not
do something similar. The current code allows essentially back to back to
back GCs as multiple threads hit the slow path more or less simultaneously.
You could have several threads come in and all waitForReferenceProcessing, all
finish that loop at the same time and gang-request GCs. That doesn't seem
good.
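A hedged sketch of the serialization idea in point (3), assuming hypothetical tryReserve() and MAX_SLEEPS helpers; the real Bits.reserveMemory logic is more involved.

```java
// Sketch only: put the GC-and-retry slow path under a single lock so that
// several threads hitting the reservation limit at once do not each request
// their own back-to-back GC.
final class SlowPathExample {
    private static final Object SLOW_PATH_LOCK = new Object();
    private static final int MAX_SLEEPS = 9;       // hypothetical retry bound

    static boolean reserveSlowly() throws InterruptedException {
        synchronized (SLOW_PATH_LOCK) {
            long sleepMillis = 1;
            for (int i = 0; i < MAX_SLEEPS; i++) {
                if (tryReserve()) {
                    return true;            // someone released enough memory
                }
                System.gc();                // ask GC to discover dead buffers
                Thread.sleep(sleepMillis);  // exponential backoff
                sleepMillis <<= 1;
            }
            return tryReserve();            // last attempt before giving up (OOME)
        }
    }

    private static boolean tryReserve() {
        return false;                       // stand-in for the real reservation check
    }
}
```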
(4) Add the canary mechanism. I think this is a poor replacement for what
waitForReferenceProcessing does currently, since it is not even remotely
reliable. It might make the path to OOME less likely, but bad luck could make
it do nothing useful at all.
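For readers following the discussion, a hedged sketch of the canary mechanism being debated; the latch-based wait and the class name are assumptions for illustration, not the PR's exact code.

```java
// Sketch only: register a sentinel ("canary") with the same Cleaner that
// handles direct buffers, drop the strong reference, request a GC, then wait
// until the canary's cleaning action has run on the cleaner thread.
import java.lang.ref.Cleaner;
import java.util.concurrent.CountDownLatch;

final class CanaryExample {
    static void gcAndWaitForCanary(Cleaner cleaner) throws InterruptedException {
        CountDownLatch cleaned = new CountDownLatch(1);
        Object canary = new Object();
        cleaner.register(canary, cleaned::countDown);
        canary = null;          // make the canary unreachable
        System.gc();            // let GC discover it and hand it to reference processing
        cleaned.await();        // cleaner thread has reached (at least) the canary
    }
}
```

As the surrounding comments discuss, this only proves the cleaner thread processed the canary itself, not that any other pending cleanables have been processed.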
Thanks for the comment, @kimbarrett! I started replying, but then figured my reply on the current canary mechanics would be better captured in the code comment, so I pushed it right there. Then I ran out of time. I will think about this a bit more and try to reply to your comments sometime later this week.

So far, I still believe Cleaner canaries are the best way to balance the probability of success for the best-effort allocation attempt when short on memory against implementation simplicity. Maybe filling the only theoretical gap I see at the moment (see code comments) would be marginally better with waiting for […].

What I am sure of, though, is that lots of theoretical arguments I made to myself, including re-using the waiting infrastructure, failed empirical reliability/performance tests after my attempts at implementing them. Admittedly, I have not tried to go as far as coordinating both reference processing and Cleaner threads; maybe the answer is there. So I would reserve final judgement on whether GC+RefProc+Cleaner waiting approaches really work until I see them actually work :)

In contrast, the current PR both fits my theoretical understanding of why it works and passes the empirical tests. Looking forward to your attempt!
I've made a prototype that adds Cleaner.waitForCleaning() and changes
Bits.reserveMemory() to use it.
https://github.com/openjdk/jdk/compare/master...kimbarrett:openjdk-jdk:wait-for-cleaning?expand=1
It's based on the shipilev commits 1-19, for ease of comparison and
prototyping. There are also some shortcuts (like making some things public
that likely shouldn't be), again for ease of prototyping.
I ran the DirectBufferAllocTest test a few hundred times without any
problems. I also ran it directly, configured with a 1/2 hour runtime, again
without problems. I ran the two new micros against baseline, the shipilev
changes only, and the waitForCleaning changes. Results below.
DirectByteBufferGC
baseline
Benchmark (count) Mode Cnt Score Error Units
DirectByteBufferGC.test 16384 avgt 9 4.036 ± 0.244 ms/op
DirectByteBufferGC.test 65536 avgt 9 7.117 ± 0.184 ms/op
DirectByteBufferGC.test 262144 avgt 9 18.482 ± 1.064 ms/op
DirectByteBufferGC.test 1048576 avgt 9 194.590 ± 17.639 ms/op
DirectByteBufferGC.test 4194304 avgt 9 802.354 ± 57.576 ms/op
shipilev
Benchmark (count) Mode Cnt Score Error Units
DirectByteBufferGC.test 16384 avgt 9 4.104 ± 0.045 ms/op
DirectByteBufferGC.test 65536 avgt 9 6.875 ± 0.244 ms/op
DirectByteBufferGC.test 262144 avgt 9 17.943 ± 0.446 ms/op
DirectByteBufferGC.test 1048576 avgt 9 59.002 ± 3.616 ms/op
DirectByteBufferGC.test 4194304 avgt 9 331.328 ± 36.580 ms/op
waitForCleaning
Benchmark (count) Mode Cnt Score Error Units
DirectByteBufferGC.test 16384 avgt 9 4.058 ± 0.167 ms/op
DirectByteBufferGC.test 65536 avgt 9 6.775 ± 0.281 ms/op
DirectByteBufferGC.test 262144 avgt 9 17.724 ± 0.602 ms/op
DirectByteBufferGC.test 1048576 avgt 9 57.284 ± 2.679 ms/op
DirectByteBufferGC.test 4194304 avgt 9 330.002 ± 44.200 ms/op
DirectByteBufferChurn
baseline
Benchmark (recipFreq) Mode Cnt Score Error Units
DirectByteBufferChurn.test 128 avgt 9 15.693 ± 0.991 ns/op
DirectByteBufferChurn.test 256 avgt 9 11.255 ± 0.369 ns/op
DirectByteBufferChurn.test 512 avgt 9 9.422 ± 0.382 ns/op
DirectByteBufferChurn.test 1024 avgt 9 8.529 ± 0.311 ns/op
DirectByteBufferChurn.test 2048 avgt 9 7.833 ± 0.287 ns/op
shipilev
Benchmark (recipFreq) Mode Cnt Score Error Units
DirectByteBufferChurn.test 128 avgt 9 12.524 ± 0.476 ns/op
DirectByteBufferChurn.test 256 avgt 9 9.248 ± 0.175 ns/op
DirectByteBufferChurn.test 512 avgt 9 8.583 ± 0.197 ns/op
DirectByteBufferChurn.test 1024 avgt 9 8.227 ± 0.274 ns/op
DirectByteBufferChurn.test 2048 avgt 9 7.526 ± 0.394 ns/op
waitForCleaning
Benchmark (recipFreq) Mode Cnt Score Error Units
DirectByteBufferChurn.test 128 avgt 9 11.235 ± 1.998 ns/op
DirectByteBufferChurn.test 256 avgt 9 9.129 ± 0.324 ns/op
DirectByteBufferChurn.test 512 avgt 9 8.446 ± 0.115 ns/op
DirectByteBufferChurn.test 1024 avgt 9 7.786 ± 0.425 ns/op
DirectByteBufferChurn.test 2048 avgt 9 7.477 ± 0.150 ns/op
The baseline DirectByteBufferGC suffers significantly at higher counts,
probably because of the long doubly-linked list of registered
jdk.internal.ref.Cleaner objects.
By eye, the performance of both shipilev and waitForCleaning on the DBBChurn benchmark seems to be consistently slightly better than baseline, and the two are within the noise of each other for both benchmarks.
This waitForCleaning approach raises several questions.
(1) Is the improved reliability of this approach worth the additional effort?
My inclination is to say yes. But note that it should perhaps be different
effort - see item (2).
(2) Should Cleaner be responsible for providing the additional functionality?
An alternative is to have Bits implement its own cleaning, directly using a
private PhantomReference derivative, ReferenceQueue, and Thread.
If there were multiple (potential) clients for this sort of thing then putting
it in Cleaner may make sense, so long as their requirements are similar. But
we're kind of fighting with it, esp. with the canary approach, but even with
the waitForCleaning approach. I think things could be simpler and easier to
understand with a direct implementation. And Cleaner was not intended to be
the last word in the area of reference-based cleanup. Rather, it's a
convenient package for addressing some common patterns (esp. with
finalization). It provides some useful infrastructure that would need to be
replaced in a bespoke implementation, but I don't think there's anything all
that large or complicated there, and some streamlining is possible.
Note that I haven't done any work in that direction yet.
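To give a feel for what a direct implementation could look like, here is a hedged sketch under the stated assumptions; all names are hypothetical and this is not proposed code.

```java
// Sketch only: a bespoke PhantomReference + ReferenceQueue + daemon thread,
// roughly the shape a Bits-private cleaning mechanism could take.
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class DirectDeallocatorRef extends PhantomReference<Object> {
    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // Keep the references strongly reachable until they have been processed.
    private static final Set<DirectDeallocatorRef> LIVE = ConcurrentHashMap.newKeySet();

    private final Runnable free;            // e.g. frees the native memory

    private DirectDeallocatorRef(Object buffer, Runnable free) {
        super(buffer, QUEUE);
        this.free = free;
    }

    static void register(Object buffer, Runnable free) {
        LIVE.add(new DirectDeallocatorRef(buffer, free));
    }

    static {
        Thread t = new Thread(() -> {
            while (true) {
                try {
                    DirectDeallocatorRef ref = (DirectDeallocatorRef) QUEUE.remove();
                    LIVE.remove(ref);
                    ref.free.run();
                } catch (InterruptedException ignored) {
                    // best-effort sketch: keep draining the queue
                }
            }
        }, "DirectBuffer-Deallocator");
        t.setDaemon(true);
        t.start();
    }
}
```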
(3) Do we need the retry stuff, and how should it be done? I did some
monitoring of the retries, and waitForCleaning DBBA does need the retries
(though rarely more than 1, and I never saw more than 2). I kind of feel like
the retries are a kludgy workaround for what seems like an unreasonable test,
but oh well.
The point of the retries is to allow other threads to drop some DBBs that can
become reclaimable during the associated wait, giving the waiting thread an
opportunity to make use of that now available memory.
I experimented with a retry-style more similar to the current code, where each
thread on the slow path will do at most one GC and then go into a wait/sleep
loop if reservation continues to fail. (I added some locking to avoid the
concurrent requests for GC issue with the current code.) That didn't seem to
make any difference compared to something like the proposed serialized GC and
sleep loop. I can construct theoretical cases that seem to favor either, but
the style in the proposal seems simpler.
Of course, the question of when to do GCs and how many to do is moot if
-XX:+DisableExplicitGC is used. (The default for that option is false.)
// To do this efficiently, we need to wait for several activities to run:
// 1. GC needs to discover dead references and hand them over to Reference
// processing thread. This activity can be asynchronous and can complete after
// we unblock from System.gc().
Adding cleared references to the pending list is always completed before the
GC invocation completes. Doing otherwise would break or significantly
complicate Reference.waitForReferenceProcessing(), and to no good purpose.
That function should only return false when all references cleared by a prior
GC have been enqueued in their respective queues. There are tests that depend
on that. (I looked, and so far as I can tell, none of the extant GCs violate
this.)
// Instead, we are checking directly if Cleaner has acted since our last System.gc():
// install the canary, call System.gc(), wait for canary to get processed (dead). This
// signals that since our last call to System.gc(), steps (1) and (2) have finished, and
// step (3) is currently in progress.
The canary having been processed doesn't tell us anything definitive about
step (2), because steps (2) and (3) occur concurrently. The reference
processing thread transfers references to their respective queues, and the
cleaner thread processes references that have been placed in its queue, both
running at the same time. All we know is that the canary got transferred and
cleaned. There may or may not have been other references similarly
transferred and cleaned. There may or may not be more references in the
pending list, in the cleaner queue, or both. That the canary has been cleaned
doesn't give us any information about either the pending list or the cleaner
queue.
// it is possible that canary gets dead before the rest of the buffers get cleaned. This
// corner case would be handled with a normal retry attempt, after trying to allocate.
// If allocation succeeds even after partial cleanup, we are done. If it does not, we get
// to try again, this time reliably getting the results of the first cleanup run. Not
After trying again, all we know is that both the previous and the new canary
have been processed. We don't know anything about other references, either
from the latest or preceding GCs. Consider this unfortunate scenario. The
first canary ended up at the head of the pending list. The reference
processing thread transferred it to the cleaner queue and then stalled. The
cleaner thread processed the first canary. The waiting thread noted that,
retried and failed the reservation, and retried the GC and wait for the
canary. The retry canary also happened to end up at the front of the updated
pending list. The reference processor transferred it and again stalled. The
cleaner thread processed the retry canary. No real cleaning has happened,
i.e. we have not reliably gotten the results of the first cleanup run.
And here is a bug and recently opened PR that would benefit from having […].
If you are suggesting exposing this as a public API, then I don't think we should do that. I actually think it's time to consider deprecating Cleaner, but that is a bigger discussion.
Not at this time, quite possibly never. As mentioned, I made some things public for ease of prototyping.
Which "Cleaner" are you referring to? If jdk.internal.ref.Cleaner, this PR proposes to remove it. |
These caught my eye during my read-through: […]
I wonder how these together could affect performance of less specialized Cleaner use cases (where objects are not being enqueued very often). I'm therefore inclined for changes along these lines to be limited to a DirectByteBuffer-specific mechanism, as you suggested in your point (2).
@shipilev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
DirectByteBuffers are still using the old jdk.internal.ref.Cleaner implementation. That implementation carries a doubly-linked list, and so makes DBB suffer from the same issue fixed for generic java.lang.ref.Cleaner users with JDK-8343704. See the bug for the reproducer. We can migrate DBBs to use java.lang.ref.Cleaner.

There are two peculiarities in this rewrite.

First, the old ad-hoc Cleaner implementation used to exit the VM when a cleaning action failed. I presume it was to avoid a memory leak / accidental reuse of the buffer. I moved the relevant block to Deallocator directly. Unfortunately, I cannot easily test it.

Second is quite a bit hairy. The old DBB cleaning code was hooked straight into the Reference processing loop. This was possible because we could infer that the weak references we were processing were DBB cleaning actions, since the old Cleaner was the only use of this code. With the standard Cleaner, we have lost this association, and so we cannot really do this from the reference processing loop. With the patched version, we now rely on the normal Cleaner thread to do cleanups for us.

Because of this, there is a new outpacing opportunity window where reference processing might be over, but the cleaner thread has not reacted yet. This is why we need another way to check progress, which involves checking whether the cleaner has acted.
Additional testing:
- […] java/nio java/io
- […] java/nio java/io
- […] all
- […] all
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22165/head:pull/22165
$ git checkout pull/22165
Update a local copy of the PR:
$ git checkout pull/22165
$ git pull https://git.openjdk.org/jdk.git pull/22165/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 22165
View PR using the GUI difftool:
$ git pr show -t 22165
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22165.diff
Using Webrev
Link to Webrev Comment