Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace read/write lock in JarResource to avoid virtual threads pinning #40942

Merged
merged 1 commit into from
Jul 9, 2024

Conversation

mariofusco
Copy link
Contributor

This is a proposed solution for the problem reported here.

I'm avoiding blocking on the ReadWriteLock and emulating its behavior with a state machine + an atomic readers counter because the zipFile access may happen inside a native call (for example triggered by the RunnerClassLoader) and then it is necessary to avoid blocking on it.

Note that while avoid blocking and thus preventing pinning, I'm afraid that this solution could suffer of monopolization. I tried to mitigate the problem with the yields at lines 160 and 183 but I'm not sure if this is a good idea, especially with virtual threads. Any feedback or suggestion to improve is welcome.

/cc @franz1981 @geoand @Sanne @dmlloyd @cescoffier

@quarkus-bot quarkus-bot bot added the area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins label Jun 3, 2024
Copy link

quarkus-bot bot commented Jun 3, 2024

Thanks for your pull request!

The title of your pull request does not follow our editorial rules. Could you have a look?

  • title should preferably start with an uppercase character (if it makes sense!)

This message is automatically generated by a bot.

}
Thread.yield();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still blocks, it just doesn't look like blocking (the thread can loop for an arbitrarily long time in this case, keeping the thread pinned the whole time because the pinning native call
is still on the stack). I suspect that the only way around this is to not publish the class loader object at all, until the zip file is actually opened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my doubt too. I can simply remove this statement (and the other yield one on the close), but I'm just afraid that this could make the potential monopolization problem even worse. This implementation however should prevent the deadlock problem reported in the original issue (and that in all honesty I've never been able to reproduce anyway).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah AFAICT that is possible. Or it might turn the deadlock into a livelock, but that seems less likely I guess?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true but the livelock could be especially bad with virtual threads since they are not preemptive. Unfortunately the only way to make them preemptive at that point is performing a blocking operation which is exactly the thing that we want to avoid. In a way or another this is probably still a sub-optimal solution, but I think better than the existing one. I'm not strongly opinionated in keeping or removing the yields, please advice on what I should do about them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have to keep them. Better would be to park/unpark but I don't see how that would be feasible, since the one part of the problem that this solves is avoiding a dependency on the owner thread to release waiter thread(s).

This comment has been minimized.

@mariofusco
Copy link
Contributor Author

Failed to execute goal net.revelc.code.formatter:formatter-maven-plugin:2.24.0:validate (default) on project quarkus-bootstrap-runner: File '/home/runner/work/quarkus/quarkus/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/JarResource.java' has not been previously formatted. Please format file (for example by invoking `mvn -f independent-projects/bootstrap/runner net.revelc.code.formatter:formatter-maven-plugin:2.24.0:format`) and commit before running validation!

I tried to format the code using that maven command, but when I run it, it doesn't change anything in the java source (or in any other file). Am I missing something?

@franz1981
Copy link
Contributor

franz1981 commented Jun 3, 2024

I would make a separate RefCntResourceFile class which contains a volatile field pointing to the file to create and an atomic ref count to handle ownership and eventually the close (the release which put the ref can't back to zero got the rights to close the file)

In the JarResource I would have an AtomicReference<RefCntResource> which can be in the states null <-> not null, but thanks to RefCntResource being a new instance on each opening transition, would avoid ABA issues.

The algorithm can be made more cooperative and progressive by allowing contended operations to "help" eg concurrent opens can compete to populate the volatile field in the currently set RefCntResourceFile, avoiding to spin wait, while allocating more memory.
Similarly a close doesn't need to wait till all readers are gone but can perform a RefCntResourceFile::release which if true will make it to attempt moving to null the shared AtomicReference<RefCntResource>, while it false, will likely be picked by the reader while finishing to read (each read must be surrounded by a retain and release on the current RefCntResource).

@geoand
Copy link
Contributor

geoand commented Jun 4, 2024

Failed to execute goal net.revelc.code.formatter:formatter-maven-plugin:2.24.0:validate (default) on project quarkus-bootstrap-runner: File '/home/runner/work/quarkus/quarkus/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/JarResource.java' has not been previously formatted. Please format file (for example by invoking `mvn -f independent-projects/bootstrap/runner net.revelc.code.formatter:formatter-maven-plugin:2.24.0:format`) and commit before running validation!

I tried to format the code using that maven command, but when I run it, it doesn't change anything in the java source (or in any other file). Am I missing something?

https://quarkusio.zulipchat.com/#narrow/stream/187038-dev/topic/source.20code.20not.20formatted seems to be relevant

}
zipFileReadersCounter.incrementAndGet();
Copy link
Contributor

@franz1981 franz1981 Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here a close can happen (and a subsequent open too!!!) which means that we can have the zip file in 2 possible wrong states:

  • already closed
  • a different one(!)

I suggest to move the ref cnt into a atomic ref < ref cnt obj > which keep the 2 states dependent and tied together

@mariofusco
Copy link
Contributor Author

I reworked this pull request following @franz1981's suggestions. Please review again.

Copy link
Member

@dmlloyd dmlloyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than a minor comment.


private final AtomicInteger readersCounter = new AtomicInteger(0);

private volatile boolean releasing = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this: you need to use compareAndSet loops on both retain/release to make sure that you cannot "resurrect" a released resource

eg

  • ref count start at 1 i.e. alive
  • a release put it to 0 (and could have rights to close the file)
  • an acquire just find it at 0, returning false because the resource is already released

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using this boolean for a twofold reason:

  1. checking that a release hasn't been already requested when readLockAcquire is called because in that case I cannot return a reference to the jarFile that could have been closed in the meanwhile.
  2. checking if a release has been already requested when readLockRelease is called: in this case if the readers counter reaches to 0 I'm actually allowed to close the jarFile.

I don't think that I can reuse the readersCounter for this same purpose, for instance setting it to 0 when a release is requested, because doing so I would lose track of how many readers are still currently accessing the jarFile. Am I missing something?

Copy link
Contributor

@franz1981 franz1981 Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still introducing a level of freedom which can break things: we should always relies on a single source of truth related the state of the file ownership; and the math which bring 1 -> 0 as a final state is very easy to enforce via compareAndSet loops.
The boolean hint could be re-implemented with an isReleased() { return rf.cnt() == 0; } still granting a nice external API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The boolean hint could be re-implemented with an isReleased() { return rf.cnt() == 0; } still granting a nice external API.

The 2 things are unrelated though: the fact that the counter is zero (there are no current readers) doesn't imply in any way that somebody call a close on the JarResource thus actually asking a release of the jarFile.

I understand that ideally it would be much better to have the state of the JarFileReference, but still I don't see how to achieve it in this case. Moreover I tried to (mentally) double check all the possible interactions and situations and I believe that it shouldn't be any gap. Please review it if you have time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the happens before relation twice I cannot see evident problems (so this reduce my concerns), but i personally prefer to reduce the level of freedoms to make more easy to reason of things, but is personal, really.

// and then it is necessary to avoid blocking on it.
private final JarFile jarFile;

private final AtomicInteger readersCounter = new AtomicInteger(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uses an AtomicIntegerFieldUpdater with a volatile field to reduce the footprint of this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This obviously makes sense, but I don't think it will make much of a difference since you are not likely to have many more than 100 jar files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @geoand that this will introduce further complexity (and also the need of accessing the field through reflection) for a very limited memory gain, but I'm open to change this if required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same trick also applies here, that you can use AtomicInteger() to make it slightly faster

// To enable virtual threads compatibility and avoid pinning it is not possible to use an explicit read/write lock
// because the jarFile access may happen inside a native call (for example triggered by the RunnerClassLoader)
// and then it is necessary to avoid blocking on it.
private final JarFile jarFile;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better a volatile field with a AtomicReferenceFiledUpdater to allow concurrent threads to "help" (or not - is up to the algorithm): it means that a reader can successfully create and set and new JarFileReference yet empty, because in the process of being read from the disk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the same logic that you're suggesting has been already implement in the JarResource::jarFileRef method here. I think that it is more readable/maintainable if this JarFileReference keeps a stable and final reference to the JarFile it is referring to.

//To close the jar, the exclusive lock must be owned, and reference will be set to null before releasing it.
//Likewise, opening a JarFile requires the exclusive lock.
private volatile JarFile zipFile;
private final AtomicReference<JarFileReference> jarFileReference = new AtomicReference<>(null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@franz1981 not that it makes any difference here, but it isn't a tiny bit faster to do new AtomicReference<>()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks for suggestion.

@@ -69,13 +62,22 @@ public void init() {

@Override
public byte[] getResourceData(String resource) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can become a method on JarFileReference, actually, hiding the ownershipt logic within it.
It allows reading a reasouce (with its own retain/release pairs, to be able to close the file, if the release grant rights to do it...)

In order to achieve it we need to pass the AtomicReference to this method so it can "optmistiacally" attempts a compareAndSet(this, null) to it.
If it fails is not a problem, because it means another thread has succeeded to create a new one OR nulling it out

@mariofusco
Copy link
Contributor Author

mariofusco commented Jun 4, 2024

@franz1981 I tried to follow all your suggestions and I believe I reached a far better level of encapsulation in the JarFileReference with the introduction of a non-capturing lambda that has the purpose of consuming the jarFile hiding all the complexities of retrieving a valid non closed instance and keeping track of the readers counting. The only thing that I haven't been able to remove is the need of that boolean releasing and I tried to explain why I believe that it is still necessary. Please review again 🙏

@geoand
Copy link
Contributor

geoand commented Jun 4, 2024

Also, when all else is done, we'll need to squash the commits

readersCounter.incrementAndGet();
if (releasing) {
if (readersCounter.decrementAndGet() == 0) {
close();
Copy link
Contributor

@franz1981 franz1981 Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This the part which can break:

  • T1 on release: set release = true and arrive till readersCounter.get() == 0 check, not yet happening
  • T2 has been able to read releasing == true and has decremented the read counter till 0
  • T1 now check readersCounter.get() == 0 as successfull too
  • T2 and T1 both compete to perform the close

Copy link
Contributor Author

@mariofusco mariofusco Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true and I also found this possibility, anyway this shouldn't be a problem: at worst 2 threads will both call close() which however is idempotent.

Copy link
Contributor

@franz1981 franz1981 Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK, the javadoc doesn't mention the concurrent behaviour of such method and I see that the zip file close will issue a close to https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/zip/ZipFile.java#L457 which seems thread-safe but only because the implementation looks like that, undocumented.
Said that I believe the state machine could be simplified here; I have proposed a possible way in a slack chat, reusing the same logic Netty uses for its reference counted objects; let's see if it helps to simplify the logic, removing racing states.

@dmlloyd too
Furthermore, I am a bit surprised to find synchronized blocks there...and it is likely there are others we are using that we haven't yet be warned as pinning ones, but will likely do :"(

e.g. https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/zip/ZipFile.java#L348 and https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/zip/ZipFile.java#L385

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling close multiple times happens all the time at the HTTP layer for example and it's not a problem because as Mario mentions, it's idempotent.

All that to say that I don't believe this is an issue

Copy link
Contributor

@franz1981 franz1981 Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but the "problem" i see is that is not a degree of freedom which give use any practical advantage and still force to have 2 separate variables which have happens-before relationship between them, making the logic more complex to maintain and document (but as said, that's my brain...so it's personal).
I still suggest to reduce further the mutability surface and states in order to make it easier to maintain..

* accessible.
*/
private JarFile readLockAcquire() {
readersCounter.incrementAndGet();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I'm preventively increasing the counter here is that this ensure that the counter is at least 1 before checking the releasing variable thus preventing the release() method to actually close the jarFile. Then if (even in the meanwhile) somebody called the release, I check it here: in this case I cannot return the jarFile but I decrement again the counter that I increased just before, thus preventing it to go overboard, and if it reaches to 0 I also close the jarFile.

@mariofusco
Copy link
Contributor Author

@franz1981 I think I've finally understood what you meant (I'm getting old) and implemented the reference counter accordingly. Please review again.

@gsmet
Copy link
Member

gsmet commented Jun 5, 2024

Given the amount of work and discussions related to this issue, could we have a simple revert PR targeting 3.11?

Whatever gets into 3.12 wouldn't be backportable.

Thanks.

}
if (count == 0) {
try {
jarFile.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would pretend the compare and set before closing this and don't only if the previous value was 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@franz1981 franz1981 Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use compare and set because 2 close can make the counter to go less then 0 breaking the acquire which return null

We want the close to still keep the counter correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I change this, but please give another look, since I'm not 100% sure and feel free to ping me in case you have any question or doubt.

@geoand
Copy link
Contributor

geoand commented Jun 5, 2024

Given the amount of work and discussions related to this issue, could we have a simple revert PR targeting 3.11?

+1

@mariofusco
Copy link
Contributor Author

Given the amount of work and discussions related to this issue, could we have a simple revert PR targeting 3.11?

Whatever gets into 3.12 wouldn't be backportable.

Do you mean we will have a 3.11.1 patch release where we simply revert the use of registerAsParallelCapable() on the RunnerClassLoader while we will have this fix (when it will be ready, we should be almost there) on the 3.12.0? If so, that's ok for me.

@geoand
Copy link
Contributor

geoand commented Jun 5, 2024

Yeah, @gsmet is proposing reverting #40033 in the 3.11 branch - which would be done by opening a PR against that branch specifically

@geoand
Copy link
Contributor

geoand commented Jun 5, 2024

Also, when all else is done, we'll need to squash the commits

Bumping this so it does not get lost :)

@mariofusco
Copy link
Contributor Author

Yeah, @gsmet is proposing reverting #40033 in the 3.11 branch - which would be done by opening a PR against that branch specifically

For what I have experimented these days, I don't think that a full revert will be necessary, but it would be sufficient to just remove the statement calling registerAsParallelCapable() on the RunnerClassLoader, anyway up to you.

Also, when all else is done, we'll need to squash the commits

Bumping this so it does not get lost :)

That's ok, I didn't forget it, I just wanted to finish discussing all the details with @franz1981 before squashing :)

@franz1981
Copy link
Contributor

franz1981 commented Jun 26, 2024

In the comment you pasted I can see a reference to "main" without a commit ID and I don't know if it is related the parallel or not version...
Furthermore on
https://github.com/quarkusio/quarkus/pull/40942/files#diff-a8ea1b8267b00e0c29800f9e3d324797e9b62667b590fe3c3c8ad81f708f44f4L179

I can see that you modified the order of eviction of jars which give some additional benefit on this pr compared to master which doesn't contain it - and this could be a separate PR/issue per se really.

In short I would like to see how this version compare against:

To understand the isolated impacts of the changes here

//To close the jar, the exclusive lock must be owned, and reference will be set to null before releasing it.
//Likewise, opening a JarFile requires the exclusive lock.
private volatile JarFile zipFile;
final AtomicReference<JarFileReference> jarFileReference = new AtomicReference<>();
Copy link
Contributor

@franz1981 franz1981 Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a single state here. The completable future is good for both cases because allow to be created already completed or can be shared completing it only if we win the race to share it, making easier to follow what's going on.
In the case of virtual threads, it means that virtual threads could spin while waiting for it (because it will be eventually populated although it can later fail to acquire it) - or, in the case of platform threads, just wait.
The waiting part requires some attention for interruption, I think, but is a separate problem; i.e. if who is supposed to populate fail because of I/O or interruption (or OOM), we shouldn't let the others to wait for nothing, unblocking them, by completing it with the thrown exception or some other mechanism.

@mariofusco
Copy link
Contributor Author

@franz1981 I implemented your latest suggestion on unifying all the concurrent state in one single AtomicReference with this last commit. I'm pretty happy with the algorithm now, but it would be great if you could give a general look.

Regarding the performance I reran the StartStop fullmicroprofile benchmark against main obtaining these results:

AVG startedInMs without min and max values: 1083
AVG timeToFirstOKRequest without min and max values: 1252
AVG rssKb without min and max values: 204890

while running it on this pull request I got:

AVG startedInMs without min and max values: 1096
AVG timeToFirstOKRequest without min and max values: 1277
AVG rssKb without min and max values: 205312

So it seems that this pull request doesn't introduce any relevant penalty in terms of performance and memory occupation, while implementing a non-blocking parallelism that is compatible with virtual threads. Regarding them I also ran the benchmark forcing the isVirtualThread() method to always return true. By doing so I'm still using platform thread but forcing them to use the algorithm designed for the virtual ones, both to test again its correctness and check its performances. In this case I got:

AVG startedInMs without min and max values: 1063
AVG timeToFirstOKRequest without min and max values: 1238
AVG rssKb without min and max values: 223113

where the (expected) slightly higher memory occupation is very likely due to the fact that in this case the algorithm is allowed to eventually load the same jar in parallel from multiple threads.

writeLock.unlock();
var ref = jarFileReference.get();
if (ref != null) {
ref.join().close(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in theory join shouldn't wait here: you could use ref.getNow() and adding an assert that you always expect some value here, because the file should be already populated due to the ref count logic (which should close only if there are no acquisition pending)

// Happy path: the jar reference already exists and it's ready to be used
final var localJarFileRefFuture = jarResource.jarFileReference.get();
if (localJarFileRefFuture != null) {
if (localJarFileRefFuture.join().acquire()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to await till the jar is loaded - or we assume it to always be populated here?
If is it the case we should use getNow and put an assert that we assume is going to always be true

}

// Virtual threads needs to load the jarfile synchronously to avoid blocking. This means that eventually
// multiple threads could load the same jarfile in parallel and this duplication has to be reconciled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that we are not busy spinning till the load complete, for virtual thread, but waste some RSS (in case of parallel class load over the same jar): I hope we're all in sync that's the choice @Sanne @geoand @dmlloyd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK for now; hopefully this is all temporary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

private static CompletableFuture<JarFileReference> getJarFileReferenceFuture(JarResource jarResource) {
CompletableFuture future = new CompletableFuture();
if (jarResource.jarFileReference.compareAndSet(null, future)) {
future.completeAsync(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not "complete" here?

Copy link
Contributor

@franz1981 franz1981 Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

this is going to use the "common FJ pool" I don't think is what we want here, especially because this one is lazily allocated and usually people access to it via parallel on the Stream API, which is not that common.
The fact that it didn't popup in the RSS data is puzzling, because FJ thread should impact RSS due to the dedicated OS space for the stack size (which is 512K per thread by default? I don't remember by heart...)

@mariofusco
Copy link
Contributor Author

@franz1981 I fixed all your suggestions with my latest commit, thanks for your feedback, please give another look. I also added a few comments to further clarify the code and ran one more time the usual benchmark without noticing any relevant difference.

This comment has been minimized.

@mariofusco
Copy link
Contributor Author

@franz1981 @geoand @dmlloyd @Sanne As agreed I rebased this pull request against the main branch, reenabled the parallel capabilities of the RunnerClassLoader and squashed all commits. In case we're all happy with this I think that this is ready to be merged now, otherwise feel free to review again and provide further feedback. Considering also what discussed during yesterday's Loom meeting I believe that bringing this change in (or something equivalent) is unavoidable given the current state of virtual thread implementation on the JVM.

@mariofusco
Copy link
Contributor Author

Any news on this?

@geoand
Copy link
Contributor

geoand commented Jul 8, 2024

I'll leave it to @franz1981 since he is the one that had the latest comments

// else, we drop one element from the cache,
// and inserting the latest resource on the tail:
toEvict = currentlyBufferedResources[currentlyBufferedResources.length - 1];
currentlyBufferedResources[currentlyBufferedResources.length - 1] = resource;
for (int j = currentlyBufferedResources.length - 1; j > 0; j--) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path is the same as the "bubble up" case, just with a different starting point: you could create a method out of it to make it easier to remember what it does - or changing it in a single place.

CompletableFuture future = new CompletableFuture();
if (jarResource.jarFileReference.compareAndSet(null, future)) {
try {
future.complete(new JarFileReference(JarFiles.create(jarResource.jarPath.toFile()), false));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not creating this as already acquired? we want to make sure that if the thread which win the race to move the state from null -> future has its own chance to read it or we risk that another thread will acquire it as soon as it is published, release it and making it to close.

CompletableFuture<JarFileReference> newJarFileRef;
do {
final var jarFileRefFuture = getJarFileReferenceFuture(jarResource);
newJarFileRef = jarFileRefFuture == null ? jarResource.jarFileReference.get() : jarFileRefFuture;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newJarFileRef can be null if jarResource.jarFileReference.get return null?

This can happen if there is a racy cleanup of the current held future to a closed jar file

@franz1981
Copy link
Contributor

I've added a couple of comment where one is related a potential NPE (apparent to my monday brain) and another is to make it more progressive (who perform some work has to be rewarded by using its own resource at least once)
: said that overall look good, well done!

@mariofusco
Copy link
Contributor Author

@franz1981 I reworked this pull request following your latest suggestions (thanks again for this), please review it again when you have time.

Copy link
Contributor

@franz1981 franz1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done @mariofusco !!

@geoand geoand added the triage/waiting-for-ci Ready to merge when CI successfully finishes label Jul 8, 2024

This comment has been minimized.

Copy link

quarkus-bot bot commented Jul 9, 2024

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit ff4b32b.

Failing Jobs

Status Name Step Failures Logs Raw logs Build scan
✔️ JVM Tests - JDK 17 Logs Raw logs 🚧
✔️ JVM Tests - JDK 21 Logs Raw logs 🚧
✔️ JVM Tests - JDK 17 Windows Failures Logs Raw logs 🚧

Full information is available in the Build summary check run.

Failures

⚙️ JVM Tests - JDK 17 Windows #

- Failing: extensions/amazon-lambda/deployment 
! Skipped: extensions/amazon-lambda-http/deployment extensions/amazon-lambda-rest/deployment integration-tests/amazon-lambda and 9 more

📦 extensions/amazon-lambda/deployment

io.quarkus.amazon.lambda.deployment.testing.InputCollectionOutputCollectionLambdaTest.requestHandler_InputCollectionInputPerson_OutputCollectionOutputPerson line 40 - History - More details - Source on GitHub

java.lang.AssertionError: 
1 expectation failed.
JSON path  doesn't match.
Expected: a collection containing map containing ["outputname"->"Chris"]
  Actual: <[{outputname=Fred}]>

	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)

Flaky tests - Develocity

⚙️ JVM Tests - JDK 17

📦 extensions/smallrye-reactive-messaging/deployment

io.quarkus.smallrye.reactivemessaging.hotreload.ConnectorChangeTest.testUpdatingConnector - History

  • Expecting actual: ["-4","-5","-6","-7","-8","-9","-10","-11"] to start with: ["-3", "-4", "-5", "-6"] - java.lang.AssertionError
java.lang.AssertionError: 

Expecting actual:
  ["-4","-5","-6","-7","-8","-9","-10","-11"]
to start with:
  ["-3", "-4", "-5", "-6"]

	at io.quarkus.smallrye.reactivemessaging.hotreload.ConnectorChangeTest.testUpdatingConnector(ConnectorChangeTest.java:36)

@geoand
Copy link
Contributor

geoand commented Jul 9, 2024

@dmlloyd any objections to merging?

@dmlloyd
Copy link
Member

dmlloyd commented Jul 9, 2024

No objections here.

@geoand
Copy link
Contributor

geoand commented Jul 9, 2024

👍🏼

Let's get this in then and keep a close eye on any suspicious failures from now on

@geoand
Copy link
Contributor

geoand commented Jul 9, 2024

Thanks a lot for this work @mariofusco!!!

@geoand geoand merged commit 014bb15 into quarkusio:main Jul 9, 2024
53 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.13 - main milestone Jul 9, 2024
@quarkus-bot quarkus-bot bot removed the triage/waiting-for-ci Ready to merge when CI successfully finishes label Jul 9, 2024
@michalvavrik
Copy link
Member

michalvavrik commented Jul 23, 2024

FYI #42067

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins triage/flaky-test
Projects
Development

Successfully merging this pull request may close these issues.

8 participants