Skip to content

Possible race condition in build process/bootstrap package #41767

Open

Description

Describe the bug

Originally reported as a Keycloak bug: keycloak/keycloak#29758
Keycloak 24.0.5 comes with pre-packaged with Quarkus 3.8.4. On our systems, in the initial build step when Keycloak starts, it chokes on a bit of NFS detritus and fails:
ERROR: io.quarkus.builder.BuildException: Build failure: Build failed due to errors [error]: Build step io.quarkus.deployment.pkg.steps.JarResultBuildStep#buildRunnerJar threw an exception: java.lang.IllegalStateException: java.nio.file.FileSystemException: /nfs/home/nat/keycloak/keycloak-24.0.4/lib/quarkus/.nfs804a7a64040362c3000c81f2: Device or resource busy

The detritus files do not remain around after the JVM exits - they're definitely created by the Keycloak app. The apparent failure point is here: https://github.com/quarkusio/quarkus/blob/3.8.4/independent-projects/bootstrap/app-model/src/main/java/io/quarkus/bootstrap/util/IoUtils.java#L109-L117

I tried debugging the problem with a modified build of quarkus-bootstrap-app-model-3.8.4.jar, and determined that the detritus files do not appear until after Files.delete(p) is called. My suspicion is that these calls inside a DirectoryStream iteration loop result in a race condition on some NFS volumes (i.e. the directory listing is being updated at the same time as it's being read). I was able to fix it by changing a single line to avoid these detritus files:

diff --git a/independent-projects/bootstrap/app-model/src/main/java/io/quarkus/bootstrap/util/IoUtils.java b/independent-projects/bootstrap/app-model/src/main/java/io/quarkus/bootstrap/util/IoUtils.java
index ba28f56..7a77f2c 100644
--- a/independent-projects/bootstrap/app-model/src/main/java/io/quarkus/bootstrap/util/IoUtils.java
+++ b/independent-projects/bootstrap/app-model/src/main/java/io/quarkus/bootstrap/util/IoUtils.java
@@ -110,7 +110,7 @@ public static void createOrEmptyDir(Path dir) throws IOException {
             for (Path p : stream) {
                 if (Files.isDirectory(p)) {
                     recursiveDelete(p);
-                } else {
+                } else if (!p.getFileName().toString().startsWith(".nfs")) {
                     Files.delete(p);
                 }
             }

However I am not a Java developer under normal circumstances so I don't have strong feelings about what the correct approach is here. (I'm happy to submit a pull request with my change however.)

Expected behavior

Build is able to proceed without NFS conflicts.

Actual behavior

ERROR: Failed to run 'build' command.
ERROR: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
        [error]: Build step io.quarkus.deployment.pkg.steps.JarResultBuildStep#buildRunnerJar threw an exception: java.lang.IllegalStateException: java.nio.file.FileSystemException: /nfs/home/nat/keycloak-24.0.5/lib/quarkus/.nfs804a8212004d86bb000af9da: Device or resource busy
        at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:858)
        at io.quarkus.builder.BuildContext.run(BuildContext.java:256)
        at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
        at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
        at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
        at java.base/java.lang.Thread.run(Thread.java:842)
        at org.jboss.threads.JBossThread.run(JBossThread.java:501)
Caused by: java.nio.file.FileSystemException: /nfs/home/nat/keycloak-24.0.5/lib/quarkus/.nfs804a8212004d86bb000af9da: Device or resource busy
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:248)
        at java.base/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
        at java.base/java.nio.file.Files.delete(Files.java:1152)
        at io.quarkus.bootstrap.util.IoUtils.createOrEmptyDir(IoUtils.java:114)
        at io.quarkus.deployment.pkg.steps.JarResultBuildStep.buildThinJar(JarResultBuildStep.java:601)
        at io.quarkus.deployment.pkg.steps.JarResultBuildStep.buildRunnerJar(JarResultBuildStep.java:224)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:849)
        ... 6 more

ERROR: Build failure: Build failed due to errors
        [error]: Build step io.quarkus.deployment.pkg.steps.JarResultBuildStep#buildRunnerJar threw an exception: java.lang.IllegalStateException: java.nio.file.FileSystemException: /nfs/home/nat/keycloak-24.0.5/lib/quarkus/.nfs804a8212004d86bb000af9da: Device or resource busy
        at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:858)
        at io.quarkus.builder.BuildContext.run(BuildContext.java:256)
        at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
        at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
        at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
        at java.base/java.lang.Thread.run(Thread.java:842)
        at org.jboss.threads.JBossThread.run(JBossThread.java:501)
Caused by: java.nio.file.FileSystemException: /nfs/home/nat/keycloak-24.0.5/lib/quarkus/.nfs804a8212004d86bb000af9da: Device or resource busy
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:248)
        at java.base/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
        at java.base/java.nio.file.Files.delete(Files.java:1152)
        at io.quarkus.bootstrap.util.IoUtils.createOrEmptyDir(IoUtils.java:114)
        at io.quarkus.deployment.pkg.steps.JarResultBuildStep.buildThinJar(JarResultBuildStep.java:601)
        at io.quarkus.deployment.pkg.steps.JarResultBuildStep.buildRunnerJar(JarResultBuildStep.java:224)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:849)
        ... 6 more

How to Reproduce?

This was trivially reproducible for me by downloading https://github.com/keycloak/keycloak/releases/download/24.0.5/keycloak-24.0.5.tar.gz and running ./bin/kc.sh start-dev with the Oracle JDK v17 in my environment. Considering that nobody else has reported this error it seems very specific to my NFS environment, but we don't usually encounter problems like this.

Output of uname -a or ver

Linux mp1505-sge66 3.10.0-1160.71.1.el7.x86_64 #1 SMP Tue Jun 28 15:37:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

java version "17.0.10" 2024-01-16 LTS Java(TM) SE Runtime Environment (build 17.0.10+11-LTS-240) Java HotSpot(TM) 64-Bit Server VM (build 17.0.10+11-LTS-240, mixed mode, sharing)

Quarkus version or git rev

3.8.4

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions