Skip to content

Fix possible race condition around file locks #5152

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

alexarchambault
Copy link
Collaborator

@alexarchambault alexarchambault commented May 19, 2025

FileLock#lock doesn't always block until it can acquire a lock. If the lock is held by the same JVM, it just throws an OverlappingFileLockException. To work around that, this replaces a call to FileLock#lock by a while loop that calls FileLock#tryLock, that stops when the lock is actually acquired.

`FileLock#lock` doesn't always block until it can acquire a lock. If the lock
is held by the same JVM, it just throws an `OverlappingFileLockException`.
To work around that, this replaces a call to `FileLock#lock` by a while loop
that calls `FileLock#tryLock`, that stops when the lock is actually acquired.
@alexarchambault
Copy link
Collaborator Author

alexarchambault commented May 19, 2025

Before this PR, ./mill 'integration.feature[output-directory].local.daemon.testForked' mill.integration.OutputDirectoryLockTests.basic consistently fails on my machine. This PR fixes that.

@alexarchambault
Copy link
Collaborator Author

alexarchambault commented May 19, 2025

Maybe we could just make FileLock#lock handle this, instead of managing it outside of FileLock

@lihaoyi
Copy link
Member

lihaoyi commented May 19, 2025

If the problem is locks within the same JVM, this use case is meant to be handled by DoubleLock that combines an in-memory lock and file lock together, where we take the memory lock before taking the file lock, and releasing them in opposite order. I thought we would already use that in this code path?

@alexarchambault
Copy link
Collaborator Author

When making this test more verbose, I'm getting this stack trace at some point

[4542] An unexpected error occurred java.nio.channels.OverlappingFileLockException
[4542] java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
[4542] java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
[4542] java.base/sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1276)
[4542] java.base/java.nio.channels.FileChannel.lock(FileChannel.java:1089)
[4542] mill.client.lock.FileLock.lock(FileLock.java:21)
[4542] mill.client.lock.DoubleLock.lock(DoubleLock.java:15)
[4542] mill.server.Server$.withOutLock(Server.scala:397)
[4542] mill.daemon.MillMain$.runMillBootstrap$2(MillMain.scala:311)

@lihaoyi
Copy link
Member

lihaoyi commented May 19, 2025

Seems like we only rolled out DoubleLock to the MillDaemonMain entrypoint, should be easy to use it in the MillMain entrypoint as well

@lihaoyi
Copy link
Member

lihaoyi commented May 19, 2025

One question I'd be curious though, is why would MillMain have multiple file locks being taken at a time? MillMain is only used by --no-daemon mode, and it should be only handling a single command evaluation at a time.

@alexarchambault
Copy link
Collaborator Author

Doing this, but that doesn't fix the issue

diff --git a/runner/daemon/src/mill/daemon/MillMain.scala b/runner/daemon/src/mill/daemon/MillMain.scala
index 25887c2ca6d..f795412fae3 100644
--- a/runner/daemon/src/mill/daemon/MillMain.scala
+++ b/runner/daemon/src/mill/daemon/MillMain.scala
@@ -3,7 +3,7 @@ package mill.daemon
 import mill.api.internal.{BspServerResult, internal}
 import mill.api.{Logger, MillException, Result, SystemStreams}
 import mill.bsp.BSP
-import mill.client.lock.Lock
+import mill.client.lock.{DoubleLock, Lock}
 import mill.constants.{DaemonFiles, OutFiles, Util}
 import mill.define.BuildCtx
 import mill.internal.{Colors, MultiStream, PromptLogger}
@@ -58,6 +58,10 @@ object MillMain {
       }
     )

+    val outLock = new DoubleLock(
+      Lock.memory(),
+      Lock.file((out / OutFiles.millOutLock).toString)
+    )
     val daemonDir = os.Path(args.head)
     val (result, _) =
       try main0(
@@ -71,7 +75,7 @@ object MillMain {
           initialSystemProperties = sys.props.toMap,
           systemExit = i => sys.exit(i),
           daemonDir = daemonDir,
-          outLock = Lock.file((out / OutFiles.millOutLock).toString)
+          outLock = outLock
         )
       catch handleMillException(initialSystemStreams.err, ())

@alexarchambault
Copy link
Collaborator Author

The full stack trace I get when making the test more verbose is

[4542] An unexpected error occurred java.nio.channels.OverlappingFileLockException
[4542] java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
[4542] java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
[4542] java.base/sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1276)
[4542] java.base/java.nio.channels.FileChannel.lock(FileChannel.java:1089)
[4542] mill.client.lock.FileLock.lock(FileLock.java:21)
[4542] mill.client.lock.DoubleLock.lock(DoubleLock.java:15)
[4542] mill.server.Server$.withOutLock(Server.scala:397)
[4542] mill.daemon.MillMain$.runMillBootstrap$2(MillMain.scala:315)
[4542] mill.daemon.MillMain$.$anonfun$9$$anonfun$4(MillMain.scala:365)
[4542] mill.daemon.Watching$.watchLoop(Watching.scala:66)
[4542] mill.daemon.MillMain$.$anonfun$9(MillMain.scala:358)
[4542] scala.util.Using$.resource(Using.scala:296)
[4542] mill.daemon.MillMain$.main0$$anonfun$1$$anonfun$1$$anonfun$1(MillMain.scala:370)
[4542] mill.daemon.MillMain$.withStreams$$anonfun$2(MillMain.scala:116)
[4542] mill.define.SystemStreams$.withStreams$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1(SystemStreams.scala:86)
[4542] scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
[4542] mill.define.SystemStreams$.withStreams$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1(SystemStreams.scala:87)
[4542] scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
[4542] mill.define.SystemStreams$.withStreams$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1(SystemStreams.scala:88)
[4542] scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
[4542] mill.define.SystemStreams$.withStreams$$anonfun$1$$anonfun$1$$anonfun$1$$anonfun$1(SystemStreams.scala:89)
[4542] scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
[4542] scala.Console$.withErr(Console.scala:193)
[4542] mill.define.SystemStreams$.withStreams$$anonfun$1$$anonfun$1$$anonfun$1(SystemStreams.scala:90)
[4542] scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
[4542] scala.Console$.withOut(Console.scala:164)
[4542] mill.define.SystemStreams$.withStreams$$anonfun$1$$anonfun$1(SystemStreams.scala:91)
[4542] scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
[4542] scala.Console$.withIn(Console.scala:227)
[4542] mill.define.SystemStreams$.withStreams$$anonfun$1(SystemStreams.scala:92)
[4542] scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
[4542] mill.define.SystemStreams$.withStreams(SystemStreams.scala:93)
[4542] mill.daemon.MillMain$.withStreams(MillMain.scala:117)
[4542] mill.daemon.MillMain$.main0$$anonfun$1$$anonfun$1(MillMain.scala:142)
[4542] scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
[4542] mill.daemon.MillMain$.main0$$anonfun$1(MillMain.scala:385)
[4542] scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
[4542] mill.daemon.MillMain$.main0(MillMain.scala:386)
[4542] mill.daemon.MillDaemonMain.main0(MillDaemonMain.scala:81)
[4542] mill.daemon.MillDaemonMain.main0(MillDaemonMain.scala:59)
[4542] mill.server.Server.$anonfun$7(Server.scala:255)
[4542] java.base/java.lang.Thread.run(Thread.java:840)

@alexarchambault
Copy link
Collaborator Author

But I should use the same memory lock instance, maybe

@lihaoyi
Copy link
Member

lihaoyi commented May 19, 2025

Perhaps the problem is that this lock should also be a double lock?

@lihaoyi
Copy link
Member

lihaoyi commented May 19, 2025

is the problem reproducible? It would be worth adding some logging and seeing exactly what scenario is happening that is causing the lock to be taken twice

@alexarchambault
Copy link
Collaborator Author

alexarchambault commented May 19, 2025

It's reproducible locally for sure, I seem to always run into it. Don't know yet about the CI, it happened once here.

@lihaoyi
Copy link
Member

lihaoyi commented May 19, 2025

It seems the problem is that DoubleLock.java is not using the first memory lock to provide proper mutex. Could it be there is some ordering of method calls on DoubleLock that causes this to fail? Perhaps some combination of concurrent locks/unlocks?

@lihaoyi
Copy link
Member

lihaoyi commented May 19, 2025

tryLock seems suspicious

  @Override
  public TryLocked tryLock() throws Exception {
    TryLocked l1 = lock1.tryLock();
    TryLocked l2 = lock2.tryLock();
    if (l1.isLocked() && l2.isLocked()) {
      return new DoubleTryLocked(l1, l2);
    } else {
      // Unlock the locks in the opposite order in which we originally took them
      l2.release();
      l1.release();

      return new DoubleTryLocked(null, null);
    }
  }

Why do we call l2.release? Presumably if we hit that else block, l2 got locked by someone else, and releasing it ourselves would be the wrong thing to do?

@alexarchambault
Copy link
Collaborator Author

Seems permuting these two lines fixes it

@alexarchambault
Copy link
Collaborator Author

Closing in favor of #5153

alexarchambault added a commit that referenced this pull request May 19, 2025
This PR makes some uses of `DoubleLock` safer, and fixes a race
condition with it.

`DoubleLock` is basically used as a wrapper around a file lock (second
lock), using it alongside a memory lock (first lock), that acts as a
mutex for it. That way, the same JVM process doesn't try to acquire the
file lock twice at the same time in a given JVM, which
`java.nio.channels.FileChannel` doesn't allow (it throws an
`OverlappingFileLockException` in that case).

This PR makes sure `MillMain` and `MillDaemonMain` use the same
`Lock.memory()` instance as a guard when trying to acquire a lock for
the output directory. This doesn't solve the issue I was running into
(see below), but seems to be safer nonetheless.

It also tries to dispose of locks in `DoubleLock` in case calls to
`Lock#{lock,tryLock}` throw.

Lastly, it releases the two locks in `DoubleLocked` in the reverse order
that they were acquired. This fixes the test case below. Doing so fixes
a race condition, where
- a first thread acquires the lock
- a second one tries to acquire it, and waits for it
- the first thread releases the lock

During the third step, releasing the memory lock before the file lock
makes the second thread acquire the memory lock, and then possibly try
to acquire the file lock, before the first thread is done releasing the
file lock. The JVM doesn't allow this, and the second thread gets an
`OverlappingFileLockException`.

See the discussion in #5152 too

This fixes
```text
$ ./mill 'integration.feature[output-directory].packaged.daemon.testForked' "mill.integration.OutputDirectoryLockTests.basic"
```
for me, locally.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants