Description
Opened on Sep 16, 2024
Module
Core
Testcontainers version
1.20.1
Using the latest Testcontainers version?
Yes
Host OS
MacOS
Host Arch
Apple M3 Pro
Docker version
Client:
  Cloud integration: v1.0.35+desktop.10
  Version: 25.0.3
  API version: 1.44
  Go version: go1.21.6
  Git commit: 4debf41
  Built: Tue Feb 6 21:13:26 2024
  OS/Arch: darwin/arm64
  Context: desktop-linux
Server: Docker Desktop 4.27.2 (137060)
  Engine:
    Version: 25.0.3
    API version: 1.44 (minimum version 1.24)
    Go version: go1.21.6
    Git commit: f417435
    Built: Tue Feb 6 21:14:22 2024
    OS/Arch: linux/arm64
    Experimental: false
  containerd:
    Version: 1.6.28
    GitCommit: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
  runc:
    Version: 1.1.12
    GitCommit: v1.1.12-0-g51d5e94
  docker-init:
    Version: 0.19.0
    GitCommit: de40ad0
What happened?
Here are my findings.
When using Testcontainers in tests, some threads are never stopped, which leaves zombie threads behind.
I'm using the RandomizedTesting framework in my projects, and it automatically detects threads that are still running after everything has been stopped.
The problem does not come directly from TC but from duct-tape, which was archived 2 years ago by @rnorth.
duct-tape is a dependency of TC: https://github.com/testcontainers/testcontainers-java/blob/0217e78eb986da4e73402288959d05f34b37546f/core/build.gradle#L77C1-L79C6
    api ('org.rnorth.duct-tape:duct-tape:1.0.8') {
        exclude(group: 'org.jetbrains', module: 'annotations')
    }
The problem in duct-tape starts here: https://github.com/rnorth/duct-tape/blob/2a1c5be9f2ef3f16bf036cec8752a170d130b61e/src/main/java/org/rnorth/ducttape/timeouts/Timeouts.java#L15-L25
    private static final ExecutorService EXECUTOR_SERVICE = Executors.newCachedThreadPool(new ThreadFactory() {
        final AtomicInteger threadCounter = new AtomicInteger(0);

        @Override
        public Thread newThread(@NotNull Runnable r) {
            Thread thread = new Thread(r, "ducttape-" + threadCounter.getAndIncrement());
            thread.setDaemon(true);
            return thread;
        }
    });
As soon as you call one of the methods in the Timeouts class, a thread is started and never stopped: the executor is never shut down, so the worker stays parked in the pool after the call returns.
TC calls Timeouts in LazyFuture: ducttape-1.
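To illustrate the mechanism without any dependency, here is a minimal JDK-only sketch of the same pattern: a static cached thread pool with daemon threads that nobody shuts down. The class name `LingeringWorkerDemo`, the `demo-ducttape-` thread prefix, and the helper `workerAliveAfterTask` are made up for this example; they are not part of duct-tape or Testcontainers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LingeringWorkerDemo {

    // Hypothetical stand-in for the pool in Timeouts: cached pool, daemon threads, never shut down.
    static final AtomicInteger COUNTER = new AtomicInteger(0);
    static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "demo-ducttape-" + COUNTER.getAndIncrement());
        t.setDaemon(true);
        return t;
    });

    /** Runs one trivial task, waits for it, then reports whether a pool worker is still alive. */
    static boolean workerAliveAfterTask() throws Exception {
        POOL.submit(() -> { }).get(1, TimeUnit.SECONDS);
        return Thread.getAllStackTraces().keySet().stream()
                .anyMatch(t -> t.isAlive() && t.getName().startsWith("demo-ducttape-"));
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker still alive: " + workerAliveAfterTask());
    }
}
```

The worker should still be alive once the task has completed, because it goes back to waiting for more work. A cached pool only reclaims an idle worker after its 60-second keep-alive, which is longer than the 10-second linger used in the test below, so a thread leak checker reports it.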
Here is a simple test which reproduces the problem:
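    import com.carrotsearch.randomizedtesting.RandomizedRunner;
    import com.carrotsearch.randomizedtesting.annotations.ThreadLeakLingering;
    import com.carrotsearch.randomizedtesting.annotations.ThreadLeakScope;
    import com.carrotsearch.randomizedtesting.annotations.TimeoutSuite;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.rnorth.ducttape.timeouts.Timeouts;

    import java.util.concurrent.TimeUnit;

    @RunWith(RandomizedRunner.class)
    @TimeoutSuite(millis = 5 * 60 * 1000)
    @ThreadLeakScope(ThreadLeakScope.Scope.SUITE)
    @ThreadLeakLingering(linger = 10000) // 10 sec lingering
    public class ZombieDucttapeDemoIT {
        @Test
        public void testZombie() throws Exception {
            // A single call is enough to start a ducttape worker thread
            Timeouts.doWithTimeout(1, TimeUnit.SECONDS, () -> {
                System.out.println("Hello world!");
            });
        }
    }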
When the test finishes, I see this:
Hello world!
sept. 16, 2024 5:05:56 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
WARNING: Will linger awaiting termination of 1 leaked thread(s).
sept. 16, 2024 5:06:06 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
SEVERE: 1 thread leaked from SUITE scope at fr.pilato.test.zombie.minio.ZombieDucttapeDemoIT:
1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
at java.base/jdk.internal.misc.Unsafe.park(Native Method)
at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:410)
at java.base/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:452)
at java.base/java.util.concurrent.SynchronousQueue$Transferer.xferLifo(SynchronousQueue.java:194)
at java.base/java.util.concurrent.SynchronousQueue.xfer(SynchronousQueue.java:233)
at java.base/java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:336)
at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
sept. 16, 2024 5:06:06 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
INFO: Starting to interrupt leaked threads:
1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
sept. 16, 2024 5:06:08 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
SEVERE: There are still zombie threads that couldn't be terminated:
1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
at java.base/jdk.internal.misc.Unsafe.park(Native Method)
at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:410)
at java.base/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:452)
at java.base/java.util.concurrent.SynchronousQueue$Transferer.xferLifo(SynchronousQueue.java:194)
at java.base/java.util.concurrent.SynchronousQueue.xfer(SynchronousQueue.java:233)
at java.base/java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:336)
at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at fr.pilato.test.zombie.minio.ZombieDucttapeDemoIT:
1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
at java.base/jdk.internal.misc.Unsafe.park(Native Method)
at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:410)
at java.base/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:452)
at java.base/java.util.concurrent.SynchronousQueue$Transferer.xferLifo(SynchronousQueue.java:194)
at java.base/java.util.concurrent.SynchronousQueue.xfer(SynchronousQueue.java:233)
at java.base/java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:336)
at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
at __randomizedtesting.SeedInfo.seed([5F166AAD0B3CB2D7]:0)
com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated:
1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
at java.base/jdk.internal.misc.Unsafe.park(Native Method)
at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:410)
at java.base/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:452)
at java.base/java.util.concurrent.SynchronousQueue$Transferer.xferLifo(SynchronousQueue.java:194)
at java.base/java.util.concurrent.SynchronousQueue.xfer(SynchronousQueue.java:233)
at java.base/java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:336)
at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
at __randomizedtesting.SeedInfo.seed([5F166AAD0B3CB2D7]:0)
Process finished with exit code 255
I suggest the following:
- Move the source code of duct-tape into Testcontainers
- Update the code to provide a way to stop the threads it starts
- Ideally, stop those threads automatically when container.close() is called.
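As a rough sketch of the second and third points, a timeout helper could own its executor and expose `close()`, so that whoever creates it (for example a container) can stop the workers. Everything here is hypothetical: `ClosableTimeouts`, its method signatures, and the 5-second wait are assumptions made for illustration, not the duct-tape or Testcontainers API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical timeout helper that owns its pool and can be closed. */
public class ClosableTimeouts implements AutoCloseable {

    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "closable-timeouts");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task and fails with TimeoutException if it does not finish in time. */
    public void doWithTimeout(long timeout, TimeUnit unit, Runnable task) throws Exception {
        Future<?> future = executor.submit(task);
        try {
            future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the task that overran
            throw e;
        }
    }

    /** Shuts the pool down and waits up to 5 seconds for it to terminate. */
    @Override
    public void close() throws InterruptedException {
        executor.shutdownNow();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    /** True once the pool has fully terminated after close(). */
    public boolean isTerminated() {
        return executor.isTerminated();
    }

    public static void main(String[] args) throws Exception {
        try (ClosableTimeouts timeouts = new ClosableTimeouts()) {
            timeouts.doWithTimeout(1, TimeUnit.SECONDS, () -> System.out.println("Hello world!"));
        }
    }
}
```

With an instance-owned pool, try-with-resources (or a call from container.close()) would be enough to leave no idle worker behind. The trade-off is that callers must manage the helper's lifecycle instead of calling static methods.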
Relevant log output
No response
Additional Information
The code can be found at https://github.com/dadoonet/demo-ssh-mino/blob/master/src/test/java/fr/pilato/test/zombie/ducctape/ZombieDucttapeDemoIT.java