Description
opened on Sep 18, 2019
This is a follow-up to the debugging I did on docker/for-win#3229.
It turns out that my issue is actually slightly different from that one.
Setup
- OS: Windows Server 2019 (headless)
- docker version: 19.03.2 (client and server)
How to reproduce
When building a large Docker image:
FROM mcr.microsoft.com/windows/servercore
RUN @powershell -NoProfile -ExecutionPolicy unrestricted -Command "(iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1')))"
RUN choco install msys2
My actual setup is a bit more complicated: I am building behind corporate proxies, and I have to customize both the choco and msys2 packages to work in my environment.
I have no way to work without the proxies.
I have the same issue with other large Docker images. Dockerizing Matlab was the first use case where I hit this, but that one is even harder to reproduce in an open-source collaboration environment.
Symptom
When I try to build this image, docker build freezes just after finishing the "RUN choco install msys2" command.
After many attempts, a few builds did finish almost instantly.
I tried to reproduce this with a simpler setup (a Dockerfile creating thousands of files), but was unable to do so.
So I don't know exactly what triggers what appears to be a race condition.
After taking some stack traces, I observed that the code is stuck at line 74 of hcsshim/internal/wclayer/exportlayer.go (commit bd9b255):
os.RemoveAll(0xc000d1c000, 0x2a, 0x0, 0x0)
C:/.GOROOT/src/os/path.go:67 +0x3c
github.com/docker/docker/vendor/github.com/Microsoft/hcsshim/internal/wclayer.(*legacyLayerReaderWrapper).Close(0xc0000ba980, 0xc0000ba980, 0x2546fe0)
C:/go/src/github.com/docker/docker/vendor/github.com/Microsoft/hcsshim/internal/wclayer/exportlayer.go:74 +0x95
github.com/docker/docker/daemon/graphdriver/windows.(*Driver).exportLayer.func1.1(0x5f8, 0xc00078e000)
C:/go/src/github.com/docker/docker/daemon/graphdriver/windows/windows.go:672 +0x120
This code uses the exportLayer syscall to export the layer from the windowsfilter directory to a temporary directory.
Once the tar file has been produced, it removes the temporary copy of the layer (a directory such as \\?\C:\ProgramData\docker\tmp\hcs425012433).
Workaround
After a bit of back and forth, I tried running docker-ci-zap.exe on the hcs425012433 folder, and the docker build command instantly unfroze.
So I hacked together a new dockerd-dev.exe using the following patch.
diff --git a/internal/wclayer/exportlayer.go b/internal/wclayer/exportlayer.go
index 0425b33..0753ff2 100644
--- a/internal/wclayer/exportlayer.go
+++ b/internal/wclayer/exportlayer.go
@@ -71,6 +71,10 @@ type legacyLayerReaderWrapper struct {
 func (r *legacyLayerReaderWrapper) Close() error {
 	err := r.legacyLayerReader.Close()
+	// if the layer is not Destroyed at hcs level before removing
+	// we might enter in a race-condition for large containers
+	// which end-up in a hang of the os.RemoveAll() call
+	DestroyLayer(r.root)
 	os.RemoveAll(r.root)
 	return err
 }
I have no idea whether this is the right fix for this problem, or whether it is rather an issue in the Windows kernel.
I can submit this patch as a PR if requested.