docker build freeze at exportLayer phase #696

Open

Description

This is a follow-up to the debugging I did on docker/for-win#3229.

It seems that my issue is actually slightly different from that one.

Setup

  • OS: Windows Server 2019 (headless)
  • docker version: 19.03.2 (client and server)

How to reproduce

Build a large Docker image, for example:

FROM mcr.microsoft.com/windows/servercore
RUN @powershell -NoProfile -ExecutionPolicy unrestricted -Command "(iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1')))" 
RUN choco install msys2

My actual setup is a bit more complicated: I am building behind corporate proxies, so I have to customize both the choco and msys2 packages to work in my environment.
I have no way to work without the proxies.

I have the same issue with other large Docker images. Dockerizing Matlab was the first use case where I hit this, but that one is even harder to reproduce in an open-source collaboration environment.

Symptom

When trying to build this image, docker build freezes just after finishing the "RUN choco install msys2" command.

Over many attempts, the build occasionally finished almost instantly instead of freezing.

I tried to reproduce this with a simpler setup (a Dockerfile creating thousands of files), but was unable to do so.

So I don't know exactly what triggers what appears to be a race condition.

After taking some stack traces, I observed that the code is stuck in

os.RemoveAll(r.root)

os.RemoveAll(0xc000d1c000, 0x2a, 0x0, 0x0)
        C:/.GOROOT/src/os/path.go:67 +0x3c
github.com/docker/docker/vendor/github.com/Microsoft/hcsshim/internal/wclayer.(*legacyLayerReaderWrapper).Close(0xc0000ba980, 0xc0000ba980, 0x2546fe0)
        C:/go/src/github.com/docker/docker/vendor/github.com/Microsoft/hcsshim/internal/wclayer/exportlayer.go:74 +0x95
github.com/docker/docker/daemon/graphdriver/windows.(*Driver).exportLayer.func1.1(0x5f8, 0xc00078e000)
        C:/go/src/github.com/docker/docker/daemon/graphdriver/windows/windows.go:672 +0x120

This code uses the exportLayer syscall to export the layer from the windowsfilter directory to a tmp directory.
Once the tar file has been produced, it removes the tmp copy of the layer (a path like \\?\C:\ProgramData\docker\tmp\hcs425012433).

Workaround

After a bit of back and forth, I tried running docker-ci-zap.exe on the hcs425012433 folder; the docker build command then unfroze instantly.

So I hacked together a new dockerd-dev.exe using the following patch.

diff --git a/internal/wclayer/exportlayer.go b/internal/wclayer/exportlayer.go
index 0425b33..0753ff2 100644
--- a/internal/wclayer/exportlayer.go
+++ b/internal/wclayer/exportlayer.go
@@ -71,6 +71,10 @@ type legacyLayerReaderWrapper struct {

 func (r *legacyLayerReaderWrapper) Close() error {
        err := r.legacyLayerReader.Close()
+       // if the layer is not Destroyed at hcs level before removing
+       // we might enter in a race-condition for large containers
+       // which end-up in a hang of the os.RemoveAll() call
+       DestroyLayer(r.root)
        os.RemoveAll(r.root)
        return err
 }

I have no idea whether this is the right solution for this problem, or whether this is rather an issue with the Windows kernel.

I can submit that patch as a PR if requested.
