Image build fails if it takes longer than 20 minutes #254

Open
voroninman opened this issue Aug 2, 2021 · 17 comments · May be fixed by #343
Labels
bug Something isn't working r/image Relates to the image resource r/registry_image stale

Comments

@voroninman

voroninman commented Aug 2, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and docker Provider) Version

Terraform v1.0.3
on darwin_amd64

  • provider registry.terraform.io/kreuzwerker/docker v2.14.0

Affected Resource(s)

  • docker_image
  • docker_registry_image

Expected Behaviour

The image is built by Terraform.

Actual Behaviour

...
module.foo.docker_registry_image.this["pytorch-gpu"]: Still creating... [19m50s elapsed]
╷
│ Error: Error building docker image: context deadline exceeded
│
│   with module.foo.docker_registry_image.this["pytorch-gpu"],
│   on ../main.tf line 153, in resource "docker_registry_image" "this":
│  153: resource "docker_registry_image" "this" {
│
╵

Steps to Reproduce

Try to build a Docker image in Terraform for the following Dockerfile:

FROM busybox:latest
RUN sleep 1201

Important Factoids

The timeout includes the uploading of the image.

Building a Docker image for a data science environment usually means downloading or compiling big software packages (PyTorch + CUDA in my case), which results in 2-6 GB images. Building and uploading such an image sometimes takes longer than 20 minutes.

@voroninman
Author

The workaround is to build and push the image manually with docker build, docker tag and docker push, and hope that Docker responds quickly on your next terraform apply. That's not always the case for me, but it's likely due to my setup.

@voroninman
Author

Also, I haven't found a way to peek at the progress of creating a docker_registry_image, so I put up a Unix socket "proxy" with socat -d -v -d TCP-L:2375,fork UNIX:/var/run/docker.sock and pointed the Terraform Docker provider to tcp://localhost:2375. Is there a better way?

@suzuki-shunsuke suzuki-shunsuke added r/image Relates to the image resource r/registry_image labels Aug 2, 2021
@suzuki-shunsuke
Collaborator

suzuki-shunsuke commented Aug 2, 2021

I could reproduce a similar error with the docker_image resource.

$ terraform version
Terraform v1.0.3
on darwin_amd64
+ provider registry.terraform.io/kreuzwerker/docker v2.14.0

docker version

$ docker version
Client:
 Cloud integration: 1.0.17
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.16.4
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:22 2021
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:58 2021
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

main.tf

resource "docker_image" "zoo" {
  name = "zoo"
  build {
    path = "."
  }
}

terraform {
  required_providers {
    docker = {
      source  = "kreuzwerker/docker"
      version = "2.14.0"
    }
  }
}

provider "docker" {
}

Dockerfile

FROM busybox:latest
RUN sleep 1201

$ TF_LOG=debug terraform apply -auto-approve

docker_image.zoo: Still creating... [19m40s elapsed]
docker_image.zoo: Still creating... [19m50s elapsed]
2021-08-02T20:31:08.545+0900 [INFO]  provider.terraform-provider-docker_v2.14.0: 2021/08/02 20:31:08 [DEBUG] Step 1/2 : FROM busybox:latest
latest: Pulling from library/busybox
b71f96345d44: Pulling fs layer
b71f96345d44: Download complete
b71f96345d44: Pull complete
Digest: sha256:0f354ec1728d9ff32edcd7d1b8bbdfc798277ad36120dc3dc683be44524c8b60
Status: Downloaded newer image for busybox:latest
 ---> 69593048aa3a
Step 2/2 : RUN sleep 1201
 ---> Running in 761974235ec3: timestamp=2021-08-02T20:31:08.545+0900
╷
│ Error: Unable to read Docker image into resource: unable to list Docker images: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.41/images/json": context deadline exceeded
│ 
│   with docker_image.zoo,
│   on main.tf line 1, in resource "docker_image" "zoo":
│    1: resource "docker_image" "zoo" {
│ 
╵
2021-08-02T20:31:08.575+0900 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-08-02T20:31:08.577+0900 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/kreuzwerker/docker/2.14.0/darwin_amd64/terraform-provider-docker_v2.14.0 pid=31909
2021-08-02T20:31:08.577+0900 [DEBUG] provider: plugin exited

Debug log: https://gist.github.com/suzuki-shunsuke/1e56152acad81333cbab0b47bc6fa92b

apiImage, err := findImage(ctx, imageName, client, meta.(*ProviderConfig).AuthConfigs)
if err != nil {
    return diag.Errorf("Unable to read Docker image into resource: %s", err)

func fetchLocalImages(ctx context.Context, data *Data, client *client.Client) error {
    images, err := client.ImageList(ctx, types.ImageListOptions{All: false})
    if err != nil {
        return fmt.Errorf("unable to list Docker images: %s", err)

These are the code paths that produce the error message, but I can't find where the timeout is set.

@suzuki-shunsuke suzuki-shunsuke added the bug Something isn't working label Aug 2, 2021
@voroninman
Author

voroninman commented Aug 2, 2021

The timeout itself comes from https://github.com/hashicorp/terraform-plugin-sdk/blob/112e2164c381d80e8ada3170dac9a8a5db01079a/helper/schema/resource_data.go#L409-L415.

@mavogel
Contributor

mavogel commented Aug 3, 2021

We might need to support a separate timeouts block: https://www.terraform.io/docs/language/resources/syntax.html#operation-timeouts

@voroninman
Author

FYI, regarding my earlier comment:

"The workaround is to build it with docker build, docker tag and docker push and hopefully Docker will reply quick on your next terraform apply. It's not always the case for me but it's likely due to my setup."

I figured out that docker build . uses BuildKit whereas this provider doesn't, which is probably why they didn't share a build cache.

Setting export DOCKER_BUILDKIT=0 solved it for me.

@github-actions

This issue is stale because it has been open 60 days with no activity.
Remove stale label or comment or this will be closed in 7 days.
If you don't want this issue to be closed, please set the label pinned.

@github-actions github-actions bot added the stale label Oct 31, 2021
@voroninman
Author

This seems like a trivial change. I haven't contributed to this repo but if no one is looking into the issue, I might try.

@github-actions github-actions bot removed the stale label Nov 2, 2021
@mavogel mavogel removed this from the v2.16.0 milestone Nov 30, 2021
@github-actions

This issue is stale because it has been open 60 days with no activity.
Remove stale label or comment or this will be closed in 7 days.
If you don't want this issue to be closed, please set the label pinned.

@github-actions github-actions bot added the stale label Jan 30, 2022
@voroninman
Author

Oh, I forgot about this one. I will have a look next week.

@github-actions

github-actions bot commented Apr 1, 2022

This issue is stale because it has been open 60 days with no activity.
Remove stale label or comment or this will be closed in 7 days.
If you don't want this issue to be closed, please set the label pinned.

@matifali

Could this be reopened?

@matifali

matifali commented Jul 5, 2023

Any plans to allow specifying a custom timeout longer than 20 minutes?

@sao-coding

Is this possible to fix?

@renesas-brandon-hussey

I could use a fix as well. I see a PR is waiting.

@jademackay

I need this too.

@mojidrachirag

We need it too. My Docker build downloads many pip packages, takes more than 20 minutes, and then fails under Terraform.
This can be fixed by implementing customizable timeouts:
https://developer.hashicorp.com/terraform/plugin/sdkv2/resources/retries-and-customizable-timeouts


8 participants