push auth fails when 5 to 10 Minutes after pull auth (with Workload Identity in GCP) #5852

@MichaelKorn

Contributing guidelines and issue reporting guide

Well-formed report checklist

  • I have found a bug that the documentation does not mention anything about my problem
  • I have found a bug that there are no open or closed issues that are related to my problem
  • I have provided version/information about my environment and done my best to provide a reproducer

Description of bug

Using Google Artifact Registry with Workload Identity for authentication:
Image pushes fail with an authentication error if the push happens between 5 and 10 minutes after the cache pull. The error is:

Error: buildx failed with: ERROR: failed to solve: error writing layer blob: failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://europe-west3-docker.pkg.dev/v2/token?scope=repository%3A__our-project__%2F__our-registry__%2Ftest-nginx-image%3Apull%2Cpush&service=europe-west3-docker.pkg.dev: 401 Unauthorized

The error appears to come from authprovider.go#L140, and the cause could be at authprovider.go#L62.
I tried changing the code to:

// Tokens for Google Artifact Registry via Workload Identity expire after 5 minutes
return time.Since(created) > 5*time.Minute-10*time.Second

But these changes (I also tried changing the log output) are not reflected after I build the BuildKit image and use it in buildx.

Reproduction

  1. Docker Registry in Google Artifact Registry
  2. Run the build via GitHub Actions
  3. Use Workload Identity Federation to authenticate against Google services from the GitHub workflow run
  4. Use buildx
  5. The build needs to have two [auth] steps:
    1. The first is triggered by --cache-from: [auth] .../test-nginx-image:pull token for europe-west3-docker.pkg.dev
    2. The second is triggered by --cache-to or --push: [auth] .../test-nginx-image:pull,push token for europe-west3-docker.pkg.dev
  6. The second [auth] needs to happen more than 5 minutes, but less than 10 minutes, after the first [auth].
    1. Originally this was a complicated Dockerfile, but it also reproduces with a simple build: sleep 270 works, sleep 300 fails, and sleep 600 (and much longer) works fine again.
  • Using other authentication mechanisms works fine.
  • I tried several sleeps, also before the build; the issue seems to depend only on the timing within a single docker buildx build call.
  • As a workaround, we can do a build without push, followed by a build with push. The --cache-from can stay in the second call: since everything is cached, no [auth] for the remote cache is needed (or appears in the log) during the second run.

Version information

/usr/bin/docker buildx version
  github.com/docker/buildx v0.21.3 7b5fecbd7a62d73843f7a73a6d4ec353c0555ef5
/usr/bin/docker buildx inspect --bootstrap --builder builder-db441f8f-6bde-49ee-b10d-ccac2e79b5c6
  #1 [internal] booting buildkit
  #1 pulling image moby/buildkit:buildx-stable-1
  #1 pulling image moby/buildkit:buildx-stable-1 4.9s done
  #1 creating container buildx_buildkit_builder-db441f8f-6bde-49ee-b10d-ccac2e79b5c60
  #1 creating container buildx_buildkit_builder-db441f8f-6bde-49ee-b10d-ccac2e79b5c60 13.0s done
  #1 DONE 18.0s
  Name:          builder-db441f8f-6bde-49ee-b10d-ccac2e79b5c6
  Driver:        docker-container
  Last Activity: 2025-03-18 18:15:21 +0000 UTC
  
  Nodes:
  Name:                  builder-db441f8f-6bde-49ee-b10d-ccac2e79b5c60
  Endpoint:              unix:///run/docker/docker.sock
  Status:                running
  BuildKit daemon flags: --debug --allow-insecure-entitlement=network.host
  BuildKit version:      v0.20.1
  Platforms:             linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/386
  Labels:
   org.mobyproject.buildkit.worker.executor:         oci
   org.mobyproject.buildkit.worker.hostname:         56024517c3e3
   org.mobyproject.buildkit.worker.network:          host
   org.mobyproject.buildkit.worker.oci.process-mode: sandbox
   org.mobyproject.buildkit.worker.selinux.enabled:  false
   org.mobyproject.buildkit.worker.snapshotter:      overlayfs
  GC Policy rule#0:
   All:            false
   Filters:        type==source.local,type==exec.cachemount,type==source.git.checkout
   Keep Duration:  48h0m0s
   Max Used Space: 488.3MiB
  GC Policy rule#1:
   All:            false
   Keep Duration:  1440h0m0s
   Reserved Space: 9.313GiB
   Max Used Space: 93.13GiB
   Min Free Space: 36.32GiB
  GC Policy rule#2:
   All:            false
   Reserved Space: 9.313GiB
   Max Used Space: 93.13GiB
   Min Free Space: 36.32GiB
  GC Policy rule#3:
   All:            true
   Reserved Space: 9.313GiB
   Max Used Space: 93.13GiB
   Min Free Space: 36.32GiB
/usr/bin/docker version
  Client:
   Version:           28.0.1
   API version:       1.48
   Go version:        go1.23.6
   Git commit:        068a01e
   Built:             Wed Feb 26 10:40:04 2025
   OS/Arch:           linux/amd64
   Context:           default
  
  Server: Docker Engine - Community
   Engine:
    Version:          28.0.1
    API version:      1.48 (minimum version 1.24)
    Go version:       go1.23.6
    Git commit:       bbd0a17
    Built:            Wed Feb 26 10:41:19 2025
    OS/Arch:          linux/amd64
    Experimental:     false
   containerd:
    Version:          v1.7.25
    GitCommit:        bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
   runc:
    Version:          1.2.5
    GitCommit:        v1.2.5-0-g59923ef
   docker-init:
    Version:          0.19.0
    GitCommit:        de40ad0
/usr/bin/docker info
  Client:
   Version:    28.0.1
   Context:    default
   Debug Mode: false
   Plugins:
    buildx: Docker Buildx (Docker Inc.)
      Version:  v0.21.3
      Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
    compose: Docker Compose (Docker Inc.)
      Version:  v2.34.0
      Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  
  Server:
   Containers: 0
    Running: 0
    Paused: 0
    Stopped: 0
   Images: 0
   Server Version: 28.0.1
   Storage Driver: overlay2
    Backing Filesystem: extfs
    Supports d_type: true
    Using metacopy: false
    Native Overlay Diff: true
    userxattr: true
   Logging Driver: json-file
   Cgroup Driver: cgroupfs
   Cgroup Version: 2
   Plugins:
    Volume: local
    Network: bridge host ipvlan macvlan null overlay
    Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
   Swarm: inactive
   Runtimes: io.containerd.runc.v2 runc
   Default Runtime: runc
   Init Binary: docker-init
   containerd version: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
   runc version: v1.2.5-0-g59923ef
   init version: de40ad0
   Security Options:
    seccomp
     Profile: builtin
    cgroupns
   Kernel Version: 5.15.0-1073-gke
   Operating System: Alpine Linux v3.21
   OSType: linux
   Architecture: x86_64
   CPUs: 16
   Total Memory: 125.8GiB
   Name: gcp-tiny-qfhqz-runner-wdnfq
   ID: efccffe4-c154-4c21-8d4c-1cdb57c2dceb
   Docker Root Dir: /var/lib/docker
   Debug Mode: false
   Experimental: false
   Insecure Registries:
    ::1/128
    127.0.0.0/8
   Registry Mirrors:
    https://mirror.gcr.io/
    https://...../
   Live Restore Enabled: false
   Product License: Community Engine
