
CI: Kaniko cache doesn't always work as expected #104

Closed

AurelienGasser opened this issue Apr 30, 2020 · 4 comments

AurelienGasser (Contributor) commented Apr 30, 2020

The kaniko cache (kaniko is the Docker image builder used by our CI) doesn't always work as expected.

Here is an example: we changed the Dockerfile of celeryworker, and the build of the new Docker image worked fine in gcloud build. However, pulling the image failed with a Docker error:

failed to register layer: Error processing tar file(exit status 1): failed to mknod("/usr/share/doc/adduser", S_IFCHR, 0): file exists

Full event log for the pod

 Normal   Scheduled         3m34s                  default-scheduler                                             Successfully assigned org-1/backend-org-1-substra-backend-worker-6696b5f5c-x9z94 to gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n
 Normal   BackOff           87s                    kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Back-off pulling image "eu.gcr.io/substra-208412/celeryworker:ci-0fffa243c58436a4d12e2b3ab971f8023ed0ee50"
 Warning  Failed            87s                    kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Error: ImagePullBackOff
 Normal   Pulling           75s (x2 over 3m24s)    kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Pulling image "eu.gcr.io/substra-208412/celeryworker:ci-0fffa243c58436a4d12e2b3ab971f8023ed0ee50"
 Warning  Failed            3s (x2 over 88s)       kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Failed to pull image "eu.gcr.io/substra-208412/celeryworker:ci-0fffa243c58436a4d12e2b3ab971f8023ed0ee50": rpc error: code = Unknown desc = failed to register layer: Error processing tar file(exit status 1): failed to mknod("/usr/share/doc/adduser", S_IFCHR, 0): file exists
 Warning  Failed            3s (x2 over 88s)       kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Error: ErrImagePull

Running run-ci.py with the --no-cache option fixed it, which shows that the issue had to do with reusing cache layers. Note that subsequent runs of run-ci.py without the --no-cache option also succeeded: the cache layers were now sane.
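For context, here is a minimal sketch of how a --no-cache option could map onto kaniko's cache behaviour. The flags --context, --destination and --cache are real kaniko executor flags; everything else (the function name, the docker-run wiring, the context path and image tag) is illustrative, not the actual run-ci.py code:

```python
import argparse
import subprocess

# Real kaniko executor image; how it is invoked below is illustrative only.
KANIKO_EXECUTOR = "gcr.io/kaniko-project/executor"

def build_image(context: str, destination: str, use_cache: bool) -> None:
    """Run one kaniko build; disable layer caching when use_cache is False."""
    cmd = [
        "docker", "run", "--rm", KANIKO_EXECUTOR,
        f"--context={context}",
        f"--destination={destination}",
        f"--cache={'true' if use_cache else 'false'}",  # real kaniko flag
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-cache", action="store_true",
                        help="force a full rebuild instead of reusing kaniko cache layers")
    args = parser.parse_args()
    # Image name taken from the event log above; the context path is made up.
    build_image("dir:///workspace",
                "eu.gcr.io/substra-208412/celeryworker:ci-test",
                use_cache=not args.no_cache)
```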

Proposed fix:

  • Fix nightly test retries in .travis.yml (sketched below):
    • Only the first attempt does a full rebuild
    • The 2nd attempt reuses the cache
    • No 3rd attempt (the Travis max build time is 50 minutes: not enough time for 3 attempts)
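A rough sketch of that retry policy, written here as Python around run-ci.py rather than as the actual .travis.yml changes (the wiring below is hypothetical):

```python
import subprocess
import sys

def run_ci(no_cache: bool) -> bool:
    """Invoke run-ci.py once and report whether it succeeded."""
    cmd = [sys.executable, "run-ci.py"]
    if no_cache:
        cmd.append("--no-cache")
    return subprocess.run(cmd).returncode == 0

def nightly() -> int:
    # Attempt 1: full rebuild, so stale kaniko cache layers cannot poison the images.
    if run_ci(no_cache=True):
        return 0
    # Attempt 2: reuse the cache produced by attempt 1.
    # No attempt 3: Travis caps builds at 50 minutes, which is not enough time.
    return 0 if run_ci(no_cache=False) else 1

if __name__ == "__main__":
    sys.exit(nightly())
```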
Kelvin-M (Contributor) commented May 4, 2020

Seems to be a known upstream issue: GoogleContainerTools/kaniko#1162

samlesu (Contributor) commented Jun 12, 2020

It seems to be fixed in the latest kaniko release. Could you have a look, @Kelvin-M or @AurelienGasser?

Kelvin-M (Contributor) commented:
Yes, fixed by using a pinned kaniko image instead of latest.
I'll let @AurelienGasser confirm it :)
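For reference, pinning might look roughly like this in the CI configuration; the exact tag below is only an example, not necessarily the version actually used:

```python
# Pin kaniko to a specific release instead of the moving :latest tag
# (the version number below is illustrative).
KANIKO_IMAGE = "gcr.io/kaniko-project/executor:v0.24.0"
# instead of
# KANIKO_IMAGE = "gcr.io/kaniko-project/executor:latest"
```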

Kelvin-M (Contributor) commented:
Fixed by the latest releases of kaniko.
