
Kaniko build performance much slower compared with DinD solution #875

Open
caiwei-ebay opened this issue Nov 22, 2019 · 34 comments

@caiwei-ebay

caiwei-ebay commented Nov 22, 2019

We have a very simple Dockerfile that inherits from an Ubuntu JDK 8 image, runs a few shell commands, and copies a few files. Please note that the RUN commands come first.

Our CI is built on top of Kubernetes; the Jenkins build runs in a slave pod.
We've enabled DinD and Kaniko in separate slave images and triggered the builds with Kaniko and Docker. Here are the performance results for building and pushing images that we've observed:

Dockerfile with all RUN commands removed:

  • Kaniko: 67s
  • DID: 58s

Dockerfile with 10 RUN commands:

  • Kaniko: 180s
  • Docker in Docker: 89s

May I know why Kaniko is so much slower than the DinD solution when there are RUN commands in the Dockerfile? Can this part be sped up?

We've tried the --cache and --cache-repo parameters, but the performance of the Kaniko build did not improve at all. Here are the details:

  • We are using an internal Docker registry based on Quay.io.
  • We passed --cache=true only and got the error "NAME_INVALID: Nested repositories are not supported."
  • We passed --cache=true and --cache-repo=ANOTHER_REPO and saw the cache being uploaded during the first build. We did not modify any code and triggered the build again; this time we saw a few cache hits.

However, the performance is much worse with the cache, taking 254s. I think the cache uploading and downloading is also a time killer.

Please help explain the cache issue and advise how we can further improve the performance of the Kaniko build.
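For reference, the cached builds were launched roughly like this (a sketch of the invocation pattern; the registry, image, and cache-repo names are placeholders for our internal Quay-based paths):

/kaniko/executor \
  --context=dir:///workspace \
  --dockerfile=Dockerfile \
  --destination=REGISTRY/IMAGE:TAG \
  --cache=true \
  --cache-repo=REGISTRY/CACHE_REPO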

The Dockerfile we used looks like the one below:


FROM abc
COPY *.jar /app/app.jar

RUN jar -xvf app.jar && \
    rm -rf app.jar && \
    mkdir -p /layer_build/lib/snapshots && \
    mkdir -p /layer_build/lib/releases && \
    mkdir -p /layer_build/app && \
    find BOOT-INF/lib -name '*SNAPSHOT*' -type f -exec mv {} /layer_build/lib/snapshots \; && \
    mv BOOT-INF/lib/* /layer_build/lib/releases && \
    rm -rf BOOT-INF/lib && \
    mv * /layer_build/app

FROM def
COPY --from=0 layer_build/lib/snapshots/ /app/BOOT-INF/lib/
COPY --from=0 layer_build/lib/releases/ /app/BOOT-INF/lib/
COPY --from=0 layer_build/app/ /app/

WORKDIR /app
CMD ["/bin/bash", "-c", "/app/bin/run.sh"]


@cvgw cvgw added area/performance issues related to kaniko performance enhancement kind/question Further information is requested priority/p3 agreed that this would be good to have, but no one is available at the moment. labels Nov 22, 2019
@mcfedr

mcfedr commented Nov 25, 2019

I've noticed similar issues. I use the GitLab runner on Kubernetes and, in the same way as you described, ran DinD and kaniko side by side; kaniko is much slower. At the moment I've switched to using kaniko on Cloud Build, and there it's pretty fast and caches better than Docker.

@caiwei-ebay

kaniko on Cloud Build

Thanks for the information, I believe you are talking about https://cloud.google.com/blog/products/application-development/build-containers-faster-with-cloud-build-with-kaniko.

Unfortunately we are using an internal Docker registry based on Quay.io, so it cannot benefit us.
As we observed, the cache uploading and downloading with Quay takes much more time than not using the cache at all.

@consideRatio

consideRatio commented Jan 29, 2020

It seems a lot of time is spent snapshotting the filesystem, which I believe is done to ensure the end result has multiple layers.

By using --single-snapshot, only a single layer is added on top of the base image, and I assume we avoid slowing down to take intermediate snapshots for intermediate layers.

It can of course be nice to have layers, so improving performance like this is a compromise. I ended up with 15 minutes instead of 25 minutes for one of my builds.
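For reference, this is just an extra flag on the executor call (a sketch; context and destination are placeholders):

/kaniko/executor \
  --context=dir:///workspace \
  --destination=REGISTRY/IMAGE:TAG \
  --single-snapshot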

@u2bo

u2bo commented Feb 28, 2020

I have the same question. In Jenkins, DinD is faster than Kaniko; most of the time is spent in [Taking snapshot of full filesystem] and [Unpacking rootfs as cmd COPY]. How can this be improved?

@klkl0808

klkl0808 commented Mar 3, 2020

most of the time is spent in [Taking snapshot of full filesystem] and [Unpacking rootfs as cmd COPY]. How can this be improved?

I have the same question. I tried a kaniko build on GitLab and it's also slower than with Docker.

@bakayolo

bakayolo commented Mar 18, 2020

Same here.
Trying to improve the builds of https://beta.kintohub.com/ by transitioning from DinD to Kaniko, but DinD is faster, even with caching. It seems that most of the time is indeed spent in [Taking snapshot of full filesystem] and [Unpacking rootfs as cmd COPY].

@haampie

haampie commented Apr 1, 2020

Experiencing the same issue. In fact I don't see any difference in runtimes when using --cache=true... it definitely pulls cached layers, but it does not speed up the builds at all.

@bergkvist

bergkvist commented Apr 18, 2020

I'm using kaniko in GitLab CI/CD with runners in a DigitalOcean Kubernetes cluster (3x 2GB 1vCPU).

Benchmark: create-react-app (multi-stage build)

FROM node:12-alpine as build
WORKDIR /home/app/
COPY package.json ./
COPY yarn.lock ./
RUN yarn 
COPY . .
RUN yarn build

FROM nginx:1.13.12-alpine
COPY --from=build /home/app/build /var/www
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Building locally with docker build on my laptop:
~ 2 minutes

Building with kaniko in a GitLab runner:
~ 38 minutes
It spends most of the time (~32 minutes) on the "Taking snapshot of full filesystem..." step.

Same as previous with --single-snapshot:
~ 33 minutes

Using Docker in Docker:
~5 minutes

@swist

swist commented Apr 29, 2020

We've been experiencing similar problems with kaniko for builds that produce a large number of small files on the filesystem in the intermediate stages. Multi-stage builds also seem to contribute to the slowness.

@bsmedberg-xometry

I expect the reason for this difference in speed is that "native" Docker manages the layered filesystem using overlayfs (overlay2), so taking a snapshot is as simple as telling the FS driver to finish a layer. Kaniko doesn't natively track that on the filesystem, so it has to stop and stat everything in the filesystem in order to take a snapshot.

I'd be interested in whether this is a fundamental limitation of the kaniko design, or whether, if you could run a user-mode filesystem driver or overlayfs inside the Docker container running kaniko, you could obtain matching speeds.
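To illustrate what "finish a layer" means with overlay2: the writable upper directory of the mount already is the new layer's content, so no filesystem walk is needed. A rough sketch of such a mount (paths are made up for illustration):

# upperdir collects exactly the files changed on top of lowerdir;
# committing a layer is just "take upperdir as-is" -- no stat() walk required
mkdir -p /tmp/lower /tmp/upper /tmp/work /tmp/merged
mount -t overlay overlay \
  -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work \
  /tmp/merged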

@mayrbenjamin92

@bsmedberg-xometry I love your explanation and fully agree. I recently watched a very good talk about the "backend" of the Docker daemon, in which one of the people responsible for the filesystem layer at Docker explains the differences. While it sounds possible to do what you have suggested, I don't think it can be achieved without changing the kaniko source code.

@cmamigonian

I understand the filesystem snapshotting issue is driven by not using overlayfs, but what would explain the inordinate time it takes kaniko to push a layer to the cache?

@tjtravelnet

We are also having this issue. Switching to Kaniko solved some other DIND issues we were having, but added 12+ minutes to our build times.

  • Gitlab SaaS (13.x)
  • Private integrated (EKS) Kubernetes cluster + runner
  • DID build time avg: ~4m
  • Kaniko build time avg: ~16m

@tejal29

tejal29 commented Mar 31, 2021

@tjtravelnet Did you try the new --use-new-run flag?
You can also help us with some profiling data to understand where kaniko is spending its time: https://github.com/GoogleContainerTools/kaniko#kaniko-builds---profiling
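For example, something along these lines (a sketch, assuming the STACKLOG_PATH environment variable described in that README section; paths and image names are placeholders):

# assumes a Dockerfile in the current directory; kaniko.slog ends up in the mounted dir
docker run -v "$PWD":/workspace \
  -e STACKLOG_PATH=/workspace/kaniko.slog \
  gcr.io/kaniko-project/executor:latest \
  --context=dir:///workspace \
  --no-push \
  --use-new-run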

@Kyouuma

Kyouuma commented Apr 22, 2021

Build times are insanely long compared to DIND even with caching activated.

Environment:

  • Jenkins
  • Azure Kubernetes Service.
  • Azure Container Registry.

@acherifi

Same experience on my side with Kubernetes gitlab runners.

The build takes WAY longer than on my computer, and I build on a Pentium...
Any improvements?

@jerry153fish

Had a similar issue; I ended up adding --snapshotMode=redo, turning all the verbose logging off, and filtering all the unnecessary files via .dockerignore. The result is acceptable now: from 46m down to ~10m.
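Roughly what that ended up looking like (a sketch; the .dockerignore entries and image names here are only illustrative, not my real ones):

# illustrative .dockerignore entries to keep the build context small
cat > .dockerignore <<'EOF'
.git
node_modules
**/*.log
EOF

# redo snapshot mode plus quiet logging
/kaniko/executor \
  --context=dir:///workspace \
  --destination=REGISTRY/IMAGE:TAG \
  --snapshotMode=redo \
  --verbosity=error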

@ghost

ghost commented Jul 13, 2021

We can observe this behavior too, but from my point of view it's not a real problem here. Of course it would be nice if the snapshotting could be tuned, but it will never reach the performance of overlayfs-based snapshot/layer creation.
So for us the best working solution is to perform all the build work outside kaniko (no multi-stage builds): build everything in its own GitLab job's k8s container and then just copy the assembled application, with only the needed files, into the image to be built with Kaniko. The performance impact is then no problem compared with the big security benefit of not relying on DinD (which should be forbidden in CI/CD in times of supply-chain attacks).
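In other words, a rough sketch of the pattern (job steps, tools, base image, and names are illustrative, not our exact setup):

# CI job 1: build the application itself, outside kaniko
./gradlew bootJar
mkdir -p dist && cp build/libs/*.jar dist/app.jar

# CI job 2: the Dockerfile only copies the prebuilt artifact, so kaniko
# has very little filesystem churn to snapshot
cat > Dockerfile <<'EOF'
FROM eclipse-temurin:17-jre
COPY dist/app.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]
EOF
/kaniko/executor --context=dir:///workspace --destination=REGISTRY/IMAGE:TAG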

@bhordupur

We are running the GitLab runner in AKS. Kaniko surpasses DinD for the same build job (building Docker images) with the flags below added:

--snapshotMode=redo
--use-new-run

With DinD it takes around 5.5 min, and with Kaniko it comes down to 3.25 min.
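So the executor call in the CI job script ends up looking roughly like this (a sketch; $CI_REGISTRY_IMAGE and $CI_COMMIT_SHORT_SHA are GitLab's predefined variables, the rest is a placeholder setup):

/kaniko/executor \
  --context=$CI_PROJECT_DIR \
  --dockerfile=$CI_PROJECT_DIR/Dockerfile \
  --destination=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA \
  --snapshotMode=redo \
  --use-new-run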

@haljac

haljac commented Mar 25, 2022

We have builds running in Kaniko that, due to the file system snapshots, are taking unacceptably long. This does not seem to have been remedied by using --use-new-run or --snapshotMode=redo individually, although using them together did substantially improve the build duration (still unacceptably long for this use-case, unfortunately). Just a +1 that this appears to remain an issue.

@pdfrod

pdfrod commented May 23, 2022

Same here. I tried using Kaniko in Google Cloud Build to get better caching behavior, but it's so slow that it's not worth it. Using --use-new-run or --snapshotMode=redo does improve things a little, but using Docker is still much faster.

I've turned my attention to Docker Buildx instead as it seems to combine the best of both worlds: fast builds and reliable caching.
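For context, the Buildx setup I mean is the registry cache backend, roughly like this (a sketch; repository names are placeholders, and the registry has to accept cache manifests):

docker buildx create --use
docker buildx build \
  --cache-from type=registry,ref=REGISTRY/IMAGE:buildcache \
  --cache-to type=registry,ref=REGISTRY/IMAGE:buildcache,mode=max \
  --tag REGISTRY/IMAGE:latest \
  --push .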

@rushilsrivastava

I've turned my attention to Docker Buildx instead as it seems to combine the best of both worlds: fast builds and reliable caching.

Curious, are you using Buildx with Cloud Build?

@pdfrod

pdfrod commented Nov 20, 2022

Curious, are you using Buildx with Cloud Build?

I tried to, but unfortunately my team is using GCP Container Registry and it doesn't seem to support Buildx cache artifacts.

Artifact Registry, on the other hand, seems to work fine with Buildx, but since it's a lot more expensive than Container Registry, I'm not sure it's worth it for us.

@aaron-prindle aaron-prindle added priority/p2 High impact feature/bug. Will get a lot of users happy kind/friction and removed priority/p3 agreed that this would be good to have, but no one is available at the moment. labels Jun 12, 2023
@salamer

salamer commented Aug 7, 2023

Same problem.
Any progress? I realize this issue has been open for 4 years; is there any kaniko-related benchmark?

@0x217

0x217 commented Aug 18, 2023

I have the same problem.

@mdagost

mdagost commented Sep 11, 2023

Me too.

@KamilKopaczyk

We are running the GitLab runner in AKS. Kaniko surpasses DinD for the same build job (building Docker images) with the flags below added:

--snapshotMode=redo
--use-new-run

With DinD it takes around 5.5 min, and with Kaniko it comes down to 3.25 min.

If you consider using those flags, please check the docs first and proceed with caution, as using them may cause errors for you.

At the time of writing:
--use-new-run

[...] This new run mode trades off accuracy/correctness in some cases (potential for missed files in a "snapshot") for improved performance by avoiding the full filesystem snapshots.

--snapshotMode
If it runs in a mode other than full, it doesn't compare e.g. file contents.

@ole1986

ole1986 commented Jan 12, 2024

Running a Kaniko pod in a microk8s Kubernetes cluster with hostNetwork: true set increases the performance significantly.
With that setup I reduced the time of an image build from ~12 min to ~3 min.

So there might be some firewall/network issue when the host network is not exposed.
Of course it's not a recommended setting, but at least I know a possible reason.
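As an illustration, when the executor is launched as a one-off pod, the setting can be patched into the generated pod spec like this (a sketch; the git context and destination are placeholders, and hostNetwork should be used with care):

# hostNetwork merged into the generated pod spec; args after -- go to the executor
kubectl run kaniko --restart=Never \
  --image=gcr.io/kaniko-project/executor:latest \
  --overrides='{"apiVersion": "v1", "spec": {"hostNetwork": true}}' \
  -- --context=git://github.com/ORG/REPO.git --destination=REGISTRY/IMAGE:TAG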

@amine-mokaddem

Same thing here

@akimrx

akimrx commented Apr 20, 2024

The same here.

Update: with the flags below, it works at the same level as Docker for me; the build time decreased from 45 minutes to 8 minutes for a fairly dense image.

  stage: build
  rules:
    - !reference [.master_or_web__rules, rules]
  script:
    - >-
      /kaniko/executor
      --context $CI_PROJECT_DIR/image
      --dockerfile $CI_PROJECT_DIR/image/Dockerfile
      --destination ${CI_DOCKER_IMAGE}:${CI_COMMIT_SHORT_SHA}
      --destination ${CI_DOCKER_IMAGE}:latest
      --cache=false
      --cache-repo=${CI_DOCKER_IMAGE}:latest
      --cache-ttl=1h
      --force
      --cleanup
      --single-snapshot

@gimse

gimse commented May 17, 2024

I also have the problem.

@rafalr-ntropy

Same here, kaniko builds still take ridiculously long even with "--snapshotMode=redo" and "--use-new-run".

@codethief

codethief commented Sep 11, 2024

Same here, kaniko builds still take ridiculously long even with "--snapshotMode=redo" and "--use-new-run".

Yup, as it so happens I ran into this again today with the same ⬆️ settings: I tried to build an image very similar to buildpack-deps with debian:bookworm as the base, the only difference being that I apt-get install'ed all dependencies in a single Dockerfile. Result: apt-get install takes ~5 min, the Kaniko snapshot takes > 55 min, and the GitLab CI job aborts. It seems the bottleneck is making many filesystem changes during the build.


@codethief

codethief commented Sep 11, 2024

@akimrx

Update: with the flags below [… --single-snapshot …], it works at the same level as Docker for me; the build time decreased from 45 minutes to 8 minutes for a fairly dense image

Sure, but then you give up on layer caching entirely since you only take a single snapshot at the very end. EDIT: I see you even set --cache=false, too.
