
File watching for sync is very slow on Mac OS #8166

Closed
corporealshift opened this issue Nov 28, 2022 · 8 comments · Fixed by #8249 or #9167

@corporealshift

Expected behavior

Previously, file sync would notice changes within a few seconds (or less).

Actual behavior

Since upgrading to v2.0.2, it takes 15 seconds or more to notice files to sync. During this time the skaffold process consumes an entire core of CPU (100% in Activity Monitor). Once the "Syncing 1 file to..." message is posted to the console, CPU usage goes back down.

Information

  • Skaffold version: v2.0.2
  • Operating system: MacOS Monterey 12.6.1
  • Installed via: Homebrew
  • Contents of skaffold.yaml:
apiVersion: skaffold/v4beta1
kind: Config
build:
  artifacts:
  # Creating a new image requires a new ECR repo by adding a file here and following the README instructions.
  # https://github.com/fulcrumapp/eks-flux2/tree/main/infrastructure/base/fulcrum-ops/ecr
  #
  #
  #   READ BEFORE CHANGING ANYTHING IN ARTIFACTS!!!!
  #
  #
  # Changing anything here needs to consider the other profiles lower in the file that might override
  # or make assumptions on the these lists of artifacts
  - image: snip 
    sync:
      manual:
      - src: app/**
        dest: .
      - src: public/**
        dest: .
      - src: vendor/**
        dest: .
      - src: test/**
        dest: .
      - src: spec/**
        dest: .
      - src: lib/tasks/**
        dest: .
      # this one is less about syncing and more making sure we don't rebuild
      # since we don't use this path in this container
      - src: lib/assets/fulcrum-components/**
        dest: .
      hooks:
        after:
        - container:
            # This should be safe to run on every pod in chaos, but it would be better to only run on sidekiq.
            command: ["sh", "-c", "ps auxww | grep sidekiq | grep app | grep -v grep | awk '{ print $2; }' | xargs -r kill -12"]
            # Currently unsupported by sync hooks, but hopefully soon.
            #containerName: sidekiq
            #podName: rails-fulcrum-sidekiq-all*
        - host:
            command: ["sh", "-c", "echo Restarting sidekiq."]
            os: [darwin, linux]
    docker:
      dockerfile: docker/ruby/Dockerfile
      buildArgs:
        ENV: production
        BASE_VERSION: stable
        ASSETS_VERSION: "dev-{{.USER}}"
  - image: snip 
    docker:
      dockerfile: docker/components/Dockerfile
    sync:
      manual:
      - src: lib/assets/fulcrum-components/src/**
        dest: src
        strip: lib/assets/fulcrum-components/src/
  tagPolicy:
    gitCommit: {}
  platforms: ["linux/amd64"]
  local:
    useBuildkit: true
    concurrency: 2
deploy:
  helm:
    releases:
    - name: rails
      chartPath: charts/fulcrum
      valuesFiles:
      - ./charts/fulcrum/values.yaml
      namespace: '{{.USER}}'
      setValueTemplates:
        # See https://skaffold.dev/docs/environment/templating/ for using IMAGE_TAGN et al
        # skaffold requires the digest to match the image when syncing files
        image.tag: '{{.IMAGE_TAG}}@{{.IMAGE_DIGEST}}'
        rails_tag: '{{.IMAGE_TAG}}@{{.IMAGE_DIGEST}}'
        rails_beta_tag: '{{.IMAGE_TAG3}}@{{.IMAGE_DIGEST3}}'
        components_tag: '{{.IMAGE_TAG2}}@{{.IMAGE_DIGEST2}}'
        fulcrum.stripe.use_stripe_for_trials: false
        fulcrum.stripe.webhook_signing_secret: '{{.STRIPE_WEBHOOK_SIGNING_SECRET}}'
        # To work on expressions, uncomment this, and deploy to your env in the fulcrum-expressions repo
        # fulcrum.rails.config.expression_sandbox_url: "https://fulcrumapp-world-{{.USER}}.s3.amazonaws.com/expv1/expressions.html"
        previewPrefix: '{{.USER}}'
        setup.database.always_drop_and_recreate: "false"
        autoscaling.enabled: false
        servicemonitoring.enabled: false
        fulcrum.rails.subsets.stable.enabled: true
        fulcrum.rails.subsets.beta.enabled: false
        fulcrum.rails.stdin: true
        fulcrum.rails.tty: true
        # uncomment to test subsets
        fulcrum.rails.deploy_subsets_without_traffic: false
      skipBuildDependencies: true
      useHelmSecrets: true
      packaged:
        appVersion: dev-{{.USER}}
    flags:
      install:
      - --timeout
      - 600s
      upgrade:
      - --timeout
      - 600s
  statusCheckDeadlineSeconds: 600
profiles:
- name: arm
  activation:
  - env: FULCRUM_ARCH=arm64
  build:
    platforms: ["linux/arm64"]
  patches:
  - op: add
    path: /deploy/helm/releases/0/setValueTemplates/architectures[0]
    value: arm64
- name: beta
  activation:
  - env: FULCRUM_BETA=true
  patches:
  - op: add
    path: /build/artifacts/-
    value:
      image: snip
      sync:
        manual:
        - src: app/**
          dest: .
        - src: public/**
          dest: .
        - src: vendor/**
          dest: .
        - src: test/**
          dest: .
        - src: spec/**
          dest: .
        # this one is less about syncing and more making sure we don't rebuild
        # since we don't use this path in this container
        - src: lib/assets/fulcrum-components/**
          dest: .
        hooks:
          after:
          - container:
              # This should be safe to run on every pod in chaos, but it would be better to only run on sidekiq.
              command: ["sh", "-c", "ps auxww | grep sidekiq | grep app | grep -v grep | awk '{ print $2; }' | xargs -r kill -12"]
              # Currently unsupported by sync hooks, but hopefully soon.
              #containerName: sidekiq
              #podName: rails-fulcrum-sidekiq-all*
          - host:
              command: ["sh", "-c", "echo Restarting sidekiq."]
              os: [darwin, linux]
      docker:
        dockerfile: docker/ruby/Dockerfile
        buildArgs:
          ENV: production
          GEMFILE: Gemfile.beta
          BASE_VERSION: beta
          ASSETS_VERSION: "dev-{{.USER}}"
  - op: add
    path: /deploy/helm/releases/0/setValueTemplates/fulcrum.rails.subsets.beta.enabled
    value: true
  - op: add
    path: /deploy/helm/releases/0/setValueTemplates/fulcrum.rails.deploy_subsets_without_traffic
    value: true
- name: ci
  build:
    platforms: ["linux/amd64"]
    tagPolicy:
      gitCommit:
        ignoreChanges: true
        variant: AbbrevCommitSha
  patches:
    - op: replace # stable build
      path: /build/artifacts/0/docker/buildArgs/ASSETS_VERSION
      value: "{{.IMAGE_TAG}}"
    - op: replace # beta build
      path: /build/artifacts/2/docker/buildArgs/ASSETS_VERSION
      value: "{{.IMAGE_TAG}}"
- name: kaniko
  build:
    artifacts:
    - image: snip 
      sync:
        manual:
        - src: app/**
          dest: .
        - src: public/**
          dest: .
        - src: vendor/**
          dest: .
        - src: test/**
          dest: .
        - src: spec/**
          dest: .
        - src: lib/tasks/**
          dest: .
        # this one is less about syncing and more making sure we don't rebuild
        # since we don't use this path in this container
        - src: lib/assets/fulcrum-components/**
          dest: .
        hooks:
          after:
          - container:
              # This should be safe to run on every pod in chaos, but it would be better to only run on sidekiq.
              command: ["sh", "-c", "ps auxww | grep sidekiq | grep app | grep -v grep | awk '{ print $2; }' | xargs -r kill -12"]
              # Currently unsupported by sync hooks, but hopefully soon.
              #containerName: sidekiq
              #podName: rails-fulcrum-sidekiq-all*
          - host:
              command: ["sh", "-c", "echo Restarting sidekiq."]
              os: [darwin, linux]
      kaniko:
        cache: {}
        dockerfile: docker/ruby/Dockerfile
        buildArgs:
          ENV: production
          BASE_VERSION: stable
          ASSETS_VERSION: "dev-{{.USER}}"
        useNewRun: true
        logTimestamps: true
        reproducible: true
    - image: snip 
      sync:
        manual:
        - src: app/**
          dest: .
        - src: public/**
          dest: .
        - src: vendor/**
          dest: .
        - src: test/**
          dest: .
        - src: spec/**
          dest: .
        # this one is less about syncing and more making sure we don't rebuild
        # since we don't use this path in this container
        - src: lib/assets/fulcrum-components/**
          dest: .
        hooks:
          after:
          - container:
              # This should be safe to run on every pod in chaos, but it would be better to only run on sidekiq.
              command: ["sh", "-c", "ps auxww | grep sidekiq | grep app | grep -v grep | awk '{ print $2; }' | xargs -r kill -12"]
              # Currently unsupported by sync hooks, but hopefully soon.
              #containerName: sidekiq
              #podName: rails-fulcrum-sidekiq-all*
          - host:
              command: ["sh", "-c", "echo Restarting sidekiq."]
              os: [darwin, linux]
      kaniko:
        cache: {}
        dockerfile: docker/ruby/Dockerfile
        buildArgs:
          ENV: production
          GEMFILE: Gemfile.beta
          BASE_VERSION: beta
          ASSETS_VERSION: "dev-{{.USER}}"
        useNewRun: true
        logTimestamps: true
        reproducible: true
    - image: snip 
      kaniko:
        dockerfile: docker/components/Dockerfile
        cache: {}
        useNewRun: true
        logTimestamps: true
        reproducible: true
      sync:
        manual:
        - src: lib/assets/fulcrum-components/src/**
          dest: src
          strip: lib/assets/fulcrum-components/src/
    tagPolicy:
      gitCommit: {}
    platforms: ["linux/amd64"]
    cluster:
      serviceAccount: ci-pipeline
      namespace: ci
      resources:
        requests:
          cpu: 1000m
          memory: 6Gi
      tolerations:
      - key: "arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoExecute"
- name: base
  build:
    artifacts:
    - image: snip 
      kaniko:
        dockerfile: docker/base_images/nginx.Dockerfile
        cache: {}
        useNewRun: true
        logTimestamps: true
        reproducible: true
    - image: snip
      kaniko:
        dockerfile: docker/base_images/tmpwatch.Dockerfile
        cache: {}
        useNewRun: true
        logTimestamps: true
        reproducible: true
    - image: snip 
      kaniko:
        dockerfile: docker/base_images/ruby.Dockerfile
        buildArgs:
          BASE: ruby:3.1.2-slim-bullseye
        cache: {}
        useNewRun: true
        logTimestamps: true
        reproducible: true
    - image: snip 
      kaniko:
        dockerfile: docker/base_images/ruby.Dockerfile
        buildArgs:
          BASE: ruby:3.1.2-slim-bullseye
        cache: {}
        useNewRun: true
        logTimestamps: true
        reproducible: true
    - image: snip 
      kaniko:
        dockerfile: docker/base_images/rails.Dockerfile
        cache: {}
        useNewRun: true
        logTimestamps: true
        reproducible: true
      requires:
      - image: snip 
        alias: RUBYBASE
    - image: snip
      kaniko:
        dockerfile: docker/base_images/rails.Dockerfile
        buildArgs:
          GEMFILE: Gemfile.beta
        cache: {}
        useNewRun: true
        logTimestamps: true
        reproducible: true
      requires:
      - image: snip
        alias: RUBYBASE
    - image: snip 
      kaniko:
        dockerfile: docker/base_images/components.Dockerfile
        cache: {}
        useNewRun: true
        logTimestamps: true
        reproducible: true
    - image: snip
      kaniko:
        dockerfile: docker/base_images/expressions.Dockerfile
        cache: {}
        useNewRun: true
        logTimestamps: true
        reproducible: true
    platforms: ["linux/amd64", "linux/arm64"]
    tagPolicy:
      dateTime:
        format: "2006-01-02_15-04-05"
        timezone: "UTC"
    cluster:
      serviceAccount: ci-pipeline
      namespace: ci
      resources:
        requests:
          cpu: 1000m
          memory: 6Gi
      tolerations:
      - key: "arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoExecute"
    # local:
    #   useBuildkit: true
    #   concurrency: 2

Steps to reproduce the behavior

  1. skaffold dev -vdebug
  2. Change any file
  3. See the watcher pick up the file in the debug logs
  4. Nothing happens for some time
  5. Finally see the message about syncing 1 file
INFO[0907] files modified: [app/controllers/api/private/plans_controller.rb]  subtask=-1 task=DevLoop
DEBU[0907] Found dependencies for dockerfile: [{jenkins-js.sh /app true 12 12} {jenkins-lint.sh /app true 12 12} {jenkins-rubocop.sh /app true 12 12} {jenkins-test.sh /app true 12 12} {lib/assets/fulcrum-components /app true 13 13}]  subtask=-1 task=DevLoop

[ 10+ seconds ....]

DEBU[0919]  devloop: build false, sync true, deploy false  subtask=-1 task=DevLoop
Syncing 1 files for 280296955917.dkr.ecr.us-east-2.amazonaws.com/rails-stable:v3.9.4-696-g9d4bf6bafe-dirty@sha256:56e412d5b1a91b84299ecf5a0e39dd22d200d30850ddb982151eb383a36aa950
@ericzzzzzzz
Contributor

Hi @corporealshift, thank you for reporting this. I cannot run this config; could you please provide a minimal reproducible project?

@foresterLV

We are seeing something similar on Windows: skaffold 2.0.3 now takes literally minutes to detect changes, even on the initial "skaffold debug" run, where it was previously almost instant.

I launched Resource Monitor and noticed that skaffold now scans the binaries listed in the ".dockerignore" file. We have gigabytes of binaries there, which can explain the slowdown; on top of that, the anti-virus kicks in and makes it even slower.

This seems to be some kind of regression when using Docker to build images: is ".dockerignore" no longer processed properly?
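
For illustration only, the kind of setup being described is roughly the following (the base image and paths are hypothetical, not taken from this project): the Dockerfile copies the whole build context with COPY . ., and the large directories are excluded only via .dockerignore, which the watcher appears to stop honoring.

Dockerfile (sketch):

# Base image and entrypoint are placeholders
FROM node:18-slim
WORKDIR /app
# Copies the entire build context; exclusions are expected to come from .dockerignore
COPY . .
CMD ["node", "server.js"]

.dockerignore (hypothetical contents):

.git/
bin/
node_modules/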

@rstoermer

Same issue here with v2.0.3 in multiple services using Skaffold

@chammond-tz

@ericzzzzzzz This issue is still occurring in Skaffold v2.7.1 on Windows. My setup has Skaffold watching 15 Dockerfiles, and when the Dockerfiles use COPY . . in conjunction with a .dockerignore file I see the same behavior described above (i.e. one CPU core gets maxed out for ~12 seconds before the file finally gets synced). I also see debug logs from Skaffold showing that it is detecting changes in the .git directory, which is definitely listed in the .dockerignore file.

I've confirmed foresterLV's findings with an ugly rewrite of my Dockerfiles to explicitly only copy the files I want (i.e. not using a .dockerignore file), and Skaffold syncs files in less than a second.
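
As a rough sketch of that workaround (the base image and paths are hypothetical, not taken from the repo linked below): replace the broad COPY . . with explicit COPY instructions for just the paths the image needs, so the build no longer relies on .dockerignore at all.

# Base image and paths are placeholders
FROM node:18-slim
WORKDIR /var/www
# Copy only the dependency manifests first so the install layer can be cached
COPY package.json package-lock.json ./
RUN npm ci
# Copy only the directories the image actually needs, instead of COPY . . plus .dockerignore
COPY src/ src/
COPY public/ public/
CMD ["npm", "start"]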

For some reason I still see logs about changes in the .git directory, but at least file sync is usable again! Since the .git directory is not referenced, even indirectly, in the Dockerfile, perhaps there's more than one bug at play here?

Can we reopen this issue, or do you want a new one to be created? (I started to do so, but found I was just copy-pasting most of the contents of this one, which wasn't helpful!)

@ericzzzzzzz
Contributor

Hi @chammond-tz, thank you for providing the additional information. Could you please check whether this also happens on other OSes? If the issue is Windows-specific, I'd like to open a new issue.

@chammond-tz

I don't have easy access to a Linux or MacOS machine, but I have created a git repo that can reproduce the problem: https://github.com/chammond-tz/skaffold-sync-issue

Can you use this to verify that all platforms are affected (as I believe they are)?

To use:

  1. Clone repo
  2. Run skaffold dev -vdebug
  3. Wait for builds to finish, etc.
  4. Edit the file service-1/src/App.js to trigger a file sync (I like to add a console.log, but whitespace changes are fine too!)
  5. You will see a log like this immediately after the change: time="2023-11-01T11:04:21-07:00" level=debug msg="Change detectednotify.Write: \"C:\\workspace\\open-source-projects\\skaffold-sync-issue\\service-1\\src\\App.js\"" subtask=-1 task=DevLoop
  6. Then, after about 12 seconds and some other logs, you will finally see a log like this: time="2023-11-01T11:04:33-07:00" level=info msg="Copying files:map[service-1\\src\\App.js:[/var/www/src/App.js]]toskaffold-sync-service-1:c5b248eb4230c6d783605ba30c8d9679469ebb5ad83fef2ffc1381a14f9f0cd4" subtask=-1 task=DevLoop and the file will get synced

@longtengz

longtengz commented Nov 6, 2023

@ericzzzzzzz This issue is still occurring in Skaffold v2.7.1 on Windows. My setup has Skaffold watching 15 Dockerfiles, and when the Dockerfiles use COPY . . in conjunction with a .dockerignore file I see the same behavior described above (i.e. one CPU core gets maxed out for ~12 seconds, then the file finally gets synced). I also see debug logs from Skaffold that it is detecting changes in the .git directory which is definitely listed in the .dockerignore file.

I'm seeing similar behavior with skaffold v2.8.0 on macOS, namely:

  • skaffold detected changes in .git, and it keeps detecting only .git changes even after the first sync completes.
    • Change detectednotify.Create: "xxx/.git/index.lock"
    • Change detectednotify.Remove: "xxx/.git/index.lock"
  • Changing one source file takes skaffold about a minute to sync to 4 pods. The only difference from https://github.com/chammond-tz/skaffold-sync-issue is that I'm using manual sync; for some reason, inferred sync in my setup triggers a rebuild:
sync:
  manual:
  - src: '**/*'
    dest: .

ericzzzzzzz reopened this Nov 6, 2023
@ericzzzzzzz
Contributor

ericzzzzzzz commented Nov 8, 2023

I have a feeling that this might be caused by my dependency upgrade for 2.7.1, though the issue seems to exist on 2.7.0 as well. I'll work on a fix this week.
