Show informative timeout messages in k8s scanning #2601

Open
mtcolman opened this issue Jul 27, 2022 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. target/kubernetes Issues relating to kubernetes cluster scanning

Comments

@mtcolman

Description

I'm trying to scan my cluster (trivy k8s --report summary cluster) and so far two attempts have failed (I'm trying progressively larger --timeout values...)

I can't work out from the "FATAL" message whether it is shown because the scan hit the timeout, or whether the timeout is a side effect of whatever caused the FATAL error.

Could the error output be more informative, e.g. "this scan has failed because it hit the timeout limit before successfully scanning all items", or something like that?
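
For illustration, here is a minimal Go sketch of the behaviour being requested (hypothetical, not Trivy's actual code; runScan and the message text are made up) showing how a CLI can distinguish a --timeout expiry from other failures and report it explicitly:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runScan stands in for the real scan; it fails once the context expires.
func runScan(ctx context.Context) error {
	select {
	case <-time.After(10 * time.Second): // pretend the scan needs 10s
		return nil
	case <-ctx.Done():
		return fmt.Errorf("scan error: %w", ctx.Err())
	}
}

func main() {
	timeout := 5 * time.Second // would come from the --timeout flag
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	if err := runScan(ctx); err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			// The informative message this issue is asking for.
			fmt.Printf("FATAL scan aborted: the --timeout limit of %s was reached before all items were scanned; increase --timeout and retry\n", timeout)
			return
		}
		fmt.Printf("FATAL scan failed: %v\n", err)
	}
}

Because the deadline error is wrapped with %w at each level, errors.Is can still detect it at the top even through a long error chain like the one in the debug output below.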

What did you expect to happen?

I expect the cluster resources to be scanned.

What happened instead?

Instead, I receive the following messages:

WARN    Increase --timeout value
FATAL   k8s scan error: scanning misconfigurations error: scan error: image scan failed: failed analysis: failed to call hooks: post handler error: scan config error: context deadline exceeded

Output of a run with --debug:

$ trivy k8s --debug --report summary cluster
2022-07-27T10:17:17.180+0100    DEBUG   Severities: ["UNKNOWN" "LOW" "MEDIUM" "HIGH" "CRITICAL"]
2022-07-27T10:17:21.356+0100    DEBUG   cache dir:  /home/matt/.cache/trivy
2022-07-27T10:17:21.356+0100    DEBUG   DB update was skipped because the local DB is the latest
2022-07-27T10:17:21.356+0100    DEBUG   DB Schema: 2, UpdatedAt: 2022-07-27 06:07:47.221501092 +0000 UTC, NextUpdate: 2022-07-27 12:07:47.221500892 +0000 UTC, DownloadedAt: 2022-07-27 09:00:33.9436114 +0000 UTC
91 / 1722 [------>_______________________________________________________________________________________________________________________] 5.28% 0 p/s
2022-07-27T10:22:21.359+0100    WARN    Increase --timeout value
2022-07-27T10:22:21.360+0100    FATAL   k8s scan error:
    github.com/aquasecurity/trivy/pkg/k8s/commands.run
        /home/runner/work/trivy/trivy/pkg/k8s/commands/run.go:72
  - scanning misconfigurations error:
    github.com/aquasecurity/trivy/pkg/k8s/scanner.(*Scanner).Scan
        /home/runner/work/trivy/trivy/pkg/k8s/scanner/scanner.go:72
  - scan error:
    github.com/aquasecurity/trivy/pkg/commands/artifact.(*runner).scanArtifact
        /home/runner/work/trivy/trivy/pkg/commands/artifact/run.go:227
  - image scan failed:
    github.com/aquasecurity/trivy/pkg/commands/artifact.scan
        /home/runner/work/trivy/trivy/pkg/commands/artifact/run.go:531
  - failed analysis:
    github.com/aquasecurity/trivy/pkg/scanner.Scanner.ScanArtifact
        /home/runner/work/trivy/trivy/pkg/scanner/scan.go:127
  - failed to call hooks:
    github.com/aquasecurity/trivy/pkg/fanal/artifact/local.Artifact.Inspect
        /home/runner/work/trivy/trivy/pkg/fanal/artifact/local/fs.go:127
  - post handler error:
    github.com/aquasecurity/trivy/pkg/fanal/handler.Manager.PostHandle
        /home/runner/work/trivy/trivy/pkg/fanal/handler/handler.go:75
  - scan config error:
    github.com/aquasecurity/trivy/pkg/fanal/handler/misconf.misconfPostHandler.Handle
        /home/runner/work/trivy/trivy/pkg/fanal/handler/misconf/misconf.go:244
  - context deadline exceeded

Output of trivy -v:

$ trivy -v
Version: 0.30.4
Vulnerability DB:
  Version: 2
  UpdatedAt: 2022-07-27 06:07:47.221501092 +0000 UTC
  NextUpdate: 2022-07-27 12:07:47.221500892 +0000 UTC
  DownloadedAt: 2022-07-27 09:00:33.9436114 +0000 UTC

Additional details (base image name, container registry info...):

@mtcolman mtcolman added the kind/bug Categorizes issue or PR as related to a bug. label Jul 27, 2022
@knqyf263 knqyf263 added the target/kubernetes Issues relating to kubernetes cluster scanning label Jul 27, 2022
@mtcolman
Author

I subsequently ran the scan with a 30m timeout and it completed (in just under 20 minutes); here is the debug output:

$ trivy k8s --debug --timeout 30m0s --report summary cluster
2022-07-27T10:27:46.422+0100    DEBUG   Severities: ["UNKNOWN" "LOW" "MEDIUM" "HIGH" "CRITICAL"]
2022-07-27T10:27:50.901+0100    DEBUG   cache dir:  /home/matt/.cache/trivy
2022-07-27T10:27:50.901+0100    DEBUG   DB update was skipped because the local DB is the latest
2022-07-27T10:27:50.901+0100    DEBUG   DB Schema: 2, UpdatedAt: 2022-07-27 06:07:47.221501092 +0000 UTC, NextUpdate: 2022-07-27 12:07:47.221500892 +0000 UTC, DownloadedAt: 2022-07-27 09:00:33.9436114 +0000 UTC
1722 / 1722 [--------------------------------------------------------------------------------------------------------------------------] 100.00% 2 p/s
2022-07-27T10:45:14.153+0100    ERROR   Error during vulnerabilities scan: scan error: unable to initialize a scanner: unable to initialize a docker scanner: 4 errors occurred:
        * unable to inspect the image (registry.aquasec.com/database:2022.4): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
        * unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
        * containerd socket not found: /run/containerd/containerd.sock
        * GET https://registry.aquasec.com/v2/database/manifests/2022.4: unexpected status code 401 Unauthorized: <html>
<head><title>401 Authorization Required</title></head>
<body>
<center><h1>401 Authorization Required</h1></center>
<hr><center>openresty/1.19.9.1</center>
</body>
</html>

I'm wondering if the "unable to inspect the image" error (because it can't find Docker or Podman) should be raised as a separate ticket here? Shouldn't this be a check Trivy does up front, alerting me immediately rather than after the scan has run for 20 minutes? (As it stands, I now need to make Docker/Podman available and rerun.)
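
A minimal sketch of the kind of pre-flight check being suggested, assuming a plain socket probe (the paths and the hard failure are illustrative, not Trivy's actual logic; a real check would also have to allow for images that can be pulled straight from a remote registry):

package main

import (
	"fmt"
	"os"
)

// Candidate runtime sockets; real paths vary by distribution and rootless setups.
var runtimeSockets = []string{
	"/var/run/docker.sock",
	"/run/containerd/containerd.sock",
	"/run/podman/podman.sock",
}

func checkRuntime() error {
	for _, s := range runtimeSockets {
		if _, err := os.Stat(s); err == nil {
			return nil // at least one runtime looks reachable
		}
	}
	return fmt.Errorf("no container runtime socket found (checked %v)", runtimeSockets)
}

func main() {
	if err := checkRuntime(); err != nil {
		// Fail fast, before spending 20 minutes on the scan.
		fmt.Fprintln(os.Stderr, "FATAL", err)
		os.Exit(1)
	}
	fmt.Println("runtime check passed; starting scan...")
}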

@josedonizetti
Contributor

@mtcolman It seems to me this issue can be closed, because you were able to scan the cluster once you set the timeout to 30m, correct? And perhaps the issue of not being able to scan the image registry.aquasec.com/database:2022.4 could be raised separately, because it isn't related?

@piotr-janek

I don't think this is an issue with that particular image. I am seeing a similar problem with images in AWS ECR.
The problem does not occur when one of the following circumstances is met:

  • the number of images to scan is low, because only a single namespace is scanned
  • a per-namespace scan was run first for some of the namespaces, so the cache is populated (it is not necessary to run the per-namespace scan for every namespace that uses the private repository)
  • docker login is run first, so that trivy can use those credentials to pull images (see the example below)

But when the cache is clear and the number of images to scan is high, trivy k8s --report summary cluster has no problem accessing the external repository (private images in ECR, in my case) for some images, yet throws a 401 for other, seemingly random images. All of the images that fail with 401 Unauthorized are scanned correctly by trivy if I scan a single namespace instead of the whole cluster.
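
For reference, the docker login workaround mentioned in the list above looks roughly like this for ECR (the region, account ID, and 60m timeout are placeholders), after which the cluster scan is rerun:

$ aws ecr get-login-password --region eu-west-1 | \
    docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
$ trivy k8s --debug --timeout 60m0s --report summary cluster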

So @josedonizetti, maybe there is a problem in the mechanism used to access remote, private repositories when the volume of such traffic is high, or when the scan runs for more than a certain time? In my case the scans take between 15 and 35 minutes, and the timeout parameter in trivy is set to 60m0s.

I am running my tests on Ubuntu 22.04 with trivy 0.31.2. I had seen the same problem, or at least a problem with the same effect of an incomplete report, in trivy 0.28.1.

@github-actions

github-actions bot commented Mar 9, 2023

This issue is stale because it has been labeled with inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed. label Mar 9, 2023
@knqyf263 knqyf263 added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed. labels May 14, 2023
@knqyf263 knqyf263 changed the title Context deadline exceeded Show informative timeout messages in k8s scanning May 14, 2023