Show informative timeout messages in k8s scanning #2601

Open
mtcolman opened this issue Jul 27, 2022 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. target/kubernetes Issues relating to kubernetes cluster scanning

Comments

@mtcolman

Description

I'm trying to scan my cluster (trivy k8s --report summary cluster) and so far two attempts have failed (I'm trying progressively larger --timeout values...)

I can't work out from the "FATAL" message whether it is shown because the scan hit the timeout, or whether the timeout is a side effect of whatever caused the FATAL error.

Could the error output be more informative, e.g. "this scan has failed because it hit the timeout limit before successfully scanning all items", or something like that?
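
For illustration, here is a minimal Go sketch of the behaviour being requested (hypothetical, not Trivy's actual code; runScan and the message text are made up) showing how a CLI can distinguish a --timeout expiry from other failures and report it explicitly:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runScan stands in for the real scan; it fails once the context expires.
func runScan(ctx context.Context) error {
	select {
	case <-time.After(10 * time.Second): // pretend the scan needs 10s
		return nil
	case <-ctx.Done():
		return fmt.Errorf("scan error: %w", ctx.Err())
	}
}

func main() {
	timeout := 5 * time.Second // would come from the --timeout flag
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	if err := runScan(ctx); err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			// The informative message this issue is asking for.
			fmt.Printf("FATAL scan aborted: the --timeout limit of %s was reached before all items were scanned; increase --timeout and retry\n", timeout)
			return
		}
		fmt.Printf("FATAL scan failed: %v\n", err)
	}
}

Because the deadline error is wrapped with %w at each level, errors.Is can still detect it at the top even through a long error chain like the one in the debug output below.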

What did you expect to happen?

I expect the cluster resources to be scanned.

What happened instead?

Instead, I receive the following messages:

WARN    Increase --timeout value
FATAL   k8s scan error: scanning misconfigurations error: scan error: image scan failed: failed analysis: failed to call hooks: post handler error: scan config error: context deadline exceeded

Output of a run with --debug:

$ trivy k8s --debug --report summary cluster
2022-07-27T10:17:17.180+0100    DEBUG   Severities: ["UNKNOWN" "LOW" "MEDIUM" "HIGH" "CRITICAL"]
2022-07-27T10:17:21.356+0100    DEBUG   cache dir:  /home/matt/.cache/trivy
2022-07-27T10:17:21.356+0100    DEBUG   DB update was skipped because the local DB is the latest
2022-07-27T10:17:21.356+0100    DEBUG   DB Schema: 2, UpdatedAt: 2022-07-27 06:07:47.221501092 +0000 UTC, NextUpdate: 2022-07-27 12:07:47.221500892 +0000 UTC, DownloadedAt: 2022-07-27 09:00:33.9436114 +0000 UTC
91 / 1722 [------>_______________________________________________________________________________________________________________________] 5.28% 0 p/s
2022-07-27T10:22:21.359+0100    WARN    Increase --timeout value
2022-07-27T10:22:21.360+0100    FATAL   k8s scan error:
    github.com/aquasecurity/trivy/pkg/k8s/commands.run
        /home/runner/work/trivy/trivy/pkg/k8s/commands/run.go:72
  - scanning misconfigurations error:
    github.com/aquasecurity/trivy/pkg/k8s/scanner.(*Scanner).Scan
        /home/runner/work/trivy/trivy/pkg/k8s/scanner/scanner.go:72
  - scan error:
    github.com/aquasecurity/trivy/pkg/commands/artifact.(*runner).scanArtifact
        /home/runner/work/trivy/trivy/pkg/commands/artifact/run.go:227
  - image scan failed:
    github.com/aquasecurity/trivy/pkg/commands/artifact.scan
        /home/runner/work/trivy/trivy/pkg/commands/artifact/run.go:531
  - failed analysis:
    github.com/aquasecurity/trivy/pkg/scanner.Scanner.ScanArtifact
        /home/runner/work/trivy/trivy/pkg/scanner/scan.go:127
  - failed to call hooks:
    github.com/aquasecurity/trivy/pkg/fanal/artifact/local.Artifact.Inspect
        /home/runner/work/trivy/trivy/pkg/fanal/artifact/local/fs.go:127
  - post handler error:
    github.com/aquasecurity/trivy/pkg/fanal/handler.Manager.PostHandle
        /home/runner/work/trivy/trivy/pkg/fanal/handler/handler.go:75
  - scan config error:
    github.com/aquasecurity/trivy/pkg/fanal/handler/misconf.misconfPostHandler.Handle
        /home/runner/work/trivy/trivy/pkg/fanal/handler/misconf/misconf.go:244
  - context deadline exceeded

Output of trivy -v:

$ trivy -v
Version: 0.30.4
Vulnerability DB:
  Version: 2
  UpdatedAt: 2022-07-27 06:07:47.221501092 +0000 UTC
  NextUpdate: 2022-07-27 12:07:47.221500892 +0000 UTC
  DownloadedAt: 2022-07-27 09:00:33.9436114 +0000 UTC

Additional details (base image name, container registry info...):

@mtcolman mtcolman added the kind/bug Categorizes issue or PR as related to a bug. label Jul 27, 2022
@knqyf263 knqyf263 added the target/kubernetes Issues relating to kubernetes cluster scanning label Jul 27, 2022
@mtcolman
Author

I subsequently ran the scan with a 30m timeout and it completed (in just under 20 minutes); here is the debug output:

$ trivy k8s --debug --timeout 30m0s --report summary cluster
2022-07-27T10:27:46.422+0100    DEBUG   Severities: ["UNKNOWN" "LOW" "MEDIUM" "HIGH" "CRITICAL"]
2022-07-27T10:27:50.901+0100    DEBUG   cache dir:  /home/matt/.cache/trivy
2022-07-27T10:27:50.901+0100    DEBUG   DB update was skipped because the local DB is the latest
2022-07-27T10:27:50.901+0100    DEBUG   DB Schema: 2, UpdatedAt: 2022-07-27 06:07:47.221501092 +0000 UTC, NextUpdate: 2022-07-27 12:07:47.221500892 +0000 UTC, DownloadedAt: 2022-07-27 09:00:33.9436114 +0000 UTC
1722 / 1722 [--------------------------------------------------------------------------------------------------------------------------] 100.00% 2 p/s
2022-07-27T10:45:14.153+0100    ERROR   Error during vulnerabilities scan: scan error: unable to initialize a scanner: unable to initialize a docker scanner: 4 errors occurred:
        * unable to inspect the image (registry.aquasec.com/database:2022.4): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
        * unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
        * containerd socket not found: /run/containerd/containerd.sock
        * GET https://registry.aquasec.com/v2/database/manifests/2022.4: unexpected status code 401 Unauthorized: <html>
<head><title>401 Authorization Required</title></head>
<body>
<center><h1>401 Authorization Required</h1></center>
<hr><center>openresty/1.19.9.1</center>
</body>
</html>

I'm wondering if the "unable to inspect the image" error (because it can't find Docker or Podman) should be raised as a separate ticket here? Shouldn't this be a check Trivy does up front, alerting me immediately rather than after the scan has run for 20 minutes? (As it stands, I now need to make Docker/Podman available and rerun.)
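
A minimal sketch of the kind of pre-flight check being suggested, assuming a plain socket probe (the paths and the hard failure are illustrative, not Trivy's actual logic; a real check would also have to allow for images that can be pulled straight from a remote registry):

package main

import (
	"fmt"
	"os"
)

// Candidate runtime sockets; real paths vary by distribution and rootless setups.
var runtimeSockets = []string{
	"/var/run/docker.sock",
	"/run/containerd/containerd.sock",
	"/run/podman/podman.sock",
}

func checkRuntime() error {
	for _, s := range runtimeSockets {
		if _, err := os.Stat(s); err == nil {
			return nil // at least one runtime looks reachable
		}
	}
	return fmt.Errorf("no container runtime socket found (checked %v)", runtimeSockets)
}

func main() {
	if err := checkRuntime(); err != nil {
		// Fail fast, before spending 20 minutes on the scan.
		fmt.Fprintln(os.Stderr, "FATAL", err)
		os.Exit(1)
	}
	fmt.Println("runtime check passed; starting scan...")
}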

@josedonizetti
Contributor

@mtcolman It seems to me this issue can be closed, because you were able to scan the cluster once you set the timeout to 30m, correct? And perhaps the issue of not being able to scan the image registry.aquasec.com/database:2022.4 could be raised separately, because it isn't related?

@piotr-janek

I don't think this is an issue with that particular image. I am seeing a similar problem with images in AWS ECR.
The problem does not occur when one of the following circumstances is met:

  • the number of images to scan is low, because only a single namespace is scanned
  • a per-namespace scan was run first for some of the namespaces, so the cache is populated (it is not necessary to run the per-namespace scan for every namespace that uses the private repository)
  • docker login is run first, so that trivy can use those credentials to pull images (see the example below)

But when the cache is clear and the number of images to scan is high, trivy k8s --report summary cluster has no problem accessing the external repository (private images in ECR, in my case) for some images, yet throws a 401 for other, seemingly random images. All of the images that fail with 401 Unauthorized are scanned correctly by trivy if I scan a single namespace instead of the whole cluster.
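
For reference, the docker login workaround mentioned in the list above looks roughly like this for ECR (the region, account ID, and 60m timeout are placeholders), after which the cluster scan is rerun:

$ aws ecr get-login-password --region eu-west-1 | \
    docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
$ trivy k8s --debug --timeout 60m0s --report summary cluster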

So @josedonizetti, maybe there is a problem in the mechanism used to access remote, private repositories when the volume of such traffic is high, or when the scan runs for more than a certain time? In my case the scans take between 15 and 35 minutes, and the timeout parameter in trivy is set to 60m0s.

I am running my tests on Ubuntu 22.04 with trivy 0.31.2. I had seen the same problem, or at least a problem with the same effect of an incomplete report, in trivy 0.28.1.

@github-actions

github-actions bot commented Mar 9, 2023

This issue is stale because it has been labeled with inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed. label Mar 9, 2023
@knqyf263 knqyf263 added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed. labels May 14, 2023
@knqyf263 knqyf263 changed the title Context deadline exceeded Show informative timeout messages in k8s scanning May 14, 2023