trivy-operator: Pull Images via CRI #101

isugimpy · 2022-06-06T17:41:21Z

Reached out via Twitter and was redirected here.

From my understanding, the operator is pulling container images based on the image path it scrapes from live objects on the cluster. However, it's pulling them directly, bypassing configuration on the host. Personally, I'd like the option to utilize the CRI on the actual host so that we can be sure the image that gets pulled actually matches what's running for real.

To cite a specific example, the clusters I manage are running CRI-O, and utilize the registry mirror functionality to do rewrites of the image pulls live. This allows us to attempt to pull from a cluster-local image cache first, with fallback to our canonical registry if that fails. Additionally, we use the rewrites to ensure that a pull from something like Docker Hub is forcefully redirected to pull through our canonical registry so we can cache and scan images. By having Trivy respect the behavior of the node it runs on, we can better ensure that the images it scans are accurately pulled.

itaysk · 2022-06-13T11:47:13Z

Thanks for opening an issue @isugimpy
Trivy tries to scan images from local image store if available, and if not it will pull them from registry. The interaction with local image store is similar to docker export. Today Trivy can do this local scanning with Docker and (recently) containerd. We looked into interacting with CRI but it seems that CRI doesn't have an export equivalent method [1] [2]. Instead, we could look into supporting CRI-O directly if that makes sense, but then you will have to mount the socket into the Trivy scanner pod for it to work. Another approach as suggested by @erikgb in #118 which is to still pull images, but at least be more accurate by taking the image digest from the running Pod status.

isugimpy · 2022-06-13T12:17:11Z

If it ensures accuracy, having a single pod that mounts in the socket from its host doesn't seem unreasonable at all. Would love to see that. Not being able to ensure that we can pull and scan the images from the same host that they'd be pulled from to be actually run is, in my opinion, a blocker to being able to use the operator in a live environment.

erikgb · 2022-06-13T12:22:49Z

If it ensures accuracy, having a single pod that mounts in the socket from its host doesn't seem unreasonable at all. Would love to see that. Not being able to ensure that we can pull and scan the images from the same host that they'd be pulled from to be actually run is, in my opinion, a blocker to being able to use the operator in a live environment.

I agree, but image name + image digest is the canonical refererence to an image. So if using digest, you don't need to mount the host socket. At least we should have an option to NOT mount anything from the host.

isugimpy · 2022-06-13T12:28:43Z

You're absolutely correct that the combo is canonical, but if there's no other way to resolve that canonical reference due to a rewrite, it'd leave things in undefined behavior territory. Your point makes a ton of sense in the 90+% case. I just worry about it for the smaller percentage of users that take advantage of this feature (containerd also offers the rewrite capability, so this isn't limited specifically to CRI-O).

itaysk · 2022-06-15T11:20:50Z

I just tried to run nginx from ECR just to see how how the Pod Status looks like, and it has the following:

containerID: containerd://366a98117285434b8869ff692a020a136b01b7f6444aaa6e83a158ded431584b
image: public.ecr.aws/nginx/nginx:stable
imageID: public.ecr.aws/nginx/nginx@sha256:76503f7e2c5cf0910640e70264588f264016acdcdb91c99bc7b007efa971c708

Seems like if a rewrite was happening, the image/imageID registry reference would reflect the rewritten one. Is it not the case? (if you could confim @isugimpy)

itaysk · 2022-06-15T11:23:09Z

Also I wanted to point out a couple of additional options to address the original question:

There's an option to specify the mirror as configuration (somehow lost in the docs):

Mirror for the registry , e.g. trivy.registry.mirror.index.docker.io: mirror.io would use mirror.io to get images originated from index.docker.io

There's an experimental (also undocumented) feature to scan workloads by scanning the running container filesystem (vs the container image) using the trivy.command = filesystem configuration. It's currently broken due to FS scanning doesn't work with Trivy version >= 0.23.0 #49. But I mention here to here what people think about this approach.

isugimpy · 2022-06-15T11:58:21Z

Seems like if a rewrite was happening, the image/imageID registry reference would reflect the rewritten one. Is it not the case? (if you could confim @isugimpy)

From one of my clusters:

- containerID: cri-o://defd9a0b99597b814d7f3dc74c125cbdb918d585da43a60cbfbcfafa1658365a
  image: quay.io/superq/chrony-exporter:v0.1.0
  imageID: quay.io/superq/chrony-exporter@sha256:30b6f76b72fe11e769ee3e18d5468d8fbb9a9467640689258d7100eee81db18b

I can confirm with 100% certainty that despite this pointing at quay.io, it was pulled from our internal passthrough cache via mirror rewrites at runtime. CRI-O (and containerd as I recall) doesn't present the result of the rewrite to the Kubelet.

There's an option to specify the mirror as configuration (somehow lost in the docs):

This definitely helps the cause. It's not ideal to have to duplicate the mirror config, but it definitely would work. The one bummer is that it doesn't have fallback to a secondary mirror available. That's hardly a dealbreaker though. If the cluster-local cache is unavailable for some reason, I've got bigger problems than if Trivy is scanning.

There's an experimental (also undocumented) feature to scan workloads by scanning the running container filesystem (vs the container image) using the trivy.command = filesystem configuration. It's currently broken due to FS scanning doesn't work with Trivy version >= 0.23.0 #49. But I mention here to here what people think about this approach.

Filesystem could work, it's an intriguing approach! Wouldn't that require a daemonset though, since not every image is present on every host?

itaysk · 2022-06-18T08:17:09Z

I can confirm with 100% certainty that despite this pointing at quay.io, it was pulled from our internal passthrough cache via mirror rewrites at runtime. CRI-O (and containerd as I recall) doesn't present the result of the rewrite to the Kubelet.

Thanks for checking. For our knowledge, to configure CRI-O for mirroring, are you using this configuration? https://github.com/containers/image/blob/main/registries.conf#L68

itaysk · 2022-06-18T08:35:59Z

Filesystem could work, it's an intriguing approach! Wouldn't that require a daemonset though, since not every image is present on every host?

This feature is meant to be used with vulnerabilityReports.scanJobsInSameNamespace configuration, which makes the scan job run in the namespace of the scanned workload instead of in the trivy-operator dedicated namespace. Running in the target ns is presumed to succeed in starting a new workload (the scan job) with the scanned image (permissions wise).

The feature originally suggested in this issue - scanning the image from local store by mounting the socket into Trivy container - has stricter requirement - of scheduling the scan job on the same node as the scanned workload.

chen-keinan · 2022-06-19T09:52:18Z

@isugimpy related PR #150 that solve the fs scanning issue

chen-keinan · 2022-07-05T15:32:04Z

@isugimpy will trivy-operator with mode:fileSystem, scanning will solve this issue ?

erikgb · 2022-10-09T19:42:12Z

@isugimpy will trivy-operator with mode:fileSystem, scanning will solve this issue ?

No, at least not for us. As Openshift SCC does not allow an unprivileged user to run containers as the root user, ref.:

trivy-operator/deploy/helm/values.yaml

Lines 161 to 162 in 5b9498d

    
           # For filesystem scanning, Trivy needs to run as the root user 
        
           # runAsUser: 0

JAORMX · 2022-10-10T06:22:57Z

@erikgb trivy-operator should anyways not run with the restricted SCC, it should use at least anyuid. If it's running with a low privileged SCC it should be fixed in the OLM packaging.

erikgb · 2022-10-10T06:50:10Z

@erikgb trivy-operator should anyways not run with the restricted SCC, it should use at least anyuid. If it's running with a low privileged SCC it should be fixed in the OLM packaging.

But it is not the operator that is the problem, but the jobs it schedules.

JAORMX · 2022-10-10T06:57:08Z

@erikgb I see... well, adding the needed annotation to schedule them with a higher privilege SCC shouldn't be too problematic. But, either way, permissions to use that SCC would need to be added to the packaging though (it's an RBAC rule).

avishefi · 2023-08-20T13:43:22Z

Related discussion: aquasecurity/trivy#4957

isugimpy added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 6, 2022

erikgb mentioned this issue Jun 9, 2022

Image scanning should be based on actual running container image #118

Closed

chen-keinan added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jun 13, 2022

itaysk removed the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jun 15, 2022

chen-keinan added target/kubernetes Issues relating to kubernetes cluster scanning priority/backlog Higher priority than priority/awaiting-more-evidence. labels Jun 28, 2022

erikgb mentioned this issue Oct 9, 2022

Support for CRI-O aquasecurity/trivy#3004

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

trivy-operator: Pull Images via CRI #101

trivy-operator: Pull Images via CRI #101

isugimpy commented Jun 6, 2022

itaysk commented Jun 13, 2022

isugimpy commented Jun 13, 2022

erikgb commented Jun 13, 2022

isugimpy commented Jun 13, 2022

itaysk commented Jun 15, 2022

itaysk commented Jun 15, 2022

isugimpy commented Jun 15, 2022

itaysk commented Jun 18, 2022

itaysk commented Jun 18, 2022 •

edited

Loading

chen-keinan commented Jun 19, 2022 •

edited

Loading

chen-keinan commented Jul 5, 2022

erikgb commented Oct 9, 2022

JAORMX commented Oct 10, 2022

erikgb commented Oct 10, 2022

JAORMX commented Oct 10, 2022

avishefi commented Aug 20, 2023

trivy-operator: Pull Images via CRI #101

trivy-operator: Pull Images via CRI #101

Comments

isugimpy commented Jun 6, 2022

itaysk commented Jun 13, 2022

isugimpy commented Jun 13, 2022

erikgb commented Jun 13, 2022

isugimpy commented Jun 13, 2022

itaysk commented Jun 15, 2022

itaysk commented Jun 15, 2022

isugimpy commented Jun 15, 2022

itaysk commented Jun 18, 2022

itaysk commented Jun 18, 2022 • edited Loading

chen-keinan commented Jun 19, 2022 • edited Loading

chen-keinan commented Jul 5, 2022

erikgb commented Oct 9, 2022

JAORMX commented Oct 10, 2022

erikgb commented Oct 10, 2022

JAORMX commented Oct 10, 2022

avishefi commented Aug 20, 2023

itaysk commented Jun 18, 2022 •

edited

Loading

chen-keinan commented Jun 19, 2022 •

edited

Loading