docs(design): Scan Container Images with Trivy Filesystem Scanner #830

Merged: danielpacak merged 2 commits into aquasecurity:main from deven0t:deven-trivy-fs-scan-design on Jan 11, 2022
# Scan Container Images with Trivy Filesystem Scanner

Authors: [Devendra Turkar], [Daniel Pacak]

## Overview

Starboard currently uses Trivy in [Standalone] or [ClientServer] mode to scan and generate VulnerabilityReports for
container images by pulling the images from remote registries. Starboard scans a specified K8s workload by running the
Trivy executable as a K8s Job. This approach implies that Trivy does not have access to images cached by the container
runtime on cluster nodes. Therefore, to scan images from private registries, Starboard reads the ImagePullSecrets specified
on workloads or on the service accounts used by the workloads, and passes them down to the Trivy executable as `TRIVY_USERNAME`
and `TRIVY_PASSWORD` environment variables.
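
For illustration only, here is a rough Go sketch of how such credentials could be wired into the scan Job container as
environment variables. The secret name and the `username`/`password` keys are hypothetical placeholders rather than the
names used by the actual Starboard implementation.

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

// registryCredentialEnvVars shows how credentials taken from an ImagePullSecret
// could be exposed to the Trivy executable running in the scan Job container.
// The secret referenced here and its "username"/"password" keys are illustrative.
func registryCredentialEnvVars(secretName string) []corev1.EnvVar {
	return []corev1.EnvVar{
		{
			Name: "TRIVY_USERNAME",
			ValueFrom: &corev1.EnvVarSource{
				SecretKeyRef: &corev1.SecretKeySelector{
					LocalObjectReference: corev1.LocalObjectReference{Name: secretName},
					Key:                  "username",
				},
			},
		},
		{
			Name: "TRIVY_PASSWORD",
			ValueFrom: &corev1.EnvVarSource{
				SecretKeyRef: &corev1.SecretKeySelector{
					LocalObjectReference: corev1.LocalObjectReference{Name: secretName},
					Key:                  "password",
				},
			},
		},
	}
}
```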
Since ImagePullSecrets are not the only way to provide registry credentials, the following alternatives are not
currently supported by Starboard:
1. Pre-pulled images
2. [Configuring nodes to authenticate to a private registry]
3. Vendor-specific or local extensions, for example, the methods described in [AWS ECR Private registry authentication].

Even though we could resolve some of the above-mentioned limitations with hostPath volume mounts to the container runtime
socket, that approach has disadvantages we are trying to avoid: scan Jobs would need more permissions, and scheduling them
would require additional knowledge of the cluster's infrastructure, such as the location of the container runtime socket.
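
For context, the hostPath approach we are avoiding would boil down to mounting the node's container runtime socket into
the scan Job, roughly as in the following sketch. The containerd socket path is an assumption; it differs between
runtimes and distributions, which is exactly the kind of infrastructure detail we don't want scan Jobs to depend on.

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// containerRuntimeSocketVolume sketches the hostPath mount we want to avoid:
// it ties scan Jobs to a node-specific socket path and typically requires
// extra privileges to schedule. The path below assumes containerd.
func containerRuntimeSocketVolume() corev1.Volume {
	socket := corev1.HostPathSocket
	return corev1.Volume{
		Name: "container-runtime-socket",
		VolumeSource: corev1.VolumeSource{
			HostPath: &corev1.HostPathVolumeSource{
				Path: "/run/containerd/containerd.sock",
				Type: &socket,
			},
		},
	}
}
```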
## Solution

### TL;DR

Use Trivy filesystem scanning to scan container images. The main idea discussed in this proposal is to schedule a scan
Job on the same cluster node where the scanned workload is running. This allows Trivy to scan the filesystem of a
container image that is already cached on that node without pulling the image from a remote registry. What's more,
Trivy will scan container images from private registries without being given registry credentials (as an ImagePullSecret
or in any other proprietary way).
### Deep Dive

To scan a container image of a given K8s workload, Starboard will create a corresponding container in a scan Job and
override its entrypoint to invoke the Trivy filesystem scanner.

This approach requires the Trivy executable to be downloaded and made available to the entrypoint. We'll do that by adding
an init container to the scan Job. This init container will use the Trivy container image to copy the Trivy executable out
to an emptyDir volume, which will be shared with the other containers.

Another init container is required to download the Trivy vulnerability database and save it to the mounted shared volume.

Finally, the scan container will use the shared volume with the Trivy executable and Trivy database to perform the actual
filesystem scan. (See the provided [Example](#example) to get a better idea of all the containers defined by a scan Job and
how they share data via the emptyDir volume.)

> Note that the second init container is required in [Standalone] mode, which is the only mode supported by the Trivy
> filesystem scanner at the time of writing this proposal.
We further restrict scan Jobs to run on the same node where the scanned Pod is running, and we never pull images from
remote registries (the `ImagePullPolicy` is set to `Never`). To determine the node for a scan Job, Starboard will list
the active Pods controlled by the scanned workload. If the list is not empty, it will take the node name from the first
Pod; otherwise it will ignore the workload.
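
A minimal Go sketch of this node-resolution step, using controller-runtime's client, is shown below. The function name
is illustrative and, for brevity, it only handles Deployments, whereas Starboard supports other workload kinds as well.

```go
package sketch

import (
	"context"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// resolveNodeName lists active Pods controlled by the given Deployment and
// returns the node name of the first running one. An empty string means the
// workload has no running Pods (e.g. it is scaled down to 0) and is ignored.
func resolveNodeName(ctx context.Context, c client.Client, deploy *appsv1.Deployment) (string, error) {
	selector, err := metav1.LabelSelectorAsSelector(deploy.Spec.Selector)
	if err != nil {
		return "", fmt.Errorf("parsing selector: %w", err)
	}
	var pods corev1.PodList
	err = c.List(ctx, &pods,
		client.InNamespace(deploy.Namespace),
		client.MatchingLabelsSelector{Selector: selector})
	if err != nil {
		return "", fmt.Errorf("listing pods: %w", err)
	}
	for _, pod := range pods.Items {
		// Only a running Pod has a node assigned and image layers cached there.
		if pod.Status.Phase == corev1.PodRunning && pod.Spec.NodeName != "" {
			return pod.Spec.NodeName, nil
		}
	}
	return "", nil
}
```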
### Example

Let's assume that there's an `nginx` Deployment in the `poc-ns` namespace. It runs the `example.registry.com/nginx:1.16`
container image from the private registry `example.registry.com`. Registry credentials are stored in the `private-registry`
ImagePullSecret. (Alternatively, the ImagePullSecret can be attached to a service account referred to by the Deployment.)
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: poc-ns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
  namespace: poc-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      imagePullSecrets:
        - name: private-registry
      containers:
        - name: nginx
          image: example.registry.com/nginx:1.16
```
To scan the `nginx` container of the `nginx` Deployment, Starboard will create the following scan Job in the
`starboard-system` namespace and observe it until it's Completed or Failed (a minimal status-check sketch follows the
example below).
```yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: scan-vulnerabilityreport-ab3134
  namespace: starboard-system
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      # The explicit nodeName indicates our intention to schedule the scan pod
      # on the same cluster node where the nginx workload is running.
      # This could also imply considering taints, tolerations, and other
      # properties respected by the K8s scheduler.
      nodeName: kind-control-plane
      volumes:
        - name: scan-volume
          emptyDir: {}
      initContainers:
        # The trivy-get-binary init container copies the trivy executable
        # binary out of the upstream Trivy container image, i.e. aquasec/trivy:0.19.2,
        # to a shared emptyDir volume.
        - name: trivy-get-binary
          image: aquasec/trivy:0.19.2
          command:
            - cp
            - -v
            - /usr/local/bin/trivy
            - /var/starboard/trivy
          volumeMounts:
            - name: scan-volume
              mountPath: /var/starboard
        # The trivy-download-db init container uses the trivy executable binary
        # from the previous step to download the Trivy vulnerability database
        # from the GitHub releases page.
        # This won't be required once Trivy supports ClientServer mode
        # for the fs subcommand.
        - name: trivy-download-db
          image: aquasec/trivy:0.19.2
          command:
            - /var/starboard/trivy
            - --download-db-only
            - --cache-dir
            - /var/starboard/trivy-db
          volumeMounts:
            - name: scan-volume
              mountPath: /var/starboard
      containers:
        # The nginx container is based on the container image that we want to
        # scan with Trivy. However, its command (entrypoint) is overridden to
        # invoke the trivy filesystem scan. The scan results are printed to stdout
        # in JSON format, so we can parse them and store them as a VulnerabilityReport.
        - name: nginx
          image: example.registry.com/nginx:1.16
          # Never pull the image from a remote registry; scan the image layers
          # already cached on the cluster node.
          imagePullPolicy: Never
          securityContext:
            # Trivy must run as root, so we set the UID here.
            runAsUser: 0
          command:
            - /var/starboard/trivy
            - --cache-dir
            - /var/starboard/trivy-db
            - fs
            - --format
            - json
            - /
          volumeMounts:
            - name: scan-volume
              mountPath: /var/starboard
```
Notice that the scan Job does not use the registry credentials stored in the `private-registry` ImagePullSecret at all.
Also, the `ImagePullPolicy` for the `nginx` container is set to `Never` to avoid pulling the image from the
`example.registry.com/nginx` repository, which requires authentication. And finally, the `nodeName` property is explicitly
set to `kind-control-plane` to make sure that the scan Job is scheduled on the same node as a Pod controlled by the
`nginx` Deployment. (We assumed that there was at least one Pod controlled by the `nginx` Deployment, and that it was
scheduled on the `kind-control-plane` node.)

Trivy must run as root, so the scan Job defines a `securityContext` with the `runAsUser` property set to UID `0`.
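
As mentioned before the example, Starboard observes the scan Job until it's Completed or Failed. A minimal sketch of
such a status check against the Job's conditions could look as follows (the helper name is illustrative).

```go
package sketch

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// scanJobFinished reports whether the scan Job has finished and, if so,
// whether it completed successfully or failed.
func scanJobFinished(job *batchv1.Job) (finished bool, succeeded bool) {
	for _, cond := range job.Status.Conditions {
		if cond.Status != corev1.ConditionTrue {
			continue
		}
		switch cond.Type {
		case batchv1.JobComplete:
			return true, true
		case batchv1.JobFailed:
			return true, false
		}
	}
	return false, false
}
```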
## Remarks

1. We cannot scan K8s workloads scaled down to 0 replicas because we cannot infer on which cluster node a scan Job should
   run. (In general, a node name is only set on a running Pod.) But once a workload is scaled up, the Starboard Operator
   will receive the update event and will have another chance to scan it.
2. It's hard to identify Pods managed by the CronJob controller, so we'll skip them.
3. The Trivy filesystem command does not work in [ClientServer] mode. Therefore, this solution is subject to the limits of
   [Standalone] mode. We plan to extend the Trivy filesystem command to work in ClientServer mode and improve the
   Starboard implementation once that's available.
4. Trivy must run as root, which may be blocked by some admission controllers such as PodSecurityPolicy.
[Devendra Turkar]: https://github.com/deven0t
[Daniel Pacak]: https://github.com/danielpacak
[Standalone]: https://aquasecurity.github.io/starboard/v0.13.2/integrations/vulnerability-scanners/trivy/#standalone
[ClientServer]: https://aquasecurity.github.io/starboard/v0.13.2/integrations/vulnerability-scanners/trivy/#clientserver
[Configuring nodes to authenticate to a private registry]: https://kubernetes.io/docs/concepts/containers/images/#configuring-nodes-to-authenticate-to-a-private-registry
[AWS ECR Private registry authentication]: https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html
Review discussion:

- This could maybe be done with a CSI driver as well.
- I'm not sure that I understood this comment. Could you elaborate on how we can use a CSI driver in this case?
- Just thinking: instead of copying the file from an init container to an emptyDir, a CSI driver in ephemeral mode could mount the file into the container.
- May not be worth the effort, but could be like: