Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial working draft of pluggable scanning proposal and working group #82

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
240 changes: 240 additions & 0 deletions proposals/new/Pluggable-Scanning.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,240 @@
# Proposal: `Pluggable Container Scanning`

Author: `Zach Hill (zhill)`

Discussion: [Issue 6234](https://github.com/goharbor/harbor/issues/6234)

## Abstract

Add a generic image scanner abstraction layer between Harbor and external image scanners. Scanner integrations are achieved
by implementing an adapter which implements the scanner adapter interface and which is loaded, configured, and invoked by the
a generic scanner job.

The adapter interface will support the following high-level capabilities:

1. Scanning images achieved by the scanner fetching image data and returning a vulnerability listing mapping vulnerabilities to artifacts in the image
2. Providing a image check pass/fail recommendation based on the specific capabilities/polices of the scanner (new)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Harbor already do this kind of policy checking? I would think this is better outsourced to a policy component (e.g. OPA), for separation of concerns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current harbor implementation includes an optional "policyCheck" api interceptor that as I understand it will block image pulls if the image fails the policy check. The current implementation of that check uses the last vuln scan result and uses a simple severity comparison.

My proposal (at least thus far) has intentionally minimal on the impact to the larger Harbor architecture and attempts to bring support for other scanners in with minimal changes to other aspects of the system. I agree there are broader changes possible to support policy-based access gating to images, and the proposal's addition of an explicit image check from the scanner would allow the scanner to provide an input to any such policy evaluation system build into Harbor itself, either by plugin of OPA or another system.

I think what you're suggesting makes sense but could be implemented independently of this proposal and this proposal's changes would fit into that work as an input to the evaluation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lizrice

@zhill is right. Currently, harbor provides an interceptor to do the policy evaluation and checking work. They can be driven by some other tools like OPA in the feature (actually we have started to consider if we should introduce a unique and powerful policy engine to handle the policy related things). But that can be performed in another proposal. In this proposal, as @zhill mentioned, we should define a flexible interface to support the checking process, either by an interceptor or by the OPA, etc..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as there's a vulnerability list provided by scanner, the pass/fail is not really necessary from scanner?

Is this a case we think valuable when different scanners report different pass/fail recommendations under same conditions (e.g. same vulnerability list)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, a vulnerability listing is the very bare minimum of useful information about an image's content, and often not sufficient for production security practices. Clair provides a very simple scan capability, but others like Anchore and Aqua provide much deeper analysis and capabilities such as secret detection, provenance, configuration checks, and metadata checks on the image content which all factor into acceptance criteria.

As I said in comments previously, a scanner may or may not implement this and Harbor/its users are free to ignore/disable such capabilities as desired, but I think it is important to have support for more sophisticated scanners than just vulnerability mappings and returning lists.

I will certainly update the doc with more clarity on this position and the central idea behind including this function to facilitate a decision on inclusion.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @zhill . Our company purchased anchore enterprise because we wanted to detect the secret in any docker images as well; meanwhile, it also gives the cyber security team the ability to define some custom policy.

Copy link
Contributor

@reasonerjt reasonerjt Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cafeliker @zhill I see this is a valuable use case. That means we need to make Harbor's security policy flexible to support different attributes for different scanners?

3. Cleanup of data in the scanner when an image is deleted from Harbor (for image lifecycle completeness) (new)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup is questionable as the scanner may very likely be shared by different services? if it has the knowledge that an image with digest sha256:xxxx is vulnerable, we may not need to remove it if this image is removed from Harbor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that the logic can be complex, and I think that can be handled in the plugin configuration, since that has the plugin-specific context, and/or the overall configuration that can be deployment specific. The intent is simply to provide a facility to allow the scanner to be told that an image is no longer in Harbor and thus may be GC'd if the scanner does have some state for the image and if the user wants it to be kept in sync with the registry. Such behavior is optional and a specific plugin may choose to implement the method as a no-op.

Copy link
Contributor

@reasonerjt reasonerjt Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhill How about Harbor send a webhook to scanner? I hope the scanner adapter or plugin should focus on enabling Harbor to scan artifacts, cleanup the data on the scanner may be out of the scope?


## Background

Harbor contains a tightly-coupled interface to Clair for security scans of image content uploaded to the registry.
Some users have other container scanning solutions already, or would like to leverage a solution with other desirable
features without duplicating image scans. It will add convenience and value to Harbor deployments to allow such scanners
to provide results for Harbor in place of the default integration.

The current implementation for scanner support is included in the following components:

1. Job service's ClairJob job type for executing scans
2. DAO - Internal data model for storing and retrieving scan results
3. API for both viewing and triggering scans
5. API for receiving notifications from the scanner to trigger new scans
4. Initialization and configuration (including ClairDB init in the DAO layer)
5. policyChecker implementation consumes the vulnerability overview produced by the scanner to make access decisions

Scans are initiated by the job service due to:
* New image pushed
* Scheduled scan interval
* User API call
* Notification (POST) from Clair to _/service/notifications/clair_ indicating an update available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be removed.


The scan job (ClairJob) interacts with the scanner (Clair) and process the result for hand-off to the DAO layer to be persisted.
The persisted result (ImgScanOverview) is made available via the Harbor API as well as consumed by the policyChecker interceptor if configured.

The policyChecker interceptor consumes the scan result for comparison with the configured vulnerability-severity-based policy
and determines if specific HTTP requests are to be allowed.

The Harbor UI also consumes the scan results and visualizes them to the user as well as providing UI components to configure and execute scans.

## Proposal

Introduce a layer of abstraction between the scan invocation logic and the scanner-specific logic: an adapter interface for any scanner to be integrated with Harbor.
The existing Clair implementation (ClairJob, etc) will be moved into a Clair adapter invoked by the common scan.

A ScanJob layer will added that will load, configure, and invoke the appropriate adapter defined in the Harbor configuration.

The new ImageScannerAdapter interface provides:

1. Execute a scan on the image data, retrievable by the digest of the image manifest document (aka the "image digest") (same as current impl)
1. Notify the scanner that the image is removed from Harbor. Depending on scanner and configuration this may be a no-op. (new)
Copy link
Contributor

@reasonerjt reasonerjt Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption is that the adapter should be minimal and focus on the general workflow for all scanners.

So this is probably not part of it.

1. Execute a rescan of an image previously scanned (Note: For some scanners this will be the same as #1, but for others it may differ in what/if data is retrieved.) (new, but largely identical to current impl)
1. Execute a "check" operation for an image against the scanner, if implemented, to provide an evaluation and pass/fail (new)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the result of "check" part of the scan result?

So that the adapter will only support two actions "TriggerScan" and "GetResult".


Copy link
Contributor

@reasonerjt reasonerjt Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scannerAdapter may also need to provide capability to describe itself, to help Harbor consume the heterogeneous scan result? Or is can be encapsulate in scan_result_metadata ?

### Accessing Image Data

The adapter interface passes a reference to the image data to the adapters: the image digest (digest of the image manifest object) and the registry url
necessary to retrieve the data via registry API mechanisms. An example reference would be: internalharborregistry:5000/project/image@sha256:abc123.
From such an reference the scanners may use HTTP GET to download the image manifest that describes the additional data layers needed for analysis. Optimization of
the data retrieval and analysis is left to the individual adapters and scanner implementations as an implementation detail.

The current implementation of the scanner integration assumes that the scanner has access to the internal registry API without first passing thru the proxy component
that implements access control for external Harbor users. This allows the scanners to fetch image data even when external users cannot retrieve image data due to
access restriction caused by the scan status and/or vulnerability status of the image as configured by the API policyChecker component. As such, either a given
scanner to be integrated with Harbor must have access to internal network interfaces not exposed to external users, or else must implement a proxy that can run
within that network that is exposed externally (presumably with authc/authn) to ensure access to the data regardless of the access status for Harbor users.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can implement that via "service account"?


As an example, the current implementation runs Clair in the same docker-compose network so it can access ports that are not exposed to the host itself. Such a model
will work for some scanners, but will require additional implementation work for adapters for scanners that are external to the registry deployment (e.g. Saas or externally hosted to the registry).

#### Adapter Interface Operations

* ScanImage
* Description: The base scan operation to process an image's content and return a list of found vulnerabilities, some scan metadata, and a check result (scanner specific)
* Params: image digest, registry URL, pull credential, tags applied to the digest
* Output:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Harbor use Clair's representation of vulnerability info at the moment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially, yes. It saves the listing effectively as-is from Clair, but also creates a summary with counts for each severity category that is used by things like visualization in the UI and the API proxy's policyCheck.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not have to return the scan result, the scan may be async, so how about returning an uri containing all the information to fetch the scan result?

scan_status: boolean
vulnerabilities: array of vulnerable artifacts in the image
scan_metadata: json object, scanner specific
image_check: boolean (true if image is 'ok' according to scanner implementation and/or policy)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Harbor currently have this kind of policy decision built in? I am hesitant about it partly because of separation of concerns ("do one thing well") but also because image check policy might be context-dependent. For example,

  • the policy might be stricter for a production cluster than for a test cluster
  • different teams in an organization might have different requirements around (say) permissible base images, or the vulnerabilities they are allowed to whitelist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field simply to provides a facility for a scanner to indicate a recommendation. What Harbor does with it is open for discussion and mostly for future work. The check result is simply an evaluation of the image content since that is the only context present for a scanner in the base case. I can definitely see a case where the actual admission decision for an image download by a user is based on identity attributes, properties of the project (the namespace for repositories in Harbor), and other user-configurable metadata.

A more robust policy system is an interesting and certainly useful addition, but I think falls outside the scope of this proposal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lizrice

A snapshot of Harbor vulnerability checking which designed as one of the project security settings:

Screen Shot 2019-06-17 at 13 49 26

For different environments, you can set different severity levels restriction.

About CVE whitelist, it has been planned to deliver in the next release V1.9 (probably released at end of Aug. ).


* NotifyImageDeleted
* Description: Explicit call to notify scanner that a specific image manifest has been deleted from Harbor and the scanner may flush any state if necessary.
* Params: image digest, tags, registry URL
* Output: status code (ok or error)

* RescanImage (since some scanners may need to treat this differently)
* Description: functionally the same as the Scan Image operation, but may occur at different lifecycle events for an image and provided here to support scanners which need to differentiate between a new image scan and a re-scan.
* Params: image digest, registry URL, pull credential, tags applied to the digest
* Output:
scan_status: boolean
vulnerabilities: array of vulnerable artifacts in the image
scan_metadata: json object, scanner specific
image_check: boolean (true if image is 'ok' according to scanner implementation and/or policy)

* HandleScannerNotification
* Description: Handler for webhooks invoked by the scanner itself. The adapter, with knowledge of the payload, can decide if a new scan task is needed and indicate it in the return value
* Params: notification content
* Output: array of digests to re-scan

* CheckImage
* Description: A scanner-specific synchronous check of an image. Provided for optional use in interceptors or other cases where Harbor may want an up-to-date check result from the scanner to make an admission decision.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this admission from the perspective of "let this image be stored in the registry" or "let this image be deployed to the cluster"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current Harbor implementation is to block API requests in the API proxy component, and those requests are for a download of an image manifest (initiating an image pull) rather than the push of the image. The checks are configurable on a per-project basis in Harbor and also include things like Notary signature checks. See: https://github.com/goharbor/harbor/blob/master/src/core/proxy/interceptors.go

* Params: image digest, image reference (e.g. pullstring if a tag-based manifest fetch)
* Output: check result (boolean)

#### Configuration

* adapter selection - configured in the Harbor configuration itself
* adapter configuration, as a json document, specific to the adapter impl. Config key is the adapter id/name.
* Scanner credentials - included in the adapter-specific configuration if necessary. Should be kept in a secret store in encrypted form.
* Details tbd with feedback from harbor maintainers on overall configuration handling

#### Error Handling

Adapter implementations of these operations should return errors in standard go mechanisms (not outlined here since the are common to all function calls). However, the objects returned
as errors, should be well defined by the adapter and implement a common interface:

AdapterError an error normalization abstraction between the adapter framework and adapter implementations:
* Message - string - description of the error, intended to be short for single line presentation
* Detail - string - Additional detail for human consumption that may be longer than a single line and may include remediation information.
* ErrorCode - int - code to uniquely identify the class of error
* CanRetry - bool - an indicator to the caller that the error may be resolved with a retry
a
#### DAO Updates needed:

Overall, the objective is to require as little DAO change as possible to minimize upgrade and db maintenance impact.

* Extend the existing result store to include json fields for:
* Augmentation of existing vulnerability listing to be more generic (primarily column/table naming, not schema).
* Scan result metadata from scanner (opaque json object)
* Image check result - a boolean and a json object returned by the scanner

## Non-Goals

* Changes to the UI presentation of image vulnerability data (such changes may be a goal of a later work or independent work parallel to this)
* More than one scanner configured for use concurrently in Harbor

## Rationale

The proposed approach is intended to provide a generic-enough interface that scanners with different data management and lifecycle designs can still be integrated into the
Harbor and implement only the aspects necessary for that system in the adapter. The basic vulnerability listing format is kept normalized so that clients of the Harbor API have
a well-defined format for consuming scan results, but support for additional scanner-specific metadata will allow scanner-aware clients to gain additional insight into the
scan results/details as available for a specific scanner.


This proposal includes 2 new capabilities not found in the current Clair-specific implementation:
1. Full data lifecycle management
1. The current implementation assumes that the admin is manually managing the data lifecycle for scan state in the scanners. That is certainly always an option but
quickly imposes some operational issues with resource provisioning of the scanner (db storage sizes, etc).
1. By including a notification mechanism that an image is logically deleted, the scanner may, based on its configuration and how the user wants it deployed, be able
to clean up its own scan state and maintain some parity with the image state of the Harbor deployment. This is purely a notification mechanism, not an actuation, and the scanner adapters
are free to implement as makes sense, including a no-op.

1. Scanner-implemented image evaluation with a policy recommendation
1. The addition of an image check function allows scanners with more functionality than just returning simple vulnerability lists to make recommendations
to Harbor on suitability of an image. For end-users, the raw vulnerability list is rarely sufficient for determining if an image is acceptable or not, but
the mechanisms and inputs to the check decision will be scanner specific and must be handled by the adapter implementations themselves. Additional future work in the
policyChecker or new interceptors can leverage the check recommendations from the scanners or utilize the new CheckImage operation to have control over the staleness of
the check result. This check is not intended to replace other policy evaluations but to augment them.
1. This functionality could be abstracted into a separate Policy adapter layer in the future if Harbor intends to build a generic policy layer. However, this proposal attempts to avoid significant changes outside of the image scan flow.

## Compatibility

TBD

## Implementation

Overall approach is to add an abstraction layer, then extend it

### Phase 1: Abstract existing Clair implementation into the adapter model

Add abstraction layer between existing ScanJob logic and the existing ClairJob interface.

Example interfaces:

```

// The interface that each scanner implementation would implement
// The ScannerAdapter objects should have default constructors to facilitate lazy initialization

type ScannerAdapter interface {
func configure(conf map[string]interface{}) err
func scanImage(imgReference ImageReference, credentials AccessCredentials) *VulnerabilityReport, *ImageCheckReport, err
func rescanImage(imgReference ImageReference, credentials AccessCredentials) *VulnerabilityReport, *ImageCheckReport, err
func notifyImageDeletion(imgReference ImageReference) err
func checkImage(imgReference ImageReference) PolicyResult, err
}

type ScanResult struct {
vulnResult VulnerabilityReport
checkResult ImageCheckReport
}

type ImageReference struct {
registryUrl string
digest string
tag string
}

// Access credentials for the image content
type AccessCredentials struct {
token string
tokenUrl string
}

// For now, use the existing vuln format for ease of conversion into the db model
type VulnerabilityReport ClairVulnerabilityEnvelope

// Optionally implemented,
type ImageCheckReport struct {
imageDigest string
imagePasses boolean //Pass/Fail flag
checkTimestamp time.Time
details map[string]interface{} // Opaque json object for scanner-specific results if implemented by the scanner
}

```

### Phase 2: Add one or more non-Clair adapters
Likely the Anchore adapter and Aqua Microscanner adapters.


### Phase 3: Add new image data lifecycle capabilities

Specifically, the delete lifecycle call, invoked in the proxy handler on a successful response from the backing registry implementation.



## Open issues (if applicable)

General approach to adapters in Harbor. Expectation is that they will all be in-tree and compiled into the build, but selected at runtime based on configuration, but this is open for discussion as it will
impact the libraries and licenses of software that the adapters themselves as well as software delivery implications.