diff --git a/proposals/artifact-processor-extender.md b/proposals/artifact-processor-extender.md
index cd592561..f3665ff0 100644
--- a/proposals/artifact-processor-extender.md
+++ b/proposals/artifact-processor-extender.md
@@ -1,33 +1,178 @@
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Proposal: `Artifact Processor Extender`](#proposal-artifact-processor-extender)
+  - [Abstract](#abstract)
+  - [Background](#background)
+  - [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+  - [Implementation](#implementation)
+    - [HTTPProcessor and Processor Extender](#httpprocessor-and-processor-extender)
+      - [`HTTPProcessor`](#httpprocessor)
+      - [Processor Extender](#processor-extender)
+    - [Configuration file `processors.yaml`](#configuration-file-processorsyaml)
+    - [Artifact Data Access](#artifact-data-access)
+      - [Policy Check Interceptor](#policy-check-interceptor)
+      - [OAuth 2 Bearer Tokens](#oauth-2-bearer-tokens)
+      - [Robot Accounts](#robot-accounts)
+  - [Development Process](#development-process)
+    - [First Iteration: HTTPProcessor and Extender without Auth](#first-iteration-httpprocessor-and-extender-without-auth)
+    - [Second Iteration: Registration](#second-iteration-registration)
+    - [Third Iteration: Auth in the Processor Extender](#third-iteration-auth-in-the-processor-extender)
+
+
+
# Proposal: `Artifact Processor Extender`
-Author: `Ce Gao @gaocegege, Jian Zhu @zhujian7, Yiyang Huang @hyy0322`
+Author:
+
+- Yiyang Huang [@hyy0322](https://github.com/hyy0322)
+
+Fig. 1 Current Design of Harbor Artifact Processor
+
+Fig. 2 More Harbor Artifact Processor in Harbor Core
+
+Besides this, more proprietary artifact types will appear in industry, much like Kubernetes CRDs, as shown in Fig. 3. Each artifact vendor has to maintain its own fork to keep its proprietary artifact types, which may make Harbor a fragmented platform.
+
+Fig. 3 Fragmented Problems in Harbor
+
+## Goals
+
+This proposal aims to:
+
+- Design a new processor that extends artifact types at runtime.
+- Remain non-invasive to the current built-in processors.
+
+## Non-Goals
+
+This proposal does not aim to:
+
+- Support a whitelist for artifact types. [goharbor/harbor#12061](https://github.com/goharbor/harbor/issues/12061)
+
+## Implementation
+
+To address these problems, we propose a new feature, the **artifact processor extender**, in Harbor Core. This proposal makes the following contributions:
+
+- A new Processor struct, `HTTPProcessor`, which supports the artifact processor extender feature for extending custom artifact types. The current design of the `Processor` interface is unchanged, so the new feature does not affect the existing supported types such as OCI Image, CNAB and Helm Chart.
+
+- A new configuration file, `processors.yaml`, which registers artifact types with processors at runtime.
+
+- A mechanism similar to the one used by Scanner to support Auth in `HTTPProcessor`, which will be used to pull manifests from the Registry.
+
+
+### HTTPProcessor and Processor Extender
+
+The `Processor` interface is defined in Harbor Core and we do not propose any change to it.
+
+```go
 // Processor processes specified artifact
 type Processor interface {
 	// GetArtifactType returns the type of one kind of artifact specified by media type
@@ -44,17 +189,21 @@ type Processor interface {
 }
 ```
 
-```Registry``` is defined to store ```Processor```.
+We propose a new implementation of the `Processor` interface, `HTTPProcessor`. The design of the processor is shown in Fig. 4. The processor acts as a proxy to the processor extender. There are two new components in the architecture:
 
-```
-var (
-	// Registry for registered artifact processors
-	Registry = map[string]Processor{}
-)
-```
+1. Remote Processor API - HTTP RESTful API between Harbor and the remote processor.
+    - The API itself is defined and maintained by Harbor.
+    - Authentication specifics are out of scope, but should be supported using the HTTP ```Authorization``` header.
+2. Remote Processor Extender - Long-running RESTful service that implements the Remote Processor API to extract artifact data.
+    - The extender is deployed outside Harbor Core.
+    - The extender has independent configuration management and deployment lifecycle.
+
+Fig. 4 Design of HTTPProcessor
+
 
-#### Remote Processor API
-For a remote processor, the functions defined in ```Processor``` interface can be abstract to HTTP service API. By using these API, harbor core can call remote HTTP processor.
+For a remote processor, the functions defined in the ```Processor``` interface can be abstracted into an HTTP service API. By using these APIs, Harbor Core can communicate with the remote HTTP processor extender.
 
 ```
 GET {remote-processor-endpoint}/artifacttype
@@ -63,9 +212,11 @@ POST {remote-processor-endpoint}/abstractmetadata
 POST {remote-processor-endpoint}/abstractaddition
 ```
 
-```HTTPProcessor``` is a ```Processor``` implement which make harbor have extensibility to let users use remote HTTP service process their user defined artifacts by API defined above.
+#### `HTTPProcessor`
 
-```
+`HTTPProcessor` makes Harbor more extensible by allowing users to use their own processor extender to process custom artifacts. The `HTTPProcessor` acts as a proxy to the extender. The pseudo code is shown here.
+
+```go
 type HTTPProcessor struct {
 	MediaType    string
 	ProcessorURL string
@@ -101,44 +252,52 @@ func (h *HTTPProcessor) AbstractAddition(ctx context.Context, artifact *artifact
 }
 ```
 
-#### Register
-Harbor now using ```app.conf``` to set core config. The configuration info is about core service configuration used for beego. So we can use another configration file just for processor configration info.
-Considering defining a specific type of artifact is not frequent behaviour, there is no need for harbor to expose a API for remote processor to register. So it is a simple way to use a yaml file named ```processor.yaml``` mount in core service to register ```processor``` info when core service start.
-
-```
-Processors:
-- ProcessorUrl: "http://{processor-service-IP}:port"
-  ArtifactMediaType: "{media-type-string}"
-```
-
-#### Artifact Data Access
-Refer to [Artifact Data Access](https://github.com/goharbor/community/blob/master/proposals/pluggable-image-vulnerability-scanning_proposal.md#artifact-data-access), there are same problems for remote processor extracting artifact data possibly.
-
-It is possible that when remote processor extracting artifact data, the remote processor still need to retrive data from harbor using the Docker Registry v2 API exposed by Harbor. So remote processor need credentials provided by harbor when API provided by remote processor called by harbor.
+The workflow of pushing a custom artifact to Harbor with the help of `HTTPProcessor` is shown in Fig. 5. Harbor works as a proxy to the registry when the user uploads the content layers and the config layer. Harbor ensures that the repository exists. Then Harbor puts the manifest to the registry. After that, Harbor checks whether the artifact identified by the digest exists. In this step, Harbor uses `Processor.AbstractMetadata` in `Abstractor` to abstract the metadata and keep it in the `artifact.Artifact` model.
 
-##### Policy Check Interceptor
-Harbor can block image distribution based on severity of vulnerabilities found during scan. Since repote processor is deployed outside the system boundary of harbor, the docker clients used by remote processor are supposed to access the registry through external IP configured by ingress or load balancer. Because of the policy check interceptor, there is a problem of accessing registry via external endpoint which might block pulling.
+
+Fig. 5 Workflow of Pushing an Artifact using the HTTPProcessor
+
 
-##### OAuth 2 Bearer Tokens
-Harbor provides a JWT Bearer token to Clair on scan request. The token is generated in OAuth Client Credentials (with client_id and client_secret) flow and then passed directly to Clair in a HTTP POST request to scan a Clair Layer.
+When `HTTPProcessor.AbstractMetadata(ctx context.Context, manifest []byte, artifact *artifact.Artifact) error` is invoked, it sends an HTTP POST request to the processor extender:
 
 ```
-Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIsImtpZCI6IkJWM0Q6MkFWWjpVQjVaOktJQVA6SU5QTDo1RU42Ok40SjQ6Nk1XTzpEUktFOkJWUUs6M0ZKTDpQT1RMIn0.eyJpc3MiOiJhdXRoLmRvY2tlci5jb20iLCJzdWIiOiJCQ0NZOk9VNlo6UUVKNTpXTjJDOjJBVkM6WTdZRDpBM0xZOjQ1VVc6NE9HRDpLQUxMOkNOSjU6NUlVTCIsImF1ZCI6InJlZ2lzdHJ5LmRvY2tlci5jb20iLCJleHAiOjE0MTUzODczMTUsIm5iZiI6MTQxNTM4NzAxNSwiaWF0IjoxNDE1Mzg3MDE1LCJqdGkiOiJ0WUpDTzFjNmNueXk3a0FuMGM3cktQZ2JWMUgxYkZ3cyIsInNjb3BlIjoiamxoYXduOnJlcG9zaXRvcnk6c2FtYWxiYS9teS1hcHA6cHVzaCxwdWxsIGpsaGF3bjpuYW1lc3BhY2U6c2FtYWxiYTpwdWxsIn0.Y3zZSwaZPqy4y9oRBVRImZyv3m_S9XDHF1tWwN7mL52C_IiA73SJkWVNsvNqpJIn5h7A2F8biv_S2ppQ1lgkbw
+POST {remote-processor-endpoint}/abstractmetadata
+{
+  "registry": {
+    // A base URL of the Docker Registry v2 API.
+    "url": "registry:5000",
+    // For example, `Basic Base64(username:password)`.
+    "authorization": "hsakjh..."
+  },
+  "manifest": "{\"config\":...}",
+  "artifact": artifact.Artifact{}
+}
 ```
 
-Clair, on the other hand, is using the token to pull image layers from Harbor registry. It works because Clair is using a standard ```http``` library and sets a ```Authorization``` header programmatically.
-
-In order to enable Scanner Adapters to bypass Policy Check Interceptor, Harbor's authentication service will generate a dedicated JWT access token and hand it over to the underlying Scanner thru Scanner Adapter in a ScanRequest.
+The `manifest` and `artifact` are sent to the extender to abstract the metadata. If the registry requires Auth, `registry` is sent as well, just like Scanner.
 
-It is reasonable to use the same way for remote processor using bearer tokens to access to the image data from harbor registry.
+When users work with built-in artifact types such as OCI Image, CNAB or Helm Chart, the workflow **is not affected**, as shown in Fig. 6. For example, when the user uploads a Helm Chart to Harbor, the Helm Chart processor is used to abstract the metadata. Thus the design of `HTTPProcessor` is non-invasive to the current design.
 
-##### Robot Accounts
-Refer to scan job using credentials generated by robot account mechanism, we can use the same way to use the robot account mechanism to generate credentials that work with these common OCI/Docker tooling libraries to provide credentialed access to the image data. The lifecycle of the robot account credentials can be bound to the HTTP request. For every HTTP request call remote processor API, a robot account expired at certain time will be created.Additionally, a modification is needed to ensure that the generated credentials have access to bypass the configured policy checks on the image that normal users are subject to if those checks are configured.
+
+Fig. 6 Workflow of Pushing an Artifact using the Built-in Processor
+
+#### Processor Extender
 
-#### Remote Processor API define
+The extender provides four RESTful APIs.
 
 ```
+GET {remote-processor-endpoint}/artifacttype
+GET {remote-processor-endpoint}/additiontypes
+POST {remote-processor-endpoint}/abstractmetadata
+POST {remote-processor-endpoint}/abstractaddition
+```
+
+The data structures used by the extender are shown here:
+
+```go
 // Registry represents Registry connection settings.
 type Registry struct {
 	// A base URL of the Docker Registry v2 API exposed by Harbor.
@@ -188,9 +347,10 @@ type AbstractAdditionResponse struct {
 }
 ```
 
-#### Remote Processor
-A user defined processor need to build a HTTP service which implement HTTP processor API
-```
+
+A processor extender needs to provide an HTTP service that implements these four APIs.
+
+```go
 func GetArtifactType() *ListAdditionTypesResponse {
 	return &ListAdditionTypesResponse{}
 }
@@ -208,34 +368,73 @@ func AbstractAddition(req *AbstractAdditionRequest) *AbstractAdditionResponse {
 }
 ```
 
-## Develop Plan
+### Configuration file `processors.yaml`
 
-There are totally three things we need to do to complete the proposal
-- implement ```HTTPProcessor```
-- register ```HTTPProcessor``` to harbor core
-- authentication problems
+```app.conf``` is used to configure Harbor Core. Its configuration info is about the core service configuration used for beego, so we can use another configuration file for processor configuration info.
 
-1. At the first stage, we will implement the ```HTTPProcessor```.
-At this stage, user defined processor will not register to harbor. So if users want to use remote porcessor, they still need to add registeration logic to harbor code and repcompile harbor core. Also, ```HTTPProcessor``` will make HTTP request to remote processor without privide authentication and Harbor external endpoint. So users need to do some work to generate authentication using other user account. Harbor external endpoint should be configured any way.
 And policy check interceptor can not be bypassed.
-2. At the second stage, registration logic will be added. Users don't need to modify harbor code any more. A remote processor configration file is required to register specific processor to harbor. When harbor core start, it will read the configuration file and register the processor to harbor.
-3. At the final stage, using robot account mechanism to generate credentials will be finished. Harbor external endpoint and authentication will be passed directy in HTTP POST request body. Users don't need to consider about the authentication problem. But still need to find a way to use authentication properly.
+Considering that defining a specific type of artifact is infrequent, there is no need for Harbor to expose an API for remote processors to register. It is simpler to use a YAML file named ```processors.yaml```, which is mounted into the core service to register ```processor``` info when the core service starts.
 
-## Non-Goals
+```yaml
+Processors:
+- ProcessorUrl: "http://{processor-service-IP}:port"
+  ArtifactMediaType: "{media-type-string}"
+```
 
-[Anything explicitly not covered by the proposed change.]
+The registration will look like the pseudo code here:
 
-## Rationale
+```go
+Registry = map[string]Processor{}
+for _, p := range Processors {
+	Registry[p.ArtifactMediaType] = &HTTPProcessor{ProcessorURL: p.ProcessorURL}
+}
+```
 
-[A discussion of alternate approaches and the trade offs, advantages, and disadvantages of the specified approach.]
+### Artifact Data Access
 
-## Compatibility
+Refer to [Artifact Data Access](https://github.com/goharbor/community/blob/master/proposals/pluggable-image-vulnerability-scanning_proposal.md#artifact-data-access); the same problems may arise when a remote processor extracts artifact data.
 
-[A discussion of any compatibility issues that need to be considered]
+When a remote processor extracts artifact data, it may still need to retrieve data from Harbor using the Docker Registry v2 API exposed by Harbor. The remote processor therefore needs credentials provided by Harbor when Harbor calls the API provided by the remote processor.
 
-## Implementation
+
+#### Policy Check Interceptor
+
+Harbor can block image distribution based on the severity of vulnerabilities found during a scan. Since the processor extender is deployed outside the system boundary of Harbor, the client used by the processor extender is supposed to access the registry through an external IP configured by an ingress or load balancer. Because of the policy check interceptor, accessing the registry via the external endpoint might block pulling.
+
+#### OAuth 2 Bearer Tokens
+
+Harbor provides a JWT Bearer token to Clair in the request. The token is generated in the OAuth Client Credentials (with client_id and client_secret) flow and then passed directly to Clair in an HTTP POST request to scan a Clair Layer.
+
+```
+Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIsImtpZCI6IkJWM0Q6MkFWWjpVQjVaOktJQVA6SU5QTDo1RU42Ok40SjQ6Nk1XTzpEUktFOkJWUUs6M0ZKTDpQT1RMIn0.eyJpc3MiOiJhdXRoLmRvY2tlci5jb20iLCJzdWIiOiJCQ0NZOk9VNlo6UUVKNTpXTjJDOjJBVkM6WTdZRDpBM0xZOjQ1VVc6NE9HRDpLQUxMOkNOSjU6NUlVTCIsImF1ZCI6InJlZ2lzdHJ5LmRvY2tlci5jb20iLCJleHAiOjE0MTUzODczMTUsIm5iZiI6MTQxNTM4NzAxNSwiaWF0IjoxNDE1Mzg3MDE1LCJqdGkiOiJ0WUpDTzFjNmNueXk3a0FuMGM3cktQZ2JWMUgxYkZ3cyIsInNjb3BlIjoiamxoYXduOnJlcG9zaXRvcnk6c2FtYWxiYS9teS1hcHA6cHVzaCxwdWxsIGpsaGF3bjpuYW1lc3BhY2U6c2FtYWxiYTpwdWxsIn0.Y3zZSwaZPqy4y9oRBVRImZyv3m_S9XDHF1tWwN7mL52C_IiA73SJkWVNsvNqpJIn5h7A2F8biv_S2ppQ1lgkbw
+```
+
+Clair, on the other hand, uses the token to pull image layers from the Harbor registry. It works because Clair uses a standard ```http``` library and sets an ```Authorization``` header programmatically.
+
+In order to enable Scanner Adapters to bypass the Policy Check Interceptor, Harbor's authentication service generates a dedicated JWT access token and hands it over to the underlying Scanner through the Scanner Adapter in a ScanRequest.
+
+It is reasonable for the processor extender to use bearer tokens in the same way to access the image data in the Harbor registry.
+
+#### Robot Accounts
+
+Just as scan jobs use credentials generated by the robot account mechanism, we can use the same mechanism to generate credentials that work with common OCI/Docker tooling libraries to provide credentialed access to the image data. The lifecycle of the robot account credentials can be bound to the HTTP request: for every HTTP request that calls the remote processor API, a robot account that expires at a certain time is created. Additionally, a modification is needed to ensure that the generated credentials can bypass the configured policy checks on the image that normal users are subject to, if those checks are configured.
+
+## Development Process
+
+There are three things we need to do to complete the proposal:
+
+- Implement ```HTTPProcessor```.
+- Register ```HTTPProcessor``` to Harbor Core using `processors.yaml`.
+- Implement the mechanism to support Auth.
+
+Thus we propose three iterations. Each of them is self-contained and is supposed to be merged into Harbor Core.
+
+### First Iteration: HTTPProcessor and Extender without Auth
+
+At this stage, user-defined processors are not registered to Harbor through configuration. Users who want to use a processor extender still need to hard-code logic in the Harbor code to register the type with the corresponding processor manually.
+
+Also, ```HTTPProcessor``` makes HTTP requests to the extender without providing authentication or the Harbor external endpoint. So users need to do some work to generate authentication using another user account. The Harbor external endpoint should be configured anyway.
+The policy check interceptor also cannot be bypassed at this stage.
+
+### Second Iteration: Registration
 
-[A description of the steps in the implementation, who will do them, and when.]
+At the second stage, registration logic is added. Users no longer need to modify Harbor code. A remote processor configuration file is required to register a specific processor to Harbor. When Harbor Core starts, it reads the configuration file and registers the processor.
 
-## Open issues (if applicable)
+### Third Iteration: Auth in the Processor Extender
 
-[A discussion of issues relating to this proposal for which the author does not know the solution. This section may be omitted if there are none.]
+At the final stage, credential generation using the robot account mechanism is completed. The Harbor external endpoint and authentication are passed directly in the HTTP POST request body. Users no longer need to worry about the authentication problem, but they still need to find a way to use the provided authentication properly.
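The registration step of the second iteration can be sketched as follows, assuming the entries of `processors.yaml` have already been parsed into Go values. The names `processorConfig`, `httpProcessor` and `registerProcessors` are hypothetical and reduced for illustration; the actual `Registry` in Harbor Core stores values of the `Processor` interface, and refusing duplicate media types is an assumption of this sketch rather than something the proposal specifies.

```go
package main

import "fmt"

// processorConfig mirrors one entry of the Processors list in processors.yaml.
type processorConfig struct {
	ProcessorURL      string
	ArtifactMediaType string
}

// httpProcessor is a reduced stand-in for the HTTPProcessor struct.
type httpProcessor struct {
	MediaType    string
	ProcessorURL string
}

// registerProcessors adds one httpProcessor per configured media type.
// It rejects incomplete entries and refuses to overwrite an existing
// registration, so that already registered processors stay untouched.
func registerProcessors(registry map[string]*httpProcessor, configs []processorConfig) error {
	for _, c := range configs {
		if c.ArtifactMediaType == "" || c.ProcessorURL == "" {
			return fmt.Errorf("invalid processor entry: %+v", c)
		}
		if _, exists := registry[c.ArtifactMediaType]; exists {
			return fmt.Errorf("media type %q is already registered", c.ArtifactMediaType)
		}
		registry[c.ArtifactMediaType] = &httpProcessor{
			MediaType:    c.ArtifactMediaType,
			ProcessorURL: c.ProcessorURL,
		}
	}
	return nil
}

func main() {
	registry := map[string]*httpProcessor{}
	configs := []processorConfig{
		{ProcessorURL: "http://processor.example:8080", ArtifactMediaType: "application/vnd.example.model.v1+json"},
	}
	if err := registerProcessors(registry, configs); err != nil {
		fmt.Println("registration failed:", err)
		return
	}
	fmt.Println(len(registry), registry["application/vnd.example.model.v1+json"].ProcessorURL)
}
```

Failing on duplicates at startup is one possible way to keep the extender non-invasive to the built-in processors; silently skipping the entry would be an alternative.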
diff --git a/proposals/images/artifact-processor-extender/arch.png b/proposals/images/artifact-processor-extender/arch.png
new file mode 100644
index 00000000..c13ae6ae
Binary files /dev/null and b/proposals/images/artifact-processor-extender/arch.png differ
diff --git a/proposals/images/artifact-processor-extender/current-design.png b/proposals/images/artifact-processor-extender/current-design.png
new file mode 100644
index 00000000..2c8f3286
Binary files /dev/null and b/proposals/images/artifact-processor-extender/current-design.png differ
diff --git a/proposals/images/artifact-processor-extender/extend-problem-1.png b/proposals/images/artifact-processor-extender/extend-problem-1.png
new file mode 100644
index 00000000..3eb2edeb
Binary files /dev/null and b/proposals/images/artifact-processor-extender/extend-problem-1.png differ
diff --git a/proposals/images/artifact-processor-extender/extend-problem-2.png b/proposals/images/artifact-processor-extender/extend-problem-2.png
new file mode 100644
index 00000000..618fd4a1
Binary files /dev/null and b/proposals/images/artifact-processor-extender/extend-problem-2.png differ
diff --git a/proposals/images/artifact-processor-extender/workflow-new.png b/proposals/images/artifact-processor-extender/workflow-new.png
new file mode 100644
index 00000000..da74440a
Binary files /dev/null and b/proposals/images/artifact-processor-extender/workflow-new.png differ
diff --git a/proposals/images/artifact-processor-extender/workflow.png b/proposals/images/artifact-processor-extender/workflow.png
new file mode 100644
index 00000000..f618af12
Binary files /dev/null and b/proposals/images/artifact-processor-extender/workflow.png differ