Table of Contents generated with DocToc
Author:
- Yiyang Huang @hyy0322 <huangyiyang.huangyy@bytedance.com> (Corresponding Author)
- Ce Gao @gaocegege
- Jian Zhu @zhujian7
Links:
- Discussion: goharbor/harbor#12013
- Slides: Feature Request: Harbor Artifact Processor Extender
Status:
Harbor v2.0 makes Harbor the first OCI-compliant open source registry capable of storing a multitude of cloud-native artifacts like container images, Helm charts, OPAs, Singularity, and much more. It found strong demand on extending artifact types to support more complex scenarios. But the artifact authors now have to implement the processing logic in Harbor Core, which is not extensible.
The current design might go against the adoption of Harbor in industries since there are always proprietary artifact types. Thus we design a Harbor-specific schema in artifact manifest annotations and enhance the default processor to make it have ability to parse the custom artifact config.
There are four types of artifacts, which are image, helm v3, CNAB, OPA bundle, supported by Harbor. Each of them implements its processor to abstract metadata into artifact model defined by harbor. If users want to define a new kind of artifact, they have to implement the processor logic in Harbor Core service, which greatly limits the scalability and extensibility of Harbor.
When we use Harbor to store custom artifacts, we cannot get the expected result from the API provided by Harbor {}/api/v2.0/projects/{}/repositories/{}/artifacts/{}
. For example, we store a new artifact using caicloud/ormb, which is a OCI artifact specification for Machine Learning models, we get the result from the API:
{
"digest": "sha256:123aa..",
"id": 2,
"manifest_media_type": "application/vnd.oci.image.manifest.v1+json",
"media_type": "application/vnd.caicloud.model.config.v1alpha1+json",
"project_id": 2,
"repository_id": 1,
"size": 12927980,
"tags": [
{
"artifact_id": 2,
"id": 2,
"immutable": false,
"name": "v1",
"pull_time": "0001-01-01T00:00:00.000Z",
"push_time": "2020-05-15T04:04:19.516Z",
"repository_id": 2,
"signed": false
}
],
"type": "MODEL"
}
But when we store the Helm Chart and send request to the same API, we get more attributes. The extra attributes store the content of the config layer of the Helm Chart. Thus we can think the result is self-contained.
{
+ "extra_attrs": {
+ "apiVersion": "v1",
+ "appVersion": "0.8.0",
+ "description": "Host your own Helm Chart Repository",
+ "home": "https://github.com/helm/chartmuseum",
+ "icon": "https://raw.githubusercontent.com/helm/chartmuseum/master/logo2.png",
+ "keywords": [
+ "chartmuseum",
+ ],
+ "maintainers": [
+ {
+ "email": "opensource@codefresh.io",
+ "name": "codefresh-io"
+ }
+ ],
+ "name": "chartmuseum",
+ "version": "1.8.2"
+ },
"digest": "sha256:123aa..",
"id": 1,
"manifest_media_type": "application/vnd.oci.image.manifest.v1+json",
"media_type": "application/vnd.cncf.helm.config.v1+json",
"project_id": 2,
"repository_id": 2,
"size": 12927980,
"tags": [
{
"artifact_id": 1,
"id": 1,
"immutable": false,
"name": "v1",
"pull_time": "0001-01-01T00:00:00.000Z",
"push_time": "2020-05-15T04:04:19.516Z",
"repository_id": 2,
"signed": false
}
],
"type": "CHART"
}
The self-contained response is also necessary for these user-defined artifact types. Or we cannot use Harbor directly for most scenarios. The extra_attrs
field is processed by the Helm Chart processor, which is an implementation of artifact processor interface Processor
.
The current design of the artifact processor is shown in Fig. 1. Processor
interface is defined in Harbor Core, and there are four implementations for different types which embeds base.IndexProcessor
and base.ManifestProcessor
.
Fig. 1 Current Design of Harbor Artifact Processor
When artifact authors extend the artifact types, they implement corresponding processor logic in Harbor Core, as shown in Fig. 2. For example, there will be four new processor implementations in Harbor Core with at least four different maintainers from different communities if we want to support these four artifact types.
Fig. 2 More Harbor Artifact Processor in Harbor Core
Besides this, there will be more proprietary artifact types in industries, just like Kubernetes CRDs, as shown in Fig. 3. Each artifact vendor has to maintain their own fork to keep their proprietary artifact types, which may make Harbor a fragmented platform.
Fig. 3 Fragmented Problems in Harbor
Term | Definition |
---|---|
Manifest | OCI Image Manifest |
This proposal is to:
- Define Harbor-specific schema to let default processor process user defined artifact.
- Keep non-invasive to the current built-in processors, at the same time.
This proposal is not to:
- Support allowlist for artifact types. goharbor/harbor#12061
To address these problems, we propose a new feature Enhanced default processor in Harbor Core. The contributions of the proposal are:
- The Harbor-specific schema in artifact manifest to tell Harbor core more information about the user-defined artifacts.
- The enhanced default processor implementation to support user-defined artifacts.
We introduce a harbor-specific configuration in the manifest annotations, which contains information that Harbor core needs to understand the artifact.
The harbor-specific configuration follows the style of OCI Pre-Defined Annotation Keys.
The annotation key must follow convention which is defined like this io.goharbor.artifact.{annotation-version}.{key}
.
There is one proposed key in manifest.config.annotations
:
- io.goharbor.artifact.v1alpha1.skip-list The list of skip keys. Harbor will ignore these keys in configuration. The value for this key should be type string separated by comma.
There is one key in manifest.layers[].annotations
which is used to support icons:
- io.goharbor.artifact.v1alpha1.icon The identifier of artifact icon. The value for this key should be empty string. Only key will be processed, the value will not be used.
Here is an example.
{
"schemaVersion": 2,
"config": {
"mediaType": "application/vnd.caicloud.model.config.v1alpha1+json",
"digest": "sha256:be948daf0e22f264ea70b713ea0db35050ae659c185706aa2fad74834455fe8c",
"size": 187,
"annotations": {
"io.goharbor.artifact.v1alpha1.skip-list": "metrics,git"
}
},
"layers": [
{
"mediaType": "image/png",
"digest": "sha256:d923b93eadde0af5c639a972710a4d919066aba5d0dfbf4b9385099f70272da0",
"size": 166015,
"annotations": {
"io.goharbor.artifact.v1alpha1.icon": ""
}
},
{
"mediaType": "application/tar+gzip",
"digest": "sha256:d923b93eadde0af5c639a972710a4d919066aba5d0dfbf4b9385099f70272da0",
"size": 166015
}
]
}
We propose to unify arguments for all methods in Processor interface:
// Processor processes specified artifact
type Processor interface {
// GetArtifactType returns the type of one kind of artifact specified by media type
- GetArtifactType() string
+ GetArtifactType(ctx context.Context, artifact *artifact.Artifact) string
// ListAdditionTypes returns the supported addition types of one kind of artifact specified by media type
- ListAdditionTypes() []string
+ ListAdditionTypes(ctx context.Context, artifact *artifact.Artifact) []string
// AbstractMetadata abstracts the metadata for the specific artifact type into the artifact model,
// the metadata can be got from the manifest or other layers referenced by the manifest.
- AbstractMetadata(ctx context.Context, manifest []byte, artifact *artifact.Artifact) error
+ AbstractMetadata(ctx context.Context, artifact *artifact.Artifact, manifest []byte) error
// AbstractAddition abstracts the addition of the artifact.
// The additions are different for different artifacts:
// build history for image; values.yaml, readme and dependencies for chart, etc
AbstractAddition(ctx context.Context, artifact *artifact.Artifact, additionType string) (addition *Addition, err error)
}
The pseudo code of the defaultProcessor
is here:
func (d *defaultProcessor) GetArtifactType(ctx context.Context, artifact *artifact.Artifact) string {
// try to parse the type from the media type
strs := artifactTypeRegExp.FindStringSubmatch(d.mediaType)
if len(strs) == 2 {
return strings.ToUpper(strs[1])
}
// can not get the artifact type from the media type, return unknown
return ArtifactTypeUnknown
}
func (d *defaultProcessor) AbstractMetadata(ctx context.Context, artifact *artifact.Artifact, manifest []byte) error {
configLayer := PullBlob(artifact.RepositoryName, manifest.Config.Digest)
// Extract the extra attributes according to the annotation.
annotationParser := annotation.NewParser()
err = annotationParser.Parse(ctx, artifact, manifest)
return
}
As is described above, the annotation key follow io.goharbor.artifact.{annotation-version}.{key}
convention will be parsed by Harbor.
For every annotation key, there is annotation-version bind with it. In order to parse annotation keys for different versions, different annotation parser for different version is needed. So we can abstract an interface for annotation parser.
// Parser parses annotations in artifact manifest
type Parser interface {
// Parse parses annotations in artifact manifest, abstracts data from artifact config layer into the artifact model
Parse(ctx context.Context, artifact *artifact.Artifact, manifest []byte) (err error)
}
For different versions of parser, there is a specific annotation parser which implements Parser interface.
type v1alpha1Parser struct {
regCli reg.Client
}
func (p *v1alpha1Parser) Parse(ctx context.Context, artifact *artifact.Artifact, manifest []byte) error {
// parse annotation in a specific way
}
All parsers will register to parser registry.
// registry for registered annotation parsers
registry = map[string]Parser{}
With different versions of parsers, there is a wrapper annotation parser which also implements Parse interface, will use parsers in parser registry one by one in order to parser different version of annotations.
type parser struct{}
func (p *parser) Parse(ctx context.Context, artifact *artifact.Artifact, manifest []byte) (err error) {
for _, annotationVersion := range sortedAnnotationVersionList {
err = GetAnnotationParser(annotationVersion).Parse(ctx, artifact, manifest)
if err != nil {
return err
}
}
return nil
}
Fig. 4 Workflow of Pushing an Artifact using the Default Processor
We will start to code as soon as the proposal is accepted. We are going to finish coding within 2 weeks before code freeze which before end of July.