Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate media type of manifest and its descendants #10

Closed
wants to merge 1 commit into from
Closed

validate media type of manifest and its descendants #10

wants to merge 1 commit into from

Conversation

xiekeyang
Copy link
Contributor

@@ -149,6 +150,19 @@ func (v *validateCmd) validatePath(name string) error {

switch typ {
case image.TypeManifest:
fv, err := os.Open(name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to open the file again. Can't we put all the manifest validation behind a single function that takes a ReadCloser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -240,3 +244,32 @@ loop:
}
return nil
}

// ValidateManifestPatten validate the manifest's internal
Copy link
Contributor

@wking wking Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patten? Is the idea here to separate blob-only validation from blob+descendant validation? In that case, I think the names should be something like ValidateManifestBlob and ValidateManifestDescendants with a ValidateManifest that calls both of those parts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or have a single ValidateManifest(reader io.ReadCloser, descendants bool) error where ValidateManifest(reader, false) will just look at the blob and `ValidateManifest(reader, true) will look at the blob and descendants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Copy link
Contributor Author

@xiekeyang xiekeyang Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be done by a signal function, which I've fixed. But descendants bool seems unnecessary. I use bytes.NewReader() for the buffer, instead of re-opening file. The new reader handler can be released when it is finished, so io.ReadCloser may be not mandatory. it is my own understanding.

Layers []struct {
MediaType string `json:"mediaType"`
} `json:"layers"`
}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why define your own type instead of using v1.Manifest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is my fault. Fixed.

@xiekeyang xiekeyang changed the title validate media type patten in manifest validate media type of manifest and its descendants Sep 19, 2016
// and its descendants fields media type, such as config and layers,
// to make sure they match to spec definition, or returns an error if
// the validation failed.
func ValidateManifestMediaType(r io.Reader) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason to expose media-type validation as a separate thing. There's just “validation”, which includes checking media types against spec requirements as well as checking other parameter values against spec requirements. That probably means we want a comparison with the appropriate JSON Schema (to catch things like schemaVersion is unset) and additional checks for cases not covered by JSON Schema (e.g. schemaVersion != 2). But I don't think a caller will care about picking and choosing particular subsets of validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, validate command will call schema.MediaTypeManifestList.Validate() directly when validating signal manifest file. I think it is an efficient method to wrap descendants and schema checking in one function(ValidateManifestMediaType).

But I don't think a caller will care about picking and choosing particular subsets of validation.

The callers are image layout validation and manifest file validation. According to current calling subsequence, they are checking schema. I present the new function to add descendants checking, make it efficient, and keep current checked subsets un-lost.
Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the call to schema.MediaTypeManifest.Validate at the end of this function, the name should probably just be ValidateManifest (since it covers descriptor media types locally and the JSON Schema via schema.MediaTypeManifest.Validate.

@wking
Copy link
Contributor

wking commented Sep 19, 2016

On Mon, Sep 19, 2016 at 12:03:39AM -0700, xiekeyang wrote:

I use bytes.NewReader() for the buffer, instead of re-opening
file. The new reader handler can be released when it is finished, so
io.ReadCloser may be not mandatory .

It's easy to build a ReadCloser around a Reader using ioutil.NopCloser
where you don't care about closing the reader. But for cases where
someone just passes a newly-opened file through to the validator,
having the validator take a ReadCloser and close the file after it's
done parsing the JSON means you don't have to hold the manifest file
open while you walk its descendants. So I still think we want the
validator to take a ReadCloser.

@xiekeyang
Copy link
Contributor Author

It's easy to build a ReadCloser around a Reader using ioutil.NopCloser
where you don't care about closing the reader...

@wking I fixed it to use ReadCloser as argument, PTAL.

err = json.NewDecoder(bytes.NewReader(buf)).Decode(&header)
if err != nil {
return errors.Wrap(err, "manifest format mismatch")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want an error-checked call to r.Close() here. No need to keep the reader open after you've read it while you walk the descendants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, with 739f3ada574b2f you dropped the early defer r.Close(). I think you want to keep that so the reader is closed on early errors.

And there's no reason to read the file in and then create a bytes.NewReader around it. Just call json.NewDecoder(r)….

@wking
Copy link
Contributor

wking commented Sep 20, 2016

On Mon, Sep 19, 2016 at 07:14:14PM -0700, xiekeyang wrote:

+// ValidateManifestMediaType validate the manifest schema media-type
+// and its descendants fields media type, such as config and layers,
+// to make sure they match to spec definition, or returns an error if
+// the validation failed.
+func ValidateManifestMediaType(r io.Reader) error {

However, validate command will call
schema.MediaTypeManifestList.Validate() directly when validating
signal manifest file.

I think we want main → image → schema, and not have a main calling
both image and schema directly 1.

But I don't think a caller will care about picking and choosing
particular subsets of validation.

The callers are image layout validation and manifest file
validation.

We're also trying to provide an API that can be consumed directly by
third parties (#2).

According to current calling subsequence, they are checking
schema. I present the new function to add descendants checking, make
it efficient, and keep current checked subsets un-lost.

Am I missing something?

I think we want to lose the current validation subsets ;). Do you see
a gain in keeping them? If not, an API with a single validation
endpoint covering both the schema and descendants seems simpler than
an API that requires you to call both.

@xiekeyang
Copy link
Contributor Author

xiekeyang commented Sep 20, 2016

@wking
Do you mean refactor the old mechanism? It sounds like that.
I am trying this patch to resolve a exist bug, at the same time make sure more (at least same) efficient than old state, and do not harm to old mechanism.
I honestly think it is not MUST to refactor in this patch, but MUST refactor later, and your proposal can be put up as a new issue
Trust me I'm and will be working on each of your proposal, in this issue, and in previous issues.
And really thanks very much for your good idea.

@wking
Copy link
Contributor

wking commented Sep 20, 2016

On Tue, Sep 20, 2016 at 03:57:16AM -0700, xiekeyang wrote:

Do you mean refactor the old mechanism? It sounds like that.

Collecting manifest validation into one function doesn't seem like a
major refactoring to me, but yes, that's what I'm asking for.

I honestly think it is not MUST to refactor in this patch, but MUST
refactor later, and your proposal can be put up as a new issue

If you feel like collecting the manifest validation into one function
is too much for this PR, I'm ok with punting it to follow-up work.

// and its descendants fields media type, such as config and layers,
// to make sure they match to spec definition, or returns an error if
// the validation failed.
func ValidateManifestMediaType(r io.ReadCloser) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this take an io.ReadCloser?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump

}

for _, layer := range header.Layers {
if layer.MediaType != string(v1.MediaTypeImageLayer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably way to strict. I'm not sure these media types are "illegal".

}
}

return schema.MediaTypeManifest.Validate(bytes.NewReader(buf))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above function should be covered by this Validate call. What is going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seems to mean schema.MediaTypeManifest.Validate() have checked descendants as config and layer media types inside manifest? Actually it doesn't. Currently manifest validation is passed to:

{
    "annotations": null,
    "config": {
        "digest": "sha256:2b8fd9751c4c0f5dd266fcae00707e67a2545ef34f9a29354585f93dac906749",
        "mediaType": "foo",
        "size": 1459
    },  
    "layers": [
        {   
            "digest": "sha256:8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f",
            "mediaType": "bar",
            "size": 667590
        }   
    ],  
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "schemaVersion": 2
}

schema.MediaTypeManifest.Validate() just check manifest own media type and schema patten, not involving descendants media type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this code is checking descendants. It decodes the manifest, then checks the mediatypes of the layers.

It looks like there is a bug in the validation. Go fix that rather than just adding more code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevvooe
This has been discussed [1]. The validation depends on JSON scheme, involving JSON patten files.
@wking and I think that JSON schema can not present all constraints and bug fix had better not to tie to JSON Schema too tightly.
According to current schema file, which shows common patten as below, it is some hard to add this constraints.

{
    "mediaType": {
      "id": "https://opencontainers.org/schema/image/mediaType",
      "type": "string",
      "pattern": "^[a-z]+/[0-9a-zA-Z.+]+$"
    }  
}

JSON schema is third party to image tools, and we just want to fix bug for implementor tool. So we select to fix it here.

[1] opencontainers/image-spec#273 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If json schema doesn't meet our validation needs, we need to drop it. Adding a new function is not the right way to fix it. It is just adding extra code. The Type.Validate(...) function should do everything that is needed. There shouldn't be two functions to validate the same thing.

@wking
Copy link
Contributor

wking commented Sep 20, 2016

On Tue, Sep 20, 2016 at 12:12:46PM -0700, Stephen Day wrote:

+// ValidateManifestMediaType validate the manifest schema media-type
+// and its descendants fields media type, such as config and layers,
+// to make sure they match to spec definition, or returns an error if
+// the validation failed.
+func ValidateManifestMediaType(r io.ReadCloser) error {

Why does this take an io.ReadCloser?

To avoid holding an open file while we walk the descendants (after
we've read the manifest itself into memory) 1.

@wking
Copy link
Contributor

wking commented Sep 20, 2016

On Tue, Sep 20, 2016 at 12:12:46PM -0700, Stephen Day wrote:

  • for _, layer := range header.Layers {
  • if layer.MediaType != string(v1.MediaTypeImageLayer) {
    

This is probably way to strict. I'm not sure these media types are "illegal".

No media types are currently illegal, but this tooling will never
support all media types (e.g. an image/png layer). This is what
opencontainers/image-spec#304 is trying to clarify, and this PR is
more or less checking the semantics floated in
opencontainers/image-spec#304.

@xiekeyang
Copy link
Contributor Author

xiekeyang commented Sep 21, 2016

@wking @stevvooe

On Tue, Sep 20, 2016 at 12:12:46PM -0700, Stephen Day wrote:

We have discussed if illegal media type exists. My idea is as least we should not allow any string to be passed. But no common view is produced yet, so here I introduce a strict condition.

return errors.Wrapf(err, "error reading the io stream")
}

err = json.NewDecoder(bytes.NewReader(buf)).Decode(&header)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use decoder when you already have a []byte'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
}

return schema.MediaTypeManifest.Validate(bytes.NewReader(buf))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this code is checking descendants. It decodes the manifest, then checks the mediatypes of the layers.

It looks like there is a bug in the validation. Go fix that rather than just adding more code.

@stevvooe
Copy link
Contributor

@wking Please respond inline. It is very confusing to have the response be in a random comment.

The resource should be managed by the caller. It is almost always wrong to use io.ReadCloser.

@xiekeyang
Copy link
Contributor Author

xiekeyang commented Sep 21, 2016

@wking @stevvooe

I've some view on io.Reader vs. io.ReadCloser:
ReadCloser avoids wasting reader resource, when r.Close() method executing.
However in our code, NewReader inside ValidateManifestMediaType is local reference variable, and will be released when reference finished. It has no risk to waste the Reader resource.

You want an error-checked call to r.Close() here. No need to keep the reader open after you've read it while you walk the descendants.

Actually, if we define the arg as ReadCloser type, the r.Close() is inactive because ioutil.NopCloser() has to be executed outside.

And there's no reason to read the file in and then create a bytes.NewReader around it. Just call json.NewDecoder(r)….

It seems to be impracticable. No matter Reader or ReadClose must be read to buffer, calling it means its pointer is seek. So continually using will meet EOF problem.

I might not figure out the best solution yet, but I has been doubting ReadCloser here. So let me back the patch to Reader.

migrate from opencontainers/image-spec/pull/273

Signed-off-by: xiekeyang <xiekeyang@huawei.com>
return errors.Wrapf(err, "error reading the io stream")
}

err = json.Unmarshal(buf, &header)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still no reason to read r into a buffer and then unpack it to JSON. You've previously used:

buf, err := ioutil.ReadAll(r)
…
err = json.NewDecoder(bytes.NewReader(buf)).Decode(&header)

And we all agreed that the NewReader call wasn't a good idea. But instead of DecodeUnmarshal, I still think you just want to use:

func ValidateManifestMediaType(reader io.Reader) error {
  manifest := v1.Manifest{}
  err = json.NewDecoder(reader).Decode(&manifest)
  …

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if that we will meet the problem [1] that reader(pointer) is seek to end, and the next using reader is failed of EOF. How to avoid this?
So far I not find a example to reuse Reader in context.

[1] #10 (comment)

@wking
Copy link
Contributor

wking commented Sep 21, 2016

On Wed, Sep 21, 2016 at 12:26:12AM -0700, xiekeyang wrote:

  • err = json.Unmarshal(buf, &header)

But if that we will meet the problem 1 that reader(pointer) is
seek to end, and the next using reader is failed of EOF. How to
avoid this? So far I not find a example to reuse Reader in
context.

Ah, I missed you needing a reader later on for
schema.MediaTypeManifest.Validate. I think we should be able to
validate using NewGoLoader 1, in which case we could skip the local
buffering. But I'm ok punting that to a follow-up PR.

@wking
Copy link
Contributor

wking commented Sep 22, 2016

On Wed, Sep 21, 2016 at 05:35:57PM -0700, Stephen Day wrote:

  • return schema.MediaTypeManifest.Validate(bytes.NewReader(buf))

If json schema doesn't meet our validation needs, we need to drop
it.

I don't think that follows. JSON Schema has the benefit that there
are JSON Schema validators written in many languages, so providing a
JSON Schema is an easy way to share validation information among
varied consumers. If that JSON Schema doesn't completely cover the
normative spec, I'd rather augment it (like this PR does) with any
additional checks that are needed to cover the gaps. Cutting
completely over to Go (or whatever) would mean folks using other
languages to validate OCI JSON would no longer have convenient,
machine-readable help from OCI projects.

The Type.Validate(...) function should do everything that is
needed.

And with this series, the ValidateManifestMediaType function does
everything that is currently coded (it may still be missing pieces,
I'm not sure, but it's definitely a superset of the JSON Schema
validation).

@xiekeyang
Copy link
Contributor Author

@wking @stevvooe
So far we have make some discussions as above. I think we are debating about the role and scope of JSON Schema in image-spec, which Type.Validate(...) is depended on.
We have recommended that If JSON Schema constrain this, it will look some strict to users.
This PR can validate the file correctly, and allows consumers to generate their defined manifest.
Or if we really think JSON Schema should constrain all cases, we should modify it in image-spec/schema/*.json.
And ValidateManifestMediaType seems to be OK on orthonormality side, according to current calling relationship.

@wking
Copy link
Contributor

wking commented Sep 22, 2016

On Thu, Sep 22, 2016 at 02:49:44AM -0700, xiekeyang wrote:

Or if we really think JSON Schema should constrain all cases…

I doubt this is possible. For example, a JSON-Schema-only validator
cannot tell:

  1. If the blob referenced by a manifest's ‘config’ descriptor conforms
    to the media type advertised by that descriptor.
  2. If the blob referenced by a manifest's ‘config’ is retrievable (not
    strictly required for compliance, but without it you cannot test
    (1)).

I'm fine having JSON-Schema validation being a portion of the full
validation 1.

@vbatts
Copy link
Member

vbatts commented Sep 22, 2016

Given that this PR only checks that the string of the media is an expected type, what else is needed here?

@wking
Copy link
Contributor

wking commented Sep 22, 2016

On Thu, Sep 22, 2016 at 10:15:43AM -0700, Vincent Batts wrote:

Given that this PR only checks that the string of the media is an
expected type, what else is needed here?

That's all it adds. It's also validating against the JSON Schema
1 like it used to 2.

So I think cfbc9eb looks pretty good, except for:

@stevvooe
Copy link
Contributor

Why isn't the validation handled in schema.MediaTypeManifest.Validate()? This PR adds yet another layer of validation.

Fix schema.MediaTypeManifest.Validate.

@xiekeyang
Copy link
Contributor Author

xiekeyang commented Sep 29, 2016

@stevvooe

Why isn't the validation handled in schema.MediaTypeManifest.Validate()? This PR adds yet another layer of validation.
Fix schema.MediaTypeManifest.Validate.

I'd submitted a PR, fit to what you said, to opencontainers/image-spec#341. In that PR I modify schema json files, to check manifest descendants types, which is used by schema.MediaTypeManifest.Validate.

I put comment that we will select the better PR in these two and close the other.

PTAL opencontainers/image-spec#341, Thanks.

@vbatts
Copy link
Member

vbatts commented Oct 6, 2016

What's the next step here?

@wking
Copy link
Contributor

wking commented Oct 6, 2016

On Thu, Oct 06, 2016 at 09:15:02AM -0700, Vincent Batts wrote:

What's the next step here?

Maintainers need to decide where they want this validation code to go.
In schema/ or outside? In image-spec or this repo? More discussion
on the latter question in opencontainers/image-spec#337.

@xiekeyang
Copy link
Contributor Author

this should be closed

@xiekeyang xiekeyang closed this Nov 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants