Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

annotations: restrict character set of ref.name values #671

Merged
merged 1 commit into from
Jun 7, 2017

Conversation

vbatts
Copy link
Member

@vbatts vbatts commented May 11, 2017

Started from the URI charset, and ended up with a broader set and pattern from the docker tag. While not strictly the same, it's closer to the ideal set for containerd, flatpak and others.

Fixes #599

Signed-off-by: Vincent Batts vbatts@hashbangbash.com

@erikh
Copy link
Contributor

erikh commented May 11, 2017 via email

annotations.md Outdated
* **org.opencontainers.image.ref.name** Name of the reference for a target (string). SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
* **org.opencontainers.image.ref.name** Name of the reference for a target (string).
* SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
* Character set of the value SHOULD conform to the [Uniform Resource Identifier (URI) RFC3986](https://tools.ietf.org/html/rfc3986).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this stands, it sounds like you're punting to the URI rule, but that requires a scheme, colon, and hier-part. I doubt we want to recommend all names contain a colon. Can we punt to hier-part instead? Or maybe segment-nz? @stevvooe mentioned “schema-less URIs”, perhaps he was thinking of path-noscheme?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite clear that it only restricts it to the character set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite clear that it only restricts it to the character set.

Which character set is it restricting it to? pchar? unreserved? Something else? There are lots of possible choices in the linked collected ABNF.

@erikh
Copy link
Contributor

erikh commented May 11, 2017 via email

@wking
Copy link
Contributor

wking commented May 11, 2017 via email

@stevvooe
Copy link
Contributor

stevvooe commented May 11, 2017

LGTM

@vbatts Thank you for getting this done!

Approved with PullApprove

@AkihiroSuda
Copy link
Member

  segment       = *pchar
   segment-nz    = 1*pchar
   segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                 ; non-zero-length segment without any colon ":"

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

   query         = *( pchar / "/" / "?" )

   fragment      = *( pchar / "/" / "?" )

   pct-encoded   = "%" HEXDIG HEXDIG

   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

I guess segment-nz-nc? (excludes colon)

@stevvooe
Copy link
Contributor

Would it be better to just go with \w+?

@AkihiroSuda
Copy link
Member

[\w-\.%]+ as far as I can come up with.

Also we need to look into what current Docker registry implementations allowing

@vbatts
Copy link
Member Author

vbatts commented May 12, 2017

@AkihiroSuda true true. cause this will map to tags

@vbatts
Copy link
Member Author

vbatts commented May 12, 2017

@wking
Copy link
Contributor

wking commented May 12, 2017 via email

@vbatts
Copy link
Member Author

vbatts commented May 12, 2017

@AkihiroSuda this ABNF is terrible to read, but i think i agree on segment-nz-nc or even segment-nz, but then docker/moby would stop at the first occurence of an :

@vbatts
Copy link
Member Author

vbatts commented May 12, 2017

updated. PTAL.

@stevvooe
Copy link
Contributor

stevvooe commented May 12, 2017

LGTM

Approved with PullApprove

annotations.md Outdated
* **org.opencontainers.image.ref.name** Name of the reference for a target (string). SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
* **org.opencontainers.image.ref.name** Name of the reference for a target (string).
* SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
* Character set of the value SHOULD conform to the [Uniform Resource Identifier (URI) RFC3986](https://tools.ietf.org/html/rfc3986) [`segment-nz` (from appendix A)](https://tools.ietf.org/html/rfc3986#appendix-A).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The canonical link for segment-nz is section 3.3, appendix A just collects the ABNF together for ease-of-use. If you want to keep the appendix reference, maybe “from appendix A” → “see appendix A”?

Also, segment-nz is a bigger charset than Docker's tags.

@AkihiroSuda
Copy link
Member

If we allow @ here, implementations may cause security issues when parsing impl-specific ref string example.com/foo/bar:baz@sha256:deadbeefdeadbeef

@AkihiroSuda
Copy link
Member

So +1 for copypasting from the Docker registry implementation.

I'm not sure we should add % here.
One may put both latest and l%d0%b0test ( UI shows this as lаtest - Russian homoglyph) to single manifest, and may distribute malware with the latter ref name

@vbatts
Copy link
Member Author

vbatts commented May 17, 2017

@AkihiroSuda good points.

So we outline our own character set, or just leave this as (string)?

@cyphar
Copy link
Member

cyphar commented May 17, 2017

If we're going to restrict it, let's make it a URI fragment (the bit after #) which is defined in the URI RFC. The reason for this is that it will allow for URIs like scheme://somedistributionthing#therefname as a referencing style.

@stevvooe
Copy link
Contributor

Going from the ABNF, we have the following:

fragment      = *( pchar / "/" / "?" )
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

In regexp, it is something like this:

fragment := /[A-Za-z0-9-._:@!$&"()*+,;=/?]+/

I would prefer something like this:

ref := /[A-Za-z0-9-._+]+/

The above is nearly identical to docker's tag, but allows +.

The alternative, with slashes, would be something like this:

ref := component ["/" component]*
component := alphanum [separator alphanum]*
alphanum := /[A-Za-z0-9]+/
separator := /[-._:@]/ | "--"

This allows docker references, docker tags, hostnames, punycode hosts, as well as a number of other things. The above grammar is regular, so it can just be a regexp. It does prevent some of the nastier parts of urls, like percent encoding.

@cyphar
Copy link
Member

cyphar commented May 17, 2017

I'm 👍 on anything that is a subset of fragment (so that tools can represent tags as fragments which feeds into my plans for distribution with https://github.com/cyphar/parcel). I completely understand not wanting percent encoding 😉.

@vbatts
Copy link
Member Author

vbatts commented May 18, 2017

@stevvooe cool that's pretty close. Is your example ABNF, regexp, or sttevveeoeooeoe expression? 😛

@vbatts
Copy link
Member Author

vbatts commented May 18, 2017

@stevvooe I don't get why for the separator to have |"--"

vbatts added a commit to vbatts/oci-image-spec that referenced this pull request May 18, 2017
URI are standard enough, though % opens it a bit too much.
After much discussion we're going to keep it as close to the docker tag
as possible but with +, @ and /
reference: opencontainers#671 (comment)

Fixes opencontainers#599

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
@vbatts
Copy link
Member Author

vbatts commented May 18, 2017

updated. PTAL

@vbatts
Copy link
Member Author

vbatts commented Jun 1, 2017

based on https://godoc.org/golang.org/x/exp/ebnf and https://github.com/github/linguist/blob/master/lib/linguist/languages.yml#L1030 going with EBNF would look like this:

letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
   | "H" | "I" | "J" | "K" | "L" | "M" | "N"
   | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
   | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
   | "c" | "d" | "e" | "f" | "g" | "h" | "i"
   | "j" | "k" | "l" | "m" | "n" | "o" | "p"
   | "q" | "r" | "s" | "t" | "u" | "v" | "w"
   | "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
alphanum = letter | digit, { letter | digit } ;
separator = "-" | "." | "_" | ":" | "@" | "+" | "--" ;
component = alphanum, { separator | alphanum, alphanum } ;
ref = component, { "/", component } ;

This way we could have validation on the grammar, and pretty highlighting for github-displayed code blocks.

There are some ways that ABNF looks nicer, as it has default variables like DIGIT and ALPHA. Also it has linguist highlighting. Honestly, I'm just not down with the ABNF *( ... ) and foo / bar notation. :-\

vbatts added a commit to vbatts/oci-image-spec that referenced this pull request Jun 1, 2017
URI are standard enough, though % opens it a bit too much.
After much discussion we're going to keep it as close to the docker tag
as possible but with +, @ and /
reference: opencontainers#671 (comment)

And rather than making up our own mock grammar, lean on more formal
EBNF.

Fixes opencontainers#599

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
@vbatts
Copy link
Member Author

vbatts commented Jun 1, 2017

rebased, and went with EBNF. Please review.

annotations.md Outdated
| "c" | "d" | "e" | "f" | "g" | "h" | "i"
| "j" | "k" | "l" | "m" | "n" | "o" | "p"
| "q" | "r" | "s" | "t" | "u" | "v" | "w"
| "x" | "y" | "z" ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go docs have:

Production  = name "=" [ Expression ] "." .

which is using . (and not ;) as the rule terminator.

annotations.md Outdated
| "q" | "r" | "s" | "t" | "u" | "v" | "w"
| "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
alphanum = letter | digit, { letter | digit } ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go docs aren't particularly clear on this, but you seem to be using , for concatenation and they don't seem to use any character for concatenation. I think it would be better to use:

alphanum = (letter | digit) {letter | digit} .

annotations.md Outdated
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
alphanum = letter | digit, { letter | digit } ;
separator = "-" | "." | "_" | ":" | "@" | "+" | "--" ;
component = alphanum, { separator | alphanum, alphanum } ;
Copy link
Contributor

@wking wking Jun 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line will allow component to contain consecutive separators. I think you want:

component = alphanum {separator alphanum} .

[edit: updated to drop word; I'd forgotten alphanum was already multicharacter]

@vbatts
Copy link
Member Author

vbatts commented Jun 1, 2017

fair, the golang EBNF is different than what things like wikipedia show...
To get something that validates with ebnflint -start ref ... would be:

<pre class="ebnf">
ref = component { ( "/" component ) } .
component = alphanum { ( separator | alphanum ) alphanum } .
alphanum = ( letter | digit ) { letter | digit } .
separator = "-" | "." | "_" | ":" | "@" | "+" | "--" .
letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
   | "H" | "I" | "J" | "K" | "L" | "M" | "N"
   | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
   | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
   | "c" | "d" | "e" | "f" | "g" | "h" | "i"
   | "j" | "k" | "l" | "m" | "n" | "o" | "p"
   | "q" | "r" | "s" | "t" | "u" | "v" | "w"
   | "x" | "y" | "z" .
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .
</pre>

@wking
Copy link
Contributor

wking commented Jun 1, 2017 via email

@vbatts
Copy link
Member Author

vbatts commented Jun 1, 2017

it's due to not ending the name with a seperator, but ending with an alphanum

@stevvooe
Copy link
Contributor

stevvooe commented Jun 1, 2017

@vbatts I'd rather we merge this with the existing grammar and then make the change (or not) across the entire specification in a single pass. There are many other places in the specification that would need to be updated.

Looks like some of this syntax is picked up from the PEG paper. You can read up on PEGs at this website. It is also based on the myriad combinator-based parsing libraries. PEGs are generally better for this as they don't allow ambiguity.

@wking
Copy link
Contributor

wking commented Jun 1, 2017 via email

@wking
Copy link
Contributor

wking commented Jun 2, 2017

Looks like some of this syntax is picked up from the PEG paper.

The PEG papers seem more focused on the mathematical properties of PEGs and less focused on the specific instances (e.g. which character(s) to use for definitions). But based on this, the PEG formulation of the current rule is something like:

ref <- component ("/" component)*
component <- alphanum (separator / alphanum)*
alphanum <- (letter / digit)+
separator <- "--" / "-" / "." / "_" / ":" / "@" / "+"
letter <- [A-Za-z]
digit <- [0-9]

which looks a lot like the ABNF formulation. The PEG form is a bit less ambiguous, because you don't have the “is this next - part of the -- or - separator?” issue. But that's not a problem for us because we never allow consecutive separators and no other term could start or end with a hyphen.

Another issue with PEG (although no different from ABNF in this regard) is that it uses / for choices which @vbatts does not like. Of the currently-proposed syntaxes, only Go's and XML's use |.

URI are standard enough, though % opens it a bit too much.
After much discussion we're going to keep it as close to the docker tag
as possible but with +, @ and /
reference: opencontainers#671 (comment)

Fixes opencontainers#599

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
@vbatts
Copy link
Member Author

vbatts commented Jun 2, 2017

git reflog to the rescue.
rolled back to the EBNF'esque + regex grammar.
Please review.

@vbatts
Copy link
Member Author

vbatts commented Jun 2, 2017

(hence, this whole thing of picking a particular grammar is a bit silly, as i've found that unless you define your own dialect in full, like XML have, then you can't just rely on saying EBNF, etc)

@wking
Copy link
Contributor

wking commented Jun 2, 2017 via email

@stevvooe
Copy link
Contributor

stevvooe commented Jun 2, 2017

(hence, this whole thing of picking a particular grammar is a bit silly, as i've found that unless you define your own dialect in full, like XML have, then you can't just rely on saying EBNF, etc)

I wish I could have put this as elegantly as you. This is basically why we just went with the syntax that is in place.

@stevvooe
Copy link
Contributor

stevvooe commented Jun 2, 2017

LGTM

Approved with PullApprove

@jonboulle
Copy link
Contributor

jonboulle commented Jun 4, 2017

LGTM conditional on something like #692 going in; we can definitely do better than "EBNF-esque"

Approved with PullApprove

@stevvooe stevvooe merged commit df6f3c5 into opencontainers:master Jun 7, 2017
@vbatts vbatts deleted the uri_charset branch June 7, 2017 15:14
@vbatts vbatts mentioned this pull request Jul 5, 2017
dattgoswami9lk5g added a commit to dattgoswami9lk5g/bremlinr that referenced this pull request Jun 6, 2022
URI are standard enough, though % opens it a bit too much.
After much discussion we're going to keep it as close to the docker tag
as possible but with +, @ and /
reference: opencontainers/image-spec#671 (comment)

Fixes #599

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
7c00d pushed a commit to 7c00d/J1nHyeockKim that referenced this pull request Jun 6, 2022
URI are standard enough, though % opens it a bit too much.
After much discussion we're going to keep it as close to the docker tag
as possible but with +, @ and /
reference: opencontainers/image-spec#671 (comment)

Fixes #599

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
7c00d added a commit to 7c00d/J1nHyeockKim that referenced this pull request Jun 6, 2022
URI are standard enough, though % opens it a bit too much.
After much discussion we're going to keep it as close to the docker tag
as possible but with +, @ and /
reference: opencontainers/image-spec#671 (comment)

Fixes #599

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
laventuraw added a commit to laventuraw/Kihara-tony0 that referenced this pull request Jun 6, 2022
URI are standard enough, though % opens it a bit too much.
After much discussion we're going to keep it as close to the docker tag
as possible but with +, @ and /
reference: opencontainers/image-spec#671 (comment)

Fixes #599

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
tomalopbsr0tt added a commit to tomalopbsr0tt/fabiojosej that referenced this pull request Oct 6, 2022
URI are standard enough, though % opens it a bit too much.
After much discussion we're going to keep it as close to the docker tag
as possible but with +, @ and /
reference: opencontainers/image-spec#671 (comment)

Fixes #599

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spec: define charset for "org.opencontainers.ref.name"
8 participants