-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
annotations: restrict character set of ref.name values #671
Conversation
An unsolicited sgtm!
…On Thu, May 11, 2017 at 3:54 PM v1.0.0.batts ***@***.***> wrote:
URI are standard enough. Though the most concise character set for the
URI is in the appendix A https://tools.ietf.org/html/rfc3986#appendix-A
unless I missed something.
Fixes #599 <#599>
Signed-off-by: Vincent Batts ***@***.***
------------------------------
You can view, comment on, or merge this pull request online at:
#671
Commit Summary
- annotations: restrict character set of ref.name values
File Changes
- *M* annotations.md
<https://github.com/opencontainers/image-spec/pull/671/files#diff-0>
(4)
Patch Links:
- https://github.com/opencontainers/image-spec/pull/671.patch
- https://github.com/opencontainers/image-spec/pull/671.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#671>, or mute the
thread
<https://github.com/notifications/unsubscribe-auth/AABJ6wbku5yO98Fdge1NaztUr9BRXpEvks5r42eLgaJpZM4NYedj>
.
|
annotations.md
Outdated
* **org.opencontainers.image.ref.name** Name of the reference for a target (string). SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md). | ||
* **org.opencontainers.image.ref.name** Name of the reference for a target (string). | ||
* SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md). | ||
* Character set of the value SHOULD conform to the [Uniform Resource Identifier (URI) RFC3986](https://tools.ietf.org/html/rfc3986). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this stands, it sounds like you're punting to the URI
rule, but that requires a scheme
, colon, and hier-part
. I doubt we want to recommend all names contain a colon. Can we punt to hier-part
instead? Or maybe segment-nz
? @stevvooe mentioned “schema-less URIs”, perhaps he was thinking of path-noscheme
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite clear that it only restricts it to the character set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite clear that it only restricts it to the character set.
Which character set is it restricting it to? pchar? unreserved? Something else? There are lots of possible choices in the linked collected ABNF.
I wouldn't like that either; but it seems like this just covers the
character set?
…On Thu, May 11, 2017 at 4:09 PM W. Trevor King ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In annotations.md
<#671 (comment)>
:
> @@ -26,4 +26,6 @@ This specification defines the following annotation keys, intended for but not l
* **org.opencontainers.image.revision** Source control revision identifier for packaged software.
* **org.opencontainers.image.vendor** Name of the distributing entity, organization or individual.
* **org.opencontainers.image.licenses** Comma-separated list of licenses under which contained software is distributed, in [SPDX Short identifier]https://spdx.org/licenses/] form.
-* **org.opencontainers.image.ref.name** Name of the reference for a target (string). SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
+* **org.opencontainers.image.ref.name** Name of the reference for a target (string).
+ * SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
+ * Character set of the value SHOULD conform to the [Uniform Resource Identifier (URI) RFC3986](https://tools.ietf.org/html/rfc3986).
As this stands, it sounds like you're punting to the URI rule, but that
requires a scheme, colon, and hier-part. I doubt we want to recommend all
names contain a colon. Can we punt to hier-part instead? Or maybe
segment-nz? @stevvooe <https://github.com/stevvooe> mentioned “schema-less
URIs
<#591 (comment)>”,
perhaps he was thinking of path-noscheme?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#671 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AABJ61O4LNSGQj1o5nb4DfXrI43ocVqHks5r42rsgaJpZM4NYedj>
.
|
On Thu, May 11, 2017 at 01:12:11PM -0700, Erik Hollensbe wrote:
I wouldn't like that either; but it seems like this just covers the
character set?
There are a number of charset rules in that ABNF (pchar, unreserved,
reserved, gen-delims, and sub-delims are all single-char choices).
Which one is this PR recommending? The other rules look like they
have more structure, but the name values might be longer than a single
character, so a rule like segment-nz could cover a fair chunk of
useful names out of the box. No slashes in segment-nz though, so if
you want those you have to go for something more complicated or
compose your own rule like:
name = 1*nchar
nchar = pchar / reserved / gen-delims
|
LGTM @vbatts Thank you for getting this done! |
I guess |
Would it be better to just go with |
Also we need to look into what current Docker registry implementations allowing |
@AkihiroSuda true true. cause this will map to tags |
On Fri, May 12, 2017 at 06:42:47AM -0700, v1.0.0.batts wrote:
https://github.com/docker/distribution/blob/master/reference/regexp.go#L37
Docker's current \w[\w.-]{0,127} is not enough space for some common
encodings. Base 32 and 64 use = for padding [1], percent encoding
needs access to % [2]. Commonly-used encodings that will fit that
regexp are base 16 [3] and punycode [4]. You can use the ACE prefix
[5] as a way to label punycode, so I'd lean on RFC 3490 here and say
that names SHOULD be A-labels, U-labels, or NR-LDH labels. There's
more discussion of RFC 3490 in #163 and #226.
[1]: https://tools.ietf.org/html/rfc4648
[2]: https://tools.ietf.org/html/rfc3986#section-2.1
[3]: https://tools.ietf.org/html/rfc4648#section-8
[4]: https://tools.ietf.org/html/rfc3492
“… non-ASCII characters are represented by ASCII characters that
are allowed in host name labels (letters, digits, and hyphens)."
[5]: https://tools.ietf.org/html/rfc3490#section-5
|
@AkihiroSuda this ABNF is terrible to read, but i think i agree on |
updated. PTAL. |
annotations.md
Outdated
* **org.opencontainers.image.ref.name** Name of the reference for a target (string). SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md). | ||
* **org.opencontainers.image.ref.name** Name of the reference for a target (string). | ||
* SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md). | ||
* Character set of the value SHOULD conform to the [Uniform Resource Identifier (URI) RFC3986](https://tools.ietf.org/html/rfc3986) [`segment-nz` (from appendix A)](https://tools.ietf.org/html/rfc3986#appendix-A). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The canonical link for segment-nz
is section 3.3, appendix A just collects the ABNF together for ease-of-use. If you want to keep the appendix reference, maybe “from appendix A” → “see appendix A”?
Also, segment-nz
is a bigger charset than Docker's tags.
If we allow |
So +1 for copypasting from the Docker registry implementation. I'm not sure we should add % here. |
@AkihiroSuda good points. So we outline our own character set, or just leave this as |
If we're going to restrict it, let's make it a URI fragment (the bit after |
Going from the ABNF, we have the following:
In regexp, it is something like this:
I would prefer something like this:
The above is nearly identical to docker's tag, but allows The alternative, with slashes, would be something like this:
This allows docker references, docker tags, hostnames, punycode hosts, as well as a number of other things. The above grammar is regular, so it can just be a regexp. It does prevent some of the nastier parts of urls, like percent encoding. |
I'm 👍 on anything that is a subset of |
@stevvooe cool that's pretty close. Is your example ABNF, regexp, or sttevveeoeooeoe expression? 😛 |
@stevvooe I don't get why for the separator to have |
URI are standard enough, though % opens it a bit too much. After much discussion we're going to keep it as close to the docker tag as possible but with +, @ and / reference: opencontainers#671 (comment) Fixes opencontainers#599 Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
updated. PTAL |
based on https://godoc.org/golang.org/x/exp/ebnf and https://github.com/github/linguist/blob/master/lib/linguist/languages.yml#L1030 going with EBNF would look like this: letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
| "c" | "d" | "e" | "f" | "g" | "h" | "i"
| "j" | "k" | "l" | "m" | "n" | "o" | "p"
| "q" | "r" | "s" | "t" | "u" | "v" | "w"
| "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
alphanum = letter | digit, { letter | digit } ;
separator = "-" | "." | "_" | ":" | "@" | "+" | "--" ;
component = alphanum, { separator | alphanum, alphanum } ;
ref = component, { "/", component } ; This way we could have validation on the grammar, and pretty highlighting for github-displayed code blocks. There are some ways that ABNF looks nicer, as it has default variables like |
URI are standard enough, though % opens it a bit too much. After much discussion we're going to keep it as close to the docker tag as possible but with +, @ and / reference: opencontainers#671 (comment) And rather than making up our own mock grammar, lean on more formal EBNF. Fixes opencontainers#599 Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
rebased, and went with EBNF. Please review. |
annotations.md
Outdated
| "c" | "d" | "e" | "f" | "g" | "h" | "i" | ||
| "j" | "k" | "l" | "m" | "n" | "o" | "p" | ||
| "q" | "r" | "s" | "t" | "u" | "v" | "w" | ||
| "x" | "y" | "z" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Go docs have:
Production = name "=" [ Expression ] "." .
which is using .
(and not ;
) as the rule terminator.
annotations.md
Outdated
| "q" | "r" | "s" | "t" | "u" | "v" | "w" | ||
| "x" | "y" | "z" ; | ||
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ; | ||
alphanum = letter | digit, { letter | digit } ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Go docs aren't particularly clear on this, but you seem to be using ,
for concatenation and they don't seem to use any character for concatenation. I think it would be better to use:
alphanum = (letter | digit) {letter | digit} .
annotations.md
Outdated
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ; | ||
alphanum = letter | digit, { letter | digit } ; | ||
separator = "-" | "." | "_" | ":" | "@" | "+" | "--" ; | ||
component = alphanum, { separator | alphanum, alphanum } ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line will allow component to contain consecutive separators. I think you want:
component = alphanum {separator alphanum} .
[edit: updated to drop word
; I'd forgotten alphanum
was already multicharacter]
fair, the golang EBNF is different than what things like wikipedia show...
|
On Thu, Jun 01, 2017 at 11:14:25AM -0700, v1.0.0.batts wrote:
ref = component { ( "/" component ) } .
You don't need the parens here, because the braces are already
grouping their content.
component = alphanum { ( separator | alphanum ) alphanum } .
This is overly complicated, because the alphanum case is:
component = alphanum { alphanum alphanum } .
but alphanum already allows multiple characters. So all you need is
the separator case [1]:
component = alphanum { separator alphanum } .
[1]: #671 (comment)
|
it's due to not ending the name with a seperator, but ending with an alphanum |
@vbatts I'd rather we merge this with the existing grammar and then make the change (or not) across the entire specification in a single pass. There are many other places in the specification that would need to be updated. Looks like some of this syntax is picked up from the PEG paper. You can read up on PEGs at this website. It is also based on the myriad combinator-based parsing libraries. PEGs are generally better for this as they don't allow ambiguity. |
On Thu, Jun 01, 2017 at 12:54:58PM -0700, v1.0.0.batts wrote:
it's due to not ending the name with a seperator, but ending with an
alphanum
This:
component = alphanum { separator alphanum } .
has to end in an alphanum (which I think is what you're aiming at
[1]). You're allowed to append any number of ‘separator alphanum’
bits to the initial alphanum.
[1]: #671 (comment)
|
The PEG papers seem more focused on the mathematical properties of PEGs and less focused on the specific instances (e.g. which character(s) to use for definitions). But based on this, the PEG formulation of the current rule is something like:
which looks a lot like the ABNF formulation. The PEG form is a bit less ambiguous, because you don't have the “is this next Another issue with PEG (although no different from ABNF in this regard) is that it uses |
URI are standard enough, though % opens it a bit too much. After much discussion we're going to keep it as close to the docker tag as possible but with +, @ and / reference: opencontainers#671 (comment) Fixes opencontainers#599 Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
git reflog to the rescue. |
(hence, this whole thing of picking a particular grammar is a bit silly, as i've found that unless you define your own dialect in full, like XML have, then you can't just rely on saying EBNF, etc) |
On Fri, Jun 02, 2017 at 11:51:51AM -0700, v1.0.0.batts wrote:
(hence, this whole thing of picking a particular grammar is a bit
silly, as i've found that unless you define your own dialect in
full, like XML have, then you can't just rely on saying EBNF, etc)
There's no problem with referencing someone else who has defined a
whole dialect. For example:
* Parsing Expression Grammars: A Recognition-Based Syntactic
Foundation. (PDF). Bryan Ford, POPL, January 2004.
http://bford.info/pub/lang/peg
http://bford.info/pub/lang/peg.pdf
* RFC 5234 https://tools.ietf.org/html/rfc5234
* XML 1.0's EBNF https://www.w3.org/TR/REC-xml/#sec-notation
* ISO/IEC 14977:1996
https://www.iso.org/standard/26153.html
http://standards.iso.org/ittf/PubliclyAvailableStandards/s026153_ISO_IEC_14977_1996%28E%29.zip
* Go's EBNF https://godoc.org/golang.org/x/exp/ebnf
All of those are explicit about what characters to use for the various
grammer components, although some are more detailed and/or easier
reads than others. There are also parser-generators to get you from
grammars defined in the various *BNF flavors to machine code. For
example peg(1) [1] will compile my PEG formulation [2] (with a typo
fix) to C:
$ cat ref.peg
start <- component ("/" component)*
component <- alphanum (separator alphanum)*
alphanum <- (letter / digit)+
separator <- "--" / "-" / "." / "_" / ":" / "@" / "+"
letter <- [A-Za-z]
digit <- [0-9]
$ peg ref.peg >ref.c
$ cat <<EOF >>ref.c
int main(void) {
if (yyparse()) {
printf("found a valid ref\n");
if (fgetc(stdin) != EOF) {
printf("some characters were not consumed\n");
return 1;
}
return 0;
}
printf("did not find a valid ref\n");
return 1;
}
EOF
$ cc -o ref ref.c
$ echo hello | ./ref
found a valid ref
$ echo hello/world | ./ref
found a valid ref
$ echo hello/world--foo | ./ref
found a valid ref
$ echo hello/world--foo--bar | ./ref
found a valid ref
$ echo hello/world--foo~bar | ./ref
found a valid ref
some characters were not consumed
$ echo ~hello | ./ref
did not find a valid ref
[1]: http://piumarta.com/software/peg/
[2]: #671 (comment)
|
I wish I could have put this as elegantly as you. This is basically why we just went with the syntax that is in place. |
LGTM conditional on something like #692 going in; we can definitely do better than "EBNF-esque" |
URI are standard enough, though % opens it a bit too much. After much discussion we're going to keep it as close to the docker tag as possible but with +, @ and / reference: opencontainers/image-spec#671 (comment) Fixes #599 Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
URI are standard enough, though % opens it a bit too much. After much discussion we're going to keep it as close to the docker tag as possible but with +, @ and / reference: opencontainers/image-spec#671 (comment) Fixes #599 Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
URI are standard enough, though % opens it a bit too much. After much discussion we're going to keep it as close to the docker tag as possible but with +, @ and / reference: opencontainers/image-spec#671 (comment) Fixes #599 Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
URI are standard enough, though % opens it a bit too much. After much discussion we're going to keep it as close to the docker tag as possible but with +, @ and / reference: opencontainers/image-spec#671 (comment) Fixes #599 Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
URI are standard enough, though % opens it a bit too much. After much discussion we're going to keep it as close to the docker tag as possible but with +, @ and / reference: opencontainers/image-spec#671 (comment) Fixes #599 Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Started from the URI charset, and ended up with a broader set and pattern from the docker tag. While not strictly the same, it's closer to the ideal set for containerd, flatpak and others.
Fixes #599
Signed-off-by: Vincent Batts vbatts@hashbangbash.com