Description
Is your feature request related to a problem? Please describe.
I think we should reconsider using `text` instead of `reserved-body` so that we can ensure the opacity of `private-use` and `reserved`. This would also require modifying quoted literals to escape `{` and `}`. This comes from a discussion in #444 about the data model, but also from previous discussion.
Here are my comments from 444:
I think this describes a bug in the syntax, where we were "clever" to smuggle in the quoted production and the whitespace production. I think the argument was something like: "if we unreserved a sigil, then it would just parse correctly". But I think we're better off if reserved and private are opaque to the ABNF. An unreserved sigil is subject to whatever ABNF is applied to it (either the existing annotation syntax or new syntax). Pre-unreserved implementations still won't parse into the sigil's space, so won't be affected.
In other words, I think we should modify the ABNF thusly:
private-use = private-start text
private-start = "^" / "&"
; reserve additional sigils for use by
; future versions of this specification
reserved = reserved-start text
reserved-start = "!" / "@" / "#" / "%" / "*" / "<" / ">" / "/" / "?" / "~"
The only escapes in `text` are around `{` and `}` (and `\` in case one needs `\}` as a character sequence). Brackets retain syntactic meaning everywhere. The current `reserved-body` approach means that implementations parse `reserved` and `private-use` into word tokens and literals, even though the implementation is not allowed to interpret them. I think true opacity is the right approach. (In that case (1), (2), (3), and (4) are different character sequences.)
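To make "opaque" concrete, here is a minimal Python sketch of what an implementation might do with a `reserved` or `private-use` body under this proposal: record the sigil and keep everything after it as a single string, without tokenizing into words or literals. The function name `parse_opaque`, the returned tuple shape, and the choice to store the unescaped form are all assumptions made for illustration, not anything the spec or #444 defines.

```python
# Sigils copied from the ABNF proposed above.
PRIVATE_START = set("^&")
RESERVED_START = set("!@#%*<>/?~")


def parse_opaque(body: str) -> tuple[str, str, str]:
    """Split an expression body (without the outer braces) into
    (kind, sigil, text), treating everything after the sigil as opaque.

    Assumption: the only escapes are backslash followed by '{', '}' or
    a backslash, and unescaped braces are not allowed in the body.
    """
    if not body:
        raise ValueError("empty expression body")
    sigil = body[0]
    if sigil in PRIVATE_START:
        kind = "private-use"
    elif sigil in RESERVED_START:
        kind = "reserved"
    else:
        raise ValueError(f"not a reserved or private-use sigil: {sigil!r}")

    out = []
    i = 1
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Only '{', '}' and backslash may follow a backslash.
            if i + 1 >= len(body) or body[i + 1] not in "{}\\":
                raise ValueError(f"invalid escape at index {i}")
            out.append(body[i + 1])
            i += 2
        elif ch in "{}":
            raise ValueError(f"unescaped brace at index {i}")
        else:
            # '|' and whitespace carry no meaning here; they are just text.
            out.append(ch)
            i += 1
    return kind, sigil, "".join(out)
```

Note that `|` and whitespace are not special in this sketch, which is the point: two bodies that differ only in quoting or spacing stay different character sequences.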
(and, yeah, we've had this discussion before in #374)
Your argument there was:
Currently this is a valid expression:
{:foo key=|{bar}|}
If we were to unreserve `@` and try to give it the same semantics as with `:`, this would be a parse error due to the unescaped inner `{`:
{@foo key=|{bar}|}
The problem then is: `quoted` allows `{` and `}` unescaped (because they are "quoted"). Adding these characters to the escaped list for `quoted` simplifies the ABNF a tiny amount (`quoted-escape` and `reserved-escape` become the same and we lose some productions) and lets implementations parse out expressions by matching (unescaped) `{`/`}` pairs. The cost is that `{`/`}` must be escaped in a literal.
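As a rough illustration of "matching (unescaped) `{`/`}` pairs", the Python sketch below finds the end of an expression without knowing anything about the annotation syntax inside it. It assumes the proposed rule that every literal `{`, `}`, or backslash inside an expression is written with a leading backslash. The name `find_expression_end` is hypothetical.

```python
def find_expression_end(source: str, start: int) -> int:
    """Return the index of the '}' that closes the '{' at `start`.

    Assumption: braces and backslashes that are meant literally are
    always backslash-escaped, including inside quoted literals, so the
    scanner never needs to track '|' quoting.
    """
    if start >= len(source) or source[start] != "{":
        raise ValueError("expression must start with '{'")
    depth = 0
    i = start
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unterminated expression")
```

With escaped braces in the literal, for example `{@foo key=|\{bar\}|}`, the scanner reaches the final `}` regardless of which sigil follows the opening brace, so an implementation that predates the unreserving of `@` still finds the right expression boundary.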
While fewer escapes are better than more escapes in quoted literals, I don't think that is very onerous compared to having `private-use` (`reserved` scares me less, as we might never use it) be "semi-parsed". I was convinced before that the ABNF would be okay using what we have now because the character sequences could be squeezed in (that is, I wouldn't actually have to parse the contents). But the data model shows why I was shy in the first place: `private-use` and `reserved` turn out to have parsed structure where we want opacity.