Reconsider using `text` for `private-use` and `reserved`

**Is your feature request related to a problem? Please describe.**

I think we should reconsider using `text` instead of `reserved-body` so that we can ensure the opacity of `private-use` and `reserved`. This would also require modifying quoted literals to escape `{` and `}`. This comes from a discussion in #444 about the data model, as well as earlier discussions.

Here are my comments from #444:

I think this describes a bug in the syntax, where we were "clever" and smuggled in the `quoted` production and the whitespace production. I think the argument was something like: "if we unreserved a sigil, then it would just parse correctly". But I think we're better off if `reserved` and `private-use` are opaque to the ABNF. An unreserved sigil is subject to whatever ABNF is applied to it (either the existing annotation syntax or new syntax). Pre-unreserved implementations still won't parse into the sigil's space, so they won't be affected.

In other words, I think we should modify the ABNF thusly:

```abnf
private-use    = private-start text
private-start  = "^" / "&"

; reserve additional sigils for use by 
; future versions of this specification
reserved       = reserved-start text
reserved-start = "!" / "@" / "#" / "%" / "*" / "<" / ">" / "/" / "?" / "~"
```

The only escapes in `text` are for `{` and `}` (and `\`, in case one needs `\}` as a character sequence). Brackets retain syntactic meaning everywhere. The current `reserved-body` approach means that implementations parse `reserved` and `private-use` into word tokens and literals, even though the implementation is not allowed to interpret them. I think true opacity is the right approach. (In that case (1), (2), (3), and (4) are different character sequences.)
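As a rough illustration of what "opaque" could mean for an implementation, here is a minimal Python sketch. It assumes the proposed rule (the body is `text`, with only `\\`, `\{`, and `\}` as escapes); the function name `scan_opaque_body` is hypothetical and not taken from any existing implementation. It does not enforce that the body is non-empty.

```python
def scan_opaque_body(src: str, start: int) -> tuple[str, int]:
    """Scan an opaque private-use/reserved body.

    `start` is the index just after the sigil. Returns the raw,
    uninterpreted body and the index of the closing '}'.
    Assumption: only backslash, '{' and '}' may follow a backslash.
    """
    i = start
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # An escape must be followed by one of the three escapable characters.
            if i + 1 >= len(src) or src[i + 1] not in "\\{}":
                raise ValueError(f"invalid escape at index {i}")
            i += 2
        elif ch == "{":
            # Brackets keep their syntactic meaning, so a bare '{' is an error.
            raise ValueError(f"unescaped '{{' at index {i}")
        elif ch == "}":
            # First unescaped '}' ends the expression; the body is never tokenized.
            return src[start:i], i
        else:
            i += 1
    raise ValueError("unterminated expression")
```

Under this sketch, `scan_opaque_body(r"{^foo key=|\{bar\}| anything}", 2)` hands back the body as a single uninterpreted string, with no word tokens or literals extracted from it.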

(and, yeah, we've had this discussion before [in #374](https://github.com/unicode-org/message-format-wg/pull/374#discussion_r1167447318))

Your argument there was:

> Currently this is a valid expression:
> ```
> {:foo key=|{bar}|}
> ```
> If we were to unreserve @ and try to give it the same semantics as with :, this would be a parse error due to the unescaped inner {:
> ```
> {@foo key=|{bar}|}
> ```

The problem, then, is that `quoted` allows unescaped `{` and `}` (because they are "quoted"). Adding these characters to the escaped list for `quoted` simplifies the ABNF a tiny amount (quoted-escape and reserve-escape become the same and we lose some productions) and lets implementations parse out expressions by matching (unescaped) `{`/`}` pairs. The cost is that `{`/`}` must be escaped in a literal.
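To make the "matching unescaped `{`/`}` pairs" point concrete, here is a small Python sketch. It assumes the proposed rule that every `{` or `}` that is not syntax is written with a backslash, including inside quoted literals; `match_braces` is a hypothetical helper name. Note that it needs no knowledge of `|...|` literals at all.

```python
def match_braces(src: str) -> list[tuple[int, int]]:
    """Return (open_index, close_index) pairs for unescaped braces,
    sorted by the position of the opening brace.

    Assumption: a backslash escapes the next character, so escaped
    braces are skipped and never counted as syntax.
    """
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            stack.append(i)
        elif ch == "}":
            if not stack:
                raise ValueError(f"unmatched '}}' at index {i}")
            pairs.append((stack.pop(), i))
        i += 1
    if stack:
        raise ValueError(f"unmatched '{{' at index {stack[-1]}")
    return sorted(pairs)
```

With escaped braces, `{Hello {@foo key=|\{bar\}|}!}` yields two pairs: the pattern and the one expression. With today's unescaped form, `{Hello {@foo key=|{bar}|}!}`, the same matcher reports three pairs, because `{bar}` inside the literal is indistinguishable from a nested expression unless the scanner also understands quoting.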

While fewer escapes are better than more escapes in quoted literals, I don't think that is very onerous compared to having `private-use` be "semi-parsed" (`reserved` scares me less, as we might never use it). I was convinced before that the ABNF would be okay using what we have now because the character sequences could be squeezed in (so that I wouldn't actually have to parse the contents). But the data model shows why I was shy in the first place: `private-use` and `reserved` turn out to have parsed structure where we want opacity.

