[FEEDBACK] Rationalize name-char

This was "[FEEDBACK] Message Format Unquoted Literals #724", but was changed to focus on name-char. See later comments.
----
OLD
## Summary

Consider relaxing constraints on literals, after v45.

## Background

Right now, unquoted literals are fairly narrowly constrained by
[message.abnf](https://github.com/unicode-org/message-format-wg/blob/main/spec/message.abnf);
here are the relevant lines:

```
unquoted = name / number-literal

; number-literal matches JSON number (https://www.rfc-editor.org/rfc/rfc8259#section-6)
number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]

; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName
name = name-start *name-char
name-start = ALPHA / "_"
           / %xC0-D6 / %xD8-F6 / %xF8-2FF
           / %x370-37D / %x37F-1FFF / %x200C-200D
           / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
           / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char = name-start / DIGIT / "-" / "."
          / %xB7 / %x300-36F / %x203F-2040
```
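To make the current constraint concrete, here is a minimal Python sketch that checks a string against the `name` and `number-literal` rules quoted above. The names `NAME_RE`, `NUMBER_RE`, and `is_unquoted` are illustrative only and are not part of the spec or of any implementation.

```python
import re

# Code point ranges transcribed from the name-start rule quoted above.
NAME_START = (
    "A-Za-z_"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFC\U00010000-\U000EFFFF"
)
# name-char adds DIGIT, "-", ".", and three more ranges.
NAME_CHAR = NAME_START + "0-9.\\-\u00B7\u0300-\u036F\u203F-\u2040"

NAME_RE = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")
# JSON-style number; %i"e" in the ABNF means the exponent marker is case-insensitive.
NUMBER_RE = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")


def is_unquoted(text: str) -> bool:
    """Return True if text matches `unquoted = name / number-literal`."""
    return bool(NAME_RE.fullmatch(text) or NUMBER_RE.fullmatch(text))


if __name__ == "__main__":
    for sample in ["count", "1.5e+3", "[0,1)", "2024-01-01"]:
        print(sample, is_unquoted(sample))
```

Under these rules an interval such as `[0,1)` or a date such as `2024-01-01` is neither a `name` nor a `number-literal`, so it has to be quoted.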

### Reason for reconsidering

However, for functions outside of the standard registry, this forces
many natural literals to use quotes. Here is an example from a function
that would handle MF1’s choice format:

```
|[0,1)| {{{$count} is zero or fraction}}
```
The natural literals to use would be intervals, which use the characters
\[, (, ), and \] for ranges. (The choice format would require some
recasting, because it depends on the ordering of variants; it currently
uses \>.) So that would require

```
|[0,1)| {{{$count} is zero or fraction}}
```
Many Unicode symbols are *included* by XML’s **NT-NCName** (about 6,000
currently), while many are excluded (about 2,600 currently). But these
are **literals**, **not identifiers**, which is what **name** is
intended for. Expanding beyond identifier usage would allow functions
to avoid requiring quotes in many cases. It would also allow us to
dispense with the special formulation for number-literal.

The literals for number, date, etc. could be specified elsewhere; they
wouldn’t have to be in the ABNF.

That would allow various registries to have more sophisticated
literals without requiring quoting, and without privileging the
structured literals that we know about now.

### Requirements

So, what restrictions on characters for a broadened definition of
**unquoted** literals would be required by a revised ABNF?

1.  No ‘}’, because it would make .local \$x = {literal} fail.

2.  No ‘\|’, because an initial one would conflict with quoting.
    1.  It would be possible to disallow only initial ones, but for clarity it is best to always forbid them.

3.  No ‘:’ or ‘$’, because an initial one would indicate a function or variable, which would conflict in expressions starting with one (and an initial ‘$’ would conflict in the value of an option).
    * It would be possible to disallow only initial ones, but for clarity it is best to always forbid them.

4.  No ‘{’. Not strictly required, but for clarity best to always forbid.

5.  None of the big blocks of ‘strange’ code points that XML forbids: controls, (unpaired) surrogates, private-use, noncharacters.
    * These are all immutable ([Unicode Character Encoding Stability](https://www.unicode.org/policies/stability_policy.html#Property_Value)).

    * This also disallows the noncharacters that XML didn’t know about yet, before the noncharacter property was made immutable.

6.  No whitespace, since **variant** uses that for separators between keys, and expressions use it to separate various components.

    * This could be done by just disallowing the “**s**” production characters, but that could be very confusing: {a b} looks too much like two items (the space is U+00A0 NO-BREAK SPACE). So it should be broadened to the Unicode Whitespace characters.

    * Unicode Whitespace is not guaranteed immutable, but it has not changed for over a decade. In any case, we would derive the code points as of now, so everything would be stable into the future.

## Detailed Proposal

This would result in the following change:

### OLD

```
unquoted-literal = name / number-literal
; number-literal matches JSON number (https://www.rfc-editor.org/rfc/rfc8259#section-6)
number-literal   = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]

// The characters include the following (though name-char and
// number-literal additions are positional):

// name-start is [\: A-Z _ a-z \x{C0}-\x{D6} \x{D8}-\x{F6}
// \x{F8}-\x{2FF} \x{370}-\x{37D} \x{37F}-\x{1FFF} \x{200C}-\x{200D}
// \x{2070}-\x{218F} \x{2C00}-\x{2FEF} \x{3001}-\x{D7FF} \x{F900}-\x{FDCF}
// \x{FDF0}-\x{FFFD} \x{10000}-\x{EFFFF}]

// name-char adds [\- . 0-9 \x{B7} \x{0300}-\x{036F}
// \x{203F}-\x{2040}]

// number-literal adds [+ e]
```

### NEW
This changes just the first line above, `unquoted-literal = name / number-literal` — the rest of the above would remain the same.

```
unquoted-literal = 1*literal-char

// Then down in ; Restrictions on characters in various contexts

literal-char = <all but the following list; simpler to leave in this format until after feedback>
```
Needed to avoid syntax conflicts
```
U+0024 DOLLAR SIGN
U+003A COLON
U+007B LEFT CURLY BRACKET
U+007C VERTICAL LINE
U+007D RIGHT CURLY BRACKET
```
Whitespace
```
U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+1680 OGHAM SPACE MARK
U+2000 - U+200A EN QUAD .. HAIR SPACE
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
```
Controls
```
U+0000 - U+001F
U+007F - U+009F
```
Surrogates
```
U+D800 - U+DFFF
```
Private Use
```
U+E000 - U+F8FF
U+F0000 - U+FFFFD
U+100000 - U+10FFFD
```
Noncharacters
```
U+FDD0 - U+FDEF
U+FFFE U+FFFF U+1FFFE U+1FFFF U+2FFFE U+2FFFF U+3FFFE U+3FFFF
U+4FFFE U+4FFFF U+5FFFE U+5FFFF U+6FFFE U+6FFFF U+7FFFE U+7FFFF
U+8FFFE U+8FFFF U+9FFFE U+9FFFF U+AFFFE U+AFFFF U+BFFFE U+BFFFF
U+CFFFE U+CFFFF U+DFFFE U+DFFFF U+EFFFE U+EFFFF U+FFFFE U+FFFFF
U+10FFFE U+10FFFF
```
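The exclusion lists above can be sketched as a simple predicate. This is only an illustration of the proposal, not a reference implementation; the names `EXCLUDED`, `is_literal_char`, and `is_unquoted_literal` are hypothetical. The noncharacter ranges are taken from Unicode's definition (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes).

```python
# Inclusive code point ranges that literal-char would exclude.
SYNTAX = [(0x24, 0x24), (0x3A, 0x3A), (0x7B, 0x7D)]  # $  :  {  |  }
WHITESPACE = [
    (0x20, 0x20), (0xA0, 0xA0), (0x1680, 0x1680), (0x2000, 0x200A),
    (0x2028, 0x2029), (0x202F, 0x202F), (0x205F, 0x205F), (0x3000, 0x3000),
]
CONTROLS = [(0x00, 0x1F), (0x7F, 0x9F)]
SURROGATES = [(0xD800, 0xDFFF)]
PRIVATE_USE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]
# U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane 0..16.
NONCHARACTERS = [(0xFDD0, 0xFDEF)] + [
    ((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF) for plane in range(17)
]

EXCLUDED = SYNTAX + WHITESPACE + CONTROLS + SURROGATES + PRIVATE_USE + NONCHARACTERS


def is_literal_char(ch: str) -> bool:
    """Return True if the single character ch is outside every excluded range."""
    cp = ord(ch)
    return not any(lo <= cp <= hi for lo, hi in EXCLUDED)


def is_unquoted_literal(text: str) -> bool:
    """Return True if text is one or more literal-char code points."""
    return len(text) > 0 and all(is_literal_char(ch) for ch in text)


if __name__ == "__main__":
    for sample in ["[0,1)", "2024-01-01", "a\u00A0b", "$x", "{x}"]:
        print(repr(sample), is_unquoted_literal(sample))
```

With this definition the interval `[0,1)` and a date such as `2024-01-01` would no longer need quotes, while anything containing whitespace, `$`, `:`, `{`, `|`, or `}` still would.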

### Notes
1. We could allow a non-initial ':' and '$'
2. We could reserve a few additional initial ASCII characters, e.g. '!', '%', '*', '/', '?', '@'
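
As a rough illustration of how the two notes would change the check, the sketch below looks only at the syntax characters and deliberately ignores the whitespace, control, surrogate, private-use, and noncharacter exclusions. The function name `passes_notes` and its `reserve_extra` parameter are hypothetical.

```python
ALWAYS_FORBIDDEN = frozenset("{|}")            # conflict with syntax in any position
SIGILS = frozenset(":$")                       # Note 1: forbidden only as the first character
EXTRA_RESERVED_INITIAL = frozenset("!%*/?@")   # Note 2: optional extra initial reservations


def passes_notes(text: str, reserve_extra: bool = False) -> bool:
    """Check text against the relaxed rules from Notes 1 and 2 (syntax characters only)."""
    if not text or any(ch in ALWAYS_FORBIDDEN for ch in text):
        return False
    forbidden_first = SIGILS | (EXTRA_RESERVED_INITIAL if reserve_extra else frozenset())
    return text[0] not in forbidden_first


if __name__ == "__main__":
    for sample in ["12:30", ":number", "US$5", "$x", "@home"]:
        print(sample, passes_notes(sample), passes_notes(sample, reserve_extra=True))
```

Under Note 1 a time-like literal such as `12:30` could be unquoted, while `:number` and `$x` would still be read as a function and a variable.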
