Skip to content

Define @attributes on expressions #450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions spec/data-model/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ interface Text {
interface Expression {
type: "expression";
body: Literal | VariableRef | FunctionRef | Unsupported;
attributes: Option[];
}
```

Expand All @@ -105,6 +106,9 @@ The `Literal` and `VariableRef` correspond to the the _literal_ and _variable_ s
When they are used as the `body` of an `Expression`,
they represent _expression_ values with no _annotation_.

The `attributes` of an `Expression` primarily represent translator hints,
but could include ones such as `locale` that do have an impact on _message_ formatting.
Comment on lines +109 to +110
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remarked on this before: this is too opinionated. Just because you are thinking of translator hints at this moment does not make them them the sole or even primary reason for this feature.

I suspect we should take a moment to agree on requirements/design before adding a feature like this: we should work as efficiently as possible, but not headlong.


`Literal` represents all literal values, both _quoted_ and _unquoted_.
The presence or absence of quotes is not preserved by the data model.
The `value` of `Literal` is the "cooked" value (i.e. escape sequences are processed).
Expand Down
5 changes: 4 additions & 1 deletion spec/data-model/message.dtd
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
<!ATTLIST key default (true | false) "false">

<!ELEMENT pattern (#PCDATA | expression)*>
<!ELEMENT expression (literal | variable | function | unsupported)>
<!ELEMENT expression ((literal | variable | function | unsupported),attribute*)>

<!ELEMENT literal (#PCDATA)>
<!ATTLIST literal quoted (true | false) #REQUIRED>
Expand All @@ -29,3 +29,6 @@
<!ELEMENT unsupported (operand?,source)>
<!ATTLIST unsupported sigil CDATA #REQUIRED>
<!ELEMENT source (#PCDATA)>

<!ELEMENT attribute (literal | variable)>
<!ATTLIST attribute name NMTOKEN #REQUIRED>
26 changes: 14 additions & 12 deletions spec/data-model/message.json
Original file line number Diff line number Diff line change
Expand Up @@ -33,20 +33,21 @@
"kind": { "enum": ["open", "close", "value"] },
"name": { "type": "string" },
"operand": { "$ref": "#/$defs/value" },
"options": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"value": { "$ref": "#/$defs/value" }
},
"required": ["name", "value"]
}
}
"options": { "$ref": "#/$defs/options" }
},
"required": ["type", "kind", "name"]
},
"options": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"value": { "$ref": "#/$defs/value" }
},
"required": ["name", "value"]
}
},
"unsupported": {
"type": "object",
"properties": {
Expand Down Expand Up @@ -79,7 +80,8 @@
{ "$ref": "#/$defs/function" },
{ "$ref": "#/$defs/unsupported" }
]
}
},
"attributes": { "$ref": "#/$defs/options" }
},
"required": ["type", "body"]
},
Expand Down
31 changes: 28 additions & 3 deletions spec/formatting.md
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,9 @@ its resolved value is defined by _literal resolution_.
> {You have {42 :number}}
> ```

_Expression attribute resolution_ defines the handling of _expression attributes_
that have an impact on _message_ formatting.
Copy link
Member

@aphillips aphillips Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this part of the sentence doesn't belong. The attributes need to be resolved if they are present. Whether they have an impact on formatting is up to the functions or processing involved in the formatting itself. Also, it should be noted that the attributes almost certainly "pass down" to parts during formatting (so that the caller can access them, if needed).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure about attributes getting passed as such to formatting or selection functions. If that makes sense, then why not pass them as function options?

Locale is special because

  1. It's already handled as special by existing formatters, and
  2. it may get a separate field in the formatted-parts output.

I can see how e.g. timezone could be handled similarly specially by setting up some aspect of the environment before a formatter call rather than considering it as a function option, but for that I'd expect it to be a thing that the implementation does, and which affects the formatter only indirectly.

The attributes need to be resolved if they are present.

That's not really in line with what we're saying about other ignored code. For example, we allow an implementation to ignore errors in expressions that aren't needed for the current formatting. Similarly, we should ignore errors in attributes that don't impact the formatting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attributes need to be resolved if they are present.

That's not really in line with what we're saying about other ignored code. For example, we allow an implementation to ignore errors in expressions that aren't needed for the current formatting. Similarly, we should ignore errors in attributes that don't impact the formatting.

The attributes need to be parsed, even if they are not recognized. The parser does not necessarily know what is supported downstream or in an implementation.

Similarly, we should ignore errors in attributes that don't impact the formatting.

Depends on what "error" means. If there is a syntax error, we can't ignore that, no? (some of this is about what "we" means here: It is up to the function to decide if an attribute is "bad" or just to ignore it. "Our" job is to get the options and variables to the function.)


### Literal Resolution

The resolved value of a _text_ or a _literal_ is
Expand Down Expand Up @@ -205,9 +208,13 @@ the following steps are taken:
- If its right-hand side successfully resolves to a value,
bind the _name_ of the _option_ to the resolved value in the mapping.
- Otherwise, do not bind the _name_ of the _option_ to any value in the mapping.
4. Call the function implementation with the following arguments:
4. Determine the _expression_ _locale_.
If the _expression_ has a valid `@locale` _expression attribute_,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define "valid" here. Is @locale=|hello world| valid? What about @locale=dead-beef (sorta BCP47-like)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's defined in the expression attribute resolution section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that "valid" isn't clearly attached to @Locale or expression attribute.

If this were written upside down, it might be clearer, e.g.

Let expressionLocale be the message locale from the formatting context.
If the expression contains a well-formed @locale expression attribute, let expressionLocale be that value.

use its resolved value as the _expression_ _locale_.
Otherwise, look up the _message_ _locale_ from the _formatting context_.
5. Call the function implementation with the following arguments:

- The current _locale_.
- The _expression_ _locale_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expose the expression attributes here also. Locale may not be the only one--I certainly have timezone in mind and other attributes may be provided by the implementation or our specification that affect the formatter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for the JS Intl.DateTimeFormat, timeZone is a regular option. Should we not handle it as such?

For other attributes, there is this below:

Implementations MAY assign meaning during formatting to other expression attributes

Should the text here make a reference to that as well, or is that implicit permission sufficiently clear?

Copy link
Member

@aphillips aphillips Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other attributes, there is this below:

Implementations MAY assign meaning during formatting to other expression attributes

Should the text here make a reference to that as well, or is that implicit permission sufficiently clear?

You're permitting implementations to assign meaning, but here I'm suggesting that functions can assign meaning to attribute values, including those unknown to the implementation. They can only do this if they have access to the values, of course.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would such a value be defined in the syntax as an expression attribute rather than as a function option?

- The resolved mapping of _options_.
- If the _expression_ includes an _operand_, its resolved value.

Expand All @@ -221,7 +228,7 @@ the following steps are taken:
their access to the _formatting context_ SHOULD be minimal and read-only,
and their execution time SHOULD be limited.

5. If the call succeeds,
6. If the call succeeds,
resolve the value of the _expression_ as the result of that function call.
If the call fails or does not return a valid value,
emit a Resolution error and use a _fallback value_ for the _expression_.
Expand Down Expand Up @@ -286,6 +293,24 @@ rather than the _expression_ in the _selector_ or _pattern_.

_Pattern selection_ is not supported for _fallback values_.

### Expression Attribute Resolution

Implementations MUST support the _expression attribute_ `@locale`.
The value of `@locale` MUST be a sequence of
one or more well-formed IETF BCP47 language tags.
If the value is set by a _literal_,
the _literal_ MUST consist of a comma-delimited sequence of
well-formed IETF BCP47 language tags.

The resolved value of a valid `@locale` _expression attribute_
is a sequence of IETF BCP 47 language tags.
If a `@locale` has an invalid value,
emit a Resolution error and ignore the _expression attribute_ in further processing.
Comment on lines +307 to +308
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say that the value reverts to that of the message?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, because that's implicit in what happens if an expression has no @locale attribute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to be careful here. Does "invalid" include the language tag's validity? That is, are we requiring that the implementation check the tag's contents? I think that's a job for e.g. Intl.Locale.

If you made the change I suggested on the previous paragraph, some of this paragraph would become irrelevant (we'd have already said that the value was a sequence of language tags). Then this paragraph could say:

Suggested change
If a `@locale` has an invalid value,
emit a Resolution error and ignore the _expression attribute_ in further processing.
When `@locale` is assigned an empty value or the resolved value consists of anything other than a list of well-formed language tags, emit a Resolution error and omit the _expression attribute_ from further processing.

Then we probably need to add somewhere:

To resolve the value of an @locale...
If the value is empty, emit an error (I think?)
If a literal, split on comma, trim, check for well-formed tag
If a variable:
- if an array...
- else if a literal...


Implementations MAY ignore all other _expression attributes_ during formatting.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "ignore" mean? Not process them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Member

@aphillips aphillips Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is wrong. Implementations should not be required to support attributes that they don't recognize or impute meaning. So they are "ignoring" them from that perspective. But they have already parsed them. If I implement @_foo in my functions and impute meaning to whatever I assign to @_foo at the expression level, then I'd like the implementation I'm running on to pass the value to me (pretty please), even though the impl doesn't know what it means. Otherwise "private use" expression attributes can't work (as anything other than translator instructions)

Implementations MAY assign meaning during formatting to other _expression attributes_
which use a _name_ that contains at least one U+002D HYPHEN-MINUS `-` character.
Comment on lines +311 to +312
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this come from? Why is the hyphen important? Why can't implementations "assign meaning" to unhyphenated values?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allow implementations to use any attribute names, then we won't be able to introduce any additional attributes later without a danger of breaking userspace.

With the - requirement, we allow for custom attributes like @can-delete, but leave space for later MF 2.x spec versions to define other attributes.


## Pattern Selection

When a _message_ contains a _match_ construct with one or more _expressions_,
Expand Down
14 changes: 8 additions & 6 deletions spec/message.abnf
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,11 @@ variant = when 1*(s key) [s] pattern
key = literal / "*"

expression = literal-expression / variable-expression / function-expression
literal-expression = "{" [s] literal [s annotation] [s] "}"
variable-expression = "{" [s] variable [s annotation] [s] "}"
function-expression = "{" [s] annotation [s] "}"
literal-expression = "{" [s] literal [s annotation] *(s expression-attribute) [s] "}"
variable-expression = "{" [s] variable [s annotation] *(s expression-attribute) [s] "}"
function-expression = "{" [s] annotation *(s expression-attribute) [s] "}"
annotation = (function *(s option)) / reserved / private-use
expression-attribute = "@" option

literal = quoted / unquoted
variable = "$" name
Expand Down Expand Up @@ -56,13 +57,14 @@ private-start = "^" / "&"
; reserve additional sigils for use by
; future versions of this specification
reserved = reserved-start reserved-body
reserved-start = "!" / "@" / "#" / "%" / "*" / "<" / ">" / "/" / "?" / "~"
reserved-start = "!" / "#" / "%" / "*" / "<" / ">" / "/" / "?" / "~"
reserved-body = *( [s] 1*(reserved-char / reserved-escape / quoted))

reserved-char = %x00-08 ; omit HTAB and LF
/ %x0B-0C ; omit CR
/ %x0E-19 ; omit SP
/ %x21-5B ; omit \
/ %x21-3F ; omit @
/ %x41-5B ; omit \
/ %x5D-7A ; omit { | }
/ %x7E-D7FF ; omit surrogates
/ %xE000-10FFFF
Expand All @@ -80,7 +82,7 @@ name-char = name-start / DIGIT / "-" / "." / ":"

text-escape = backslash ( backslash / "{" / "}" )
quoted-escape = backslash ( backslash / "|" )
reserved-escape = backslash ( backslash / "{" / "|" / "}" )
reserved-escape = backslash ( backslash / "@" / "{" / "|" / "}" )
backslash = %x5C ; U+005C REVERSE SOLIDUS "\"

s = 1*( SP / HTAB / CR / LF )
75 changes: 67 additions & 8 deletions spec/syntax.md
Original file line number Diff line number Diff line change
Expand Up @@ -355,18 +355,21 @@ and end with U+007D RIGHT CURLY BRACKET `}`.
An _expression_ MUST NOT be empty.

A **_<dfn>literal-expression</dfn>_** contains a _literal_,
optionally followed by an _annotation_.
optionally followed by an _annotation_
and optionally one or more _expression attributes_.

A **_<dfn>variable-expression</dfn>_** contains a _variable_,
optionally followed by an _annotation_.
optionally followed by an _annotation_
and optionally one or more _expression attributes_.

A **_<dfn>function-expression</dfn>_** contains only an _annotation_.
A **_<dfn>function-expression</dfn>_** contains only an _annotation_,
optionally followed by one or more _expression attributes_.

```abnf
expression = literal-expression / variable-expression / function-expression
literal-expression = "{" [s] literal [s annotation] [s] "}"
variable-expression = "{" [s] variable [s annotation] [s] "}"
function-expression = "{" [s] annotation [s] "}"
literal-expression = "{" [s] literal [s annotation] *(s expression-attribute) [s] "}"
variable-expression = "{" [s] variable [s annotation] *(s expression-attribute) [s] "}"
function-expression = "{" [s] annotation *(s expression-attribute) [s] "}"
annotation = (function *(s option)) / private-use / reserved
```

Expand Down Expand Up @@ -571,18 +574,74 @@ unrecognized reserved sequences have no meaning and MAY result in errors during

```abnf
reserved = reserved-start reserved-body
reserved-start = "!" / "@" / "#" / "%" / "*" / "<" / ">" / "/" / "?" / "~"
reserved-start = "!" / "#" / "%" / "*" / "<" / ">" / "/" / "?" / "~"

reserved-body = *( [s] 1*(reserved-char / reserved-escape / quoted))
reserved-char = %x00-08 ; omit HTAB and LF
/ %x0B-0C ; omit CR
/ %x0E-19 ; omit SP
/ %x21-5B ; omit \
/ %x21-3F ; omit @
/ %x41-5B ; omit \
/ %x5D-7A ; omit { | }
/ %x7E-D7FF ; omit surrogates
/ %xE000-10FFFF
```

### Expression Attributes

An **_<dfn>expression attribute</dfn>_** is a key-value pair
containing a named argument that is attached to the _expression_ as a whole.
These might be used to convey information about the _expression_
that are not necessarily applicable as _function_ _options_,
such as attributes related to handling during the translation process.
They might also be used to override contextual or implementation-specific values
used by a _function_ during formatting, such as the locale.
For example, the `@locale` attribute overrides the locale of the _message_ for its _expression_.
Attributes like `@can-delete` and `@translate` might not have any impact on formatting,
acting instead as instructions to translators.
Comment on lines +600 to +601
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syntax specification should only include attributes that we formally define all the way through. If these are examples, they should be treated as such, e.g. "If an implementation supported attributes such as..."


_Expression attributes_ are prefixed by a U+0040 COMMERCIAL AT `@` sign,
but otherwise use the same syntax as _options_.

The _name_ of an _expression attribute_ MUST either be one of the following,
or it MUST contain at least one U+002D HYPHEN-MINUS `-` character:

- `locale`: a sequence of well-formed IETF BCP47 language tags
see _expression attribute resolution_.
- `translate`: a literal value of either `yes` or `no`

To ensure future extensibility of this specification,
all other _expression attributes_ with a _name_ not containing `-` MUST be ignored.
Comment on lines +613 to +614
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just allowing unrecognized attributes to be ignored?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because then we could not later assign a meaning to them without potentially breaking user code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reserving a namespace is a separate thing from ignoring unrecognized values. These can work in tandem. An MF2.0 implementation would ignore the unrecognized value @foo that MF2.1 unreserves, as well as @_bar that some other proprietary implementation is using.

Alternatively (and as I suggested elsewhere), as long as the values are syntactically correct, the implementation can just parse the values without imputing meaning to them and pass them. It only has to apply validation rules to attributes it "knows" (and do we define an error for that???)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following hypothetical, in a world where we allow implementations to assign their own meaning to any attributes not explicitly defined in the spec like @locale and @translate:

Let's say that an implementation defines @fallback to provide a value that should be used for a placeholder, should its formatting fail for whatever reason. This doesn't really make sense as a function option, because it's covering the behaviour of what happens if the function throws an error. Other implementations could just ignore this attribute, and use the spec-defined fallback representation.

Then let's say that in a 2.1 version of the MF spec we'd like to define @fallback to mean a fallback locale to use, if the function doesn't support the message locale, or the one set by @locale. This hypothetical attribute would have the same semantics as @locale, but it would append values to the language priority list rather than replacing it.

But in this world we could not do that without potentially breaking userspace, because an implementation as defined above could exist, along with a pre-existing body of messages using @fallback with a completely different meaning. In this case, the meanings of "fallback" are quite different, but even should they align, it's very likely that the detailed implementation and requirements could differ for the attributes. In other words, we would not be able to define any future attributes in the spec without a danger of breaking userspace.


Multiple _expression attributes_ are permitted in an _expression_.
Each _expression attribute_ is separated by whitespace.
If present, _expression attributes_ MUST be the last non-whitespace content in an _expression_.

```abnf
expression-attribute = "@" option
```

> Examples of _expressions_ with _expression attributes_
>
> A _message_ in English including a French _literal_ that should not be translated:
>
> ```
> {In French, "{|bonjour| @locale=fr @translate=no}" is a greeting}
> ```
>
> A _message_ with a `$date` _variable_ formatted using the `:datetime` _function_
> and overriding the locale to use French formatting:
>```
>{The date would be displayed in French as {$date :datetime @locale=fr}.}
> ```
>
> A _message_ with a `$start` _variable_ that should be kept in translations
> as the first element of the message.
>
> ```
> {{$start @can-reorder=no} is the beginning.}
> ```

## Other Syntax Elements

This section defines common elements used to construct _messages_.
Expand Down