(Design) Formatted Parts #463
Merged
13 commits
- 8933d22 Add design doc for formatted parts (eemeli)
- e973b3b style: Apply Prettier (github-actions[bot])
- a3e6b8c Apply suggestions from code review (eemeli)
- cf1b5aa Merge branch 'main' into fmt-parts-design (eemeli)
- ac84d25 Apply suggestions from code review (eemeli)
- 0ff1a33 Rename exploration/0003-formatted-parts.md -> exploration/formatted-p… (eemeli)
- 2d96e77 Apply suggestions from code review (eemeli)
- dbd626a Add "Registry definition of formatted parts" section (eemeli)
- a8d966f Drop extraneous sentence (eemeli)
- b7c88ba Merge branch 'main' into fmt-parts-design (eemeli)
- 1645aea Add MessageSingleValuePart & MessageMultiValuePart definitions; other… (eemeli)
- 0b5debd Add note about custom expression fields (eemeli)
- 9c59af9 Fix typo (eemeli)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,232 @@ | ||
# Formatted Parts | ||
|
||
Status: **Proposed** | ||
|
||
<details> | ||
<summary>Metadata</summary> | ||
<dl> | ||
<dt>Contributors</dt> | ||
<dd>@eemeli</dd> | ||
<dt>First proposed</dt> | ||
<dd>2023-08-29</dd> | ||
<dt>Pull Request</dt> | ||
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/463">#463</a></dd> | ||
</dl> | ||
</details> | ||
|
||
## Objective | ||
|
||
Messages often include placeholders that, | ||
when formatted, contain internal structure ("parts"). | ||
Preserving this structure in a formatted message | ||
may be helpful to the caller, | ||
who can then manipulate the parts. | ||
For example, a caller may want to style or present | ||
messages with the same content differently | ||
if those messages have different internal structure. | ||
|
||
This proposal defines a formatted-parts target for MessageFormat 2. | ||
|
||
## Background | ||
|
||
Past examples have shown us that if we don't provide a formatted-parts output, | |
users will re-parse and re-process the string output themselves. | |
Recent examples of web browsers needing to account for such user behaviour are available from | ||
[June 2022](https://github.com/WebKit/WebKit/commit/1dc01f753d89a85ee19df8e8bd75f4aece80c594) and | ||
[November 2022](https://bugs.chromium.org/p/v8/issues/detail?id=13494). | ||
|
||
## Use-Cases | ||
|
||
- Markup elements | ||
- Non-string values | ||
- Message post-processors | ||
- Decoration of placeholder interior parts. | ||
For example, identifying the separate fields in these two currency values | ||
(notice that the symbol, number, and fraction fields | ||
are not in the same order and that the separator has been omitted): | ||
 | ||
 | ||
- Supplying bidirectional isolation of placeholders, | ||
such as by using HTML's `span` element with a `dir` attribute | ||
based on the direction of the placeholder. | ||
|
||
## Requirements | ||
|
||
- Define an iterable sequence of formatted part objects. | ||
- Include metadata for each part, such as type, source, direction, and locale. | ||
- Allow the representation of non-string values. | ||
- Allow the representation of values that consist of an iterable sequence of formatted parts. | ||
- Be able to represent each resolved value of a pattern with any number of formatted parts, including none. | ||
- Define the formatted parts in a manner that allows equivalent but idiomatic implementations in different programming languages. | |
|
||
## Constraints | ||
|
||
- The JS Intl formatters already include formatted-parts representations for each supported data type. | ||
The JS implementation of the MF2 formatted-parts representation should be able to match their structure, | ||
at least as far as that's possible and appropriate. | ||
|
||
## Proposed Design | ||
|
||
The formatted-parts API is included in the spec as an optional but recommended formatting target. | ||
|
||
The shape of the formatted-parts output is defined in a manner similar to the data model, | ||
which includes TypeScript, JSON Schema, and XML DTD definitions of the same data structure. | ||
|
||
At the top level, the formatted-parts result is an iterable sequence of parts. | ||
Parts corresponding to each _text_ can be simpler than those of _expressions_, | ||
as they do not have a `source` other than their `value`, | ||
or set any of the other possible metadata fields. | ||
|
||
```ts | ||
type MessageParts = Iterable< | ||
MessageTextPart | MessageExpressionPart | MessageBiDiIsolationPart | ||
>; | ||
|
||
interface MessageTextPart { | ||
type: "text"; | ||
value: string; | ||
} | ||
``` | ||
|
||
For MessageExpressionPart, the `source` corresponds to the expression's fallback value. | ||
The `dir` and `locale` attributes of a part may be inherited from the message | ||
or from the operand (if present), | ||
or overridden by an expression attribute or formatting function, | ||
or otherwise set by the implementation. | ||
Each part should have at most one of `value` or `parts` defined; | |
some may have neither. | |
|
||
```ts | ||
type MessageExpressionPart = | ||
| MessageSingleValuePart<string, unknown> | ||
| MessageMultiValuePart<string, unknown>; | ||
|
||
interface MessageSingleValuePart<T extends string, V> { | ||
type: T; | ||
source: string; | ||
dir?: "ltr" | "rtl" | "auto"; | ||
locale?: string; | ||
value?: V; | ||
} | ||
|
||
interface MessageMultiValuePart<T extends string, V> { | ||
type: T; | ||
source: string; | ||
dir?: "ltr" | "rtl" | "auto"; | ||
locale?: string; | ||
parts: Iterable<{ type: string; value: V; source?: string }>; | ||
} | ||
``` | ||
|
||
The bidi isolation strategies included in the spec may require | ||
the insertion of MessageBiDiIsolationParts in the formatted-parts output. | ||
|
||
```ts | ||
interface MessageBiDiIsolationPart { | ||
type: "bidiIsolation"; | ||
value: "\u2066" | "\u2067" | "\u2068" | "\u2069"; // LRI | RLI | FSI | PDI | ||
} | ||
``` | ||
|
||
Some of the MessageExpressionPart instances may be further defined | ||
without reference to the function registry. | ||
|
||
Unannotated expressions with a _literal_ operand | ||
are represented by MessageStringPart. | ||
As with MessageTextPart, | ||
the `value` of MessageStringPart is always a string. | ||
|
||
```ts | ||
interface MessageStringPart { | ||
// MessageSingleValuePart<"string", string> | ||
type: "string"; | ||
source: string; | ||
value: string; | ||
dir?: "ltr" | "rtl" | "auto"; | ||
locale?: string; | ||
} | ||
``` | ||
|
||
Unannotated expressions with a _variable_ operand | ||
whose type is not recognized by the implementation | ||
or for which no default formatter is available | ||
are represented by MessageUnknownPart. | ||
|
||
```ts | ||
interface MessageUnknownPart { | ||
// MessageSingleValuePart<"unknown", unknown> | ||
type: "unknown"; | ||
source: string; | ||
value: unknown; | ||
} | ||
``` | ||
|
||
When the resolution or formatting of a placeholder fails, | ||
it is represented in the output by MessageFallbackPart. | ||
No `value` is provided; when formatting to a string, | ||
the part's representation would be `'{' + source + '}'`. | ||
|
||
```ts | ||
interface MessageFallbackPart { | ||
// MessageSingleValuePart<"fallback", never> | ||
type: "fallback"; | ||
source: string; | ||
} | ||
``` | ||
|
||
### Registry definition of formatted parts | ||
|
||
Each function defined in the registry MUST define its "formatted-parts" representation. | ||
A function can define either a unitary string `value` or a `parts` representation. | ||
Where possible, a function SHOULD provide a `parts` representation | ||
if its output might reasonably consist of multiple fields. | ||
In most cases, these sub-parts should not need fields beyond their `type` and a string `value`. | ||
Where necessary, other `value` types may be used | ||
and other fields such as a `source` included in the sub-parts, | ||
and additional fields may be included in the `MessageExpressionPart`. | ||
|
||
For example, `:datetime` and `:number` formatters could use the following formatted-parts representations. | ||
In many implementations, these could be further narrowed to only use `string` values. | ||
|
||
```ts | ||
interface MessageDateTimePart { | ||
// MessageMultiValuePart<"datetime", unknown> | ||
type: "datetime"; | ||
source: string; | ||
parts: Iterable<{ type: string; value: unknown }>; | ||
dir?: "ltr" | "rtl" | "auto"; | ||
locale?: string; | ||
} | ||
|
||
interface MessageNumberPart { | ||
// MessageMultiValuePart<"number", unknown> | ||
type: "number"; | ||
source: string; | ||
parts: Iterable<{ type: string; value: unknown }>; | ||
dir?: "ltr" | "rtl" | "auto"; | ||
locale?: string; | ||
} | ||
``` | ||
|
||
## Alternatives Considered | ||
|
||
### Not Defining a Formatted-Parts Output | ||
|
||
Leave it to implementations. | ||
They will each come up with something a bit different, | ||
but each will mostly work. | ||
|
||
They will not be interoperable, though. | ||
|
||
### Different Parts Shapes | ||
|
||
See issue <a href="https://github.com/unicode-org/message-format-wg/issues/41">#41</a> for details. | ||
|
||
They can be considered as precursors of the current proposal, | ||
into which they've developed due to evolutionary pressure. | ||
|
||
### Annotated String Output | ||
|
||
Format to a string, but separately define metadata or other values. | ||
|
||
This gets really clunky for parts that are not reasonably stringifiable. |
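
As a rough illustration of how a caller might consume the proposed shape, the sketch below flattens a sequence of parts back into a string. It is not part of the proposal: `partsToString` and the `MessagePart` alias are hypothetical names, `MessageExpressionPart` is collapsed into a single interface for brevity, and the choice to emit nothing for a non-fallback part that has neither `value` nor `parts` is an assumption. The only behaviour taken directly from the design doc is that a fallback part stringifies as `'{' + source + '}'`.

```typescript
// Simplified copies of the part shapes from the design doc.
type Dir = "ltr" | "rtl" | "auto";

interface MessageTextPart {
  type: "text";
  value: string;
}

interface MessageBiDiIsolationPart {
  type: "bidiIsolation";
  value: "\u2066" | "\u2067" | "\u2068" | "\u2069"; // LRI | RLI | FSI | PDI
}

// Assumption: single-value and multi-value parts merged into one interface.
interface MessageExpressionPart {
  type: string;
  source: string;
  dir?: Dir;
  locale?: string;
  value?: unknown;
  parts?: Iterable<{ type: string; value: unknown; source?: string }>;
}

// Hypothetical alias for one element of MessageParts.
type MessagePart =
  | MessageTextPart
  | MessageBiDiIsolationPart
  | MessageExpressionPart;

// Hypothetical helper: concatenate parts into a plain string.
function partsToString(parts: Iterable<MessagePart>): string {
  let out = "";
  for (const part of parts) {
    if ("source" in part) {
      // Expression part: prefer sub-parts, then a single value.
      if (part.parts !== undefined) {
        for (const sub of part.parts) out += String(sub.value);
      } else if (part.value !== undefined) {
        out += String(part.value);
      } else if (part.type === "fallback") {
        // Per the design doc: '{' + source + '}'.
        out += "{" + part.source + "}";
      }
      // Assumption: any other part without value or parts adds nothing.
    } else {
      // Text and bidi isolation parts always carry a string value.
      out += part.value;
    }
  }
  return out;
}

const example: MessagePart[] = [
  { type: "text", value: "Total: " },
  { type: "bidiIsolation", value: "\u2068" },
  {
    type: "number",
    source: "$total",
    parts: [
      { type: "integer", value: "1" },
      { type: "group", value: "," },
      { type: "integer", value: "234" },
    ],
  },
  { type: "bidiIsolation", value: "\u2069" },
  { type: "text", value: " " },
  { type: "fallback", source: "$missing" },
];

console.log(partsToString(example));
```

A caller that wants styled output rather than a string could walk the same sequence and wrap each sub-part in its own element, which is the use case the string-only alternative above handles poorly.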