Localization Units Formatting #118

Closed
@zbraniecki

This is a complete braindump of my late night revelation that may be genius, crazy, foolish or any combination of those.

Background

It started with the realization that the irk I have with the name of our group overlaps with the irk that Mihai expressed, but for different reasons. Mihai said "I think we may come up with something very different than MF 1.0, so naming it 2.0 is misleading and may implicitly steer us toward trying to salvage MF similarity for compatibility reasons which may be a sunk cost fallacy" (paraphrase mine).

I reacted positively to that, because I recognize that there is a natural drift to "add to MF 1.0" just like I may have a drift to "bring Fluent to MF 2.0", and I think it may be limiting us in designing the optimal solution.

But as I dug deeper I realized that the concern I have is with the word "Message". The fact that we talk about formatting messages is already misaligned with how I think modern UI localization mental models should work.

For a simple textual app, you can have something like:

printf("You have 5 new messages.\n");

and MessageFormat 1.0 contains a data model, syntax, logic and API to internationalize this line of code.

But UI paradigms are fundamentally different.

Let me give you an example:

Example

[Image: dialog-boxes-messagebox-default-button]

What does it mean to localize it? What is the "message" and what do we mean by "formatting" it in such context?

There's definitely going to be some formatting going on, there are 4 strings in this widget, and an icon, but what is the "message"?

Well, you can decompose this widget into four separate widgets (title, label, button-ok, button-cancel) and try to say "each one of those has a value and that value is a message!", and I believe that's the most common model of approaching it.

But it doesn't scale in so many ways:

  1. If there's a relation between the message and the buttons (see Welsh, where there is no generic yes/no and a button's label has to depend on the question it answers), we lose it
  2. If there's any meta information about the widget, or its localization, it is now decomposed into four independent messages
  3. If there is any behavior that ties localization to the widget, we need to perform it four times, once per message
  4. If there are any arguments that are required to localize this widget, we need to send them to four messages
  5. If we'd want the UI toolkit to plug a "localize" step in before layout/paint, we need to write some code that formats those 4 messages and applies them onto that widget
  6. Is the icon a fifth message? It may flip in RTL contexts, and icons may contain text or culturally specific graphics that may have to be part of the localization of this widget.
  7. What if the button-ok, button-cancel, icon, label or the whole modal window have tooltips?
  8. What if they have accesskeys?
  9. What happens when there's any error in applying localization onto this widget? Are we falling back onto another locale? For one of two buttons? For label but not for buttons? How do we reconcile?
  10. Is localization of the button synchronous, or asynchronous? If there's fallback, which may require I/O for resources, is it synchronous or asynchronous? What does the binding function that applies those 4-5-10 messages onto the widget look like?
  11. Can you retranslate this widget to a different locale during UI lifetime, or do you have to recreate it in a different locale, remove the old one, add new one? If so, are you losing event bindings and state?
  12. Can you cache the state of this element pre-localization and post-localization? Can you invalidate the cache of this widget if, while loading, you realize that the translation is obsolete?
  13. What if the widget text is more complicated: a paragraph of text with images or stylistic annotations, a smart sentence like "Refresh the page every 5 minutes" where 5 is actually a numerical text input or a select dropdown, or a list of items where the structure and number of items should be controlled by the localizer? How do you handle that when you are merely formatting a single string and you don't have a notion that it is part of a UI that is a nested tree structure with attributes, events, text, icons and data?
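
To make the cost of the decomposed model concrete, here is a minimal sketch of the glue code it implies. Everything in it is an assumption for illustration: the message ids, the `{name}` placeholder syntax, the `PromptWidget` shape and the "Question" title are hypothetical, and `formatMessage` is a naive stand-in, not MessageFormat.

```typescript
// Hypothetical flat catalog: one independent message per part of the widget.
const messages: Record<string, string> = {
  "prompt-title": "Question",
  "prompt-label": "Hello, {userName}!",
  "prompt-button-ok": "Ok",
  "prompt-button-cancel": "Cancel",
};

// Naive formatter (an assumption, not MessageFormat): substitutes {name} placeholders.
// Returns null when the message is missing, so every caller handles fallback itself.
function formatMessage(id: string, args: Record<string, string>): string | null {
  const pattern = messages[id];
  if (pattern === undefined) return null;
  return pattern.replace(/\{(\w+)\}/g, (match: string, name: string) => {
    const value = args[name];
    return value !== undefined ? value : match;
  });
}

// Hypothetical widget with four independent text slots.
interface PromptWidget {
  title: string;
  label: string;
  okLabel: string;
  cancelLabel: string;
}

// The glue code: four lookups, four argument hand-offs, four separate fallbacks.
function localizePrompt(widget: PromptWidget, args: Record<string, string>): void {
  const title = formatMessage("prompt-title", args);
  const label = formatMessage("prompt-label", args);
  const ok = formatMessage("prompt-button-ok", args);
  const cancel = formatMessage("prompt-button-cancel", args);
  // Each part succeeds or fails on its own; nothing ties the buttons to the label.
  if (title !== null) widget.title = title;
  if (label !== null) widget.label = label;
  if (ok !== null) widget.okLabel = ok;
  if (cancel !== null) widget.cancelLabel = cancel;
}

const flatPrompt: PromptWidget = { title: "", label: "", okLabel: "", cancelLabel: "" };
localizePrompt(flatPrompt, { userName: "John" });
console.log(flatPrompt.label); // "Hello, John!"
```

Note that the toolkit never learns that these four strings belong together, which is where most of the questions in the list above come from.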

Two topics, that are intertwined but separate

I recognize that there are two topics here; my last question belongs to a slightly different category.

  1. Do we want to support localization of UI elements/widgets which are usually much more compound than a single string
  2. Do we want to support localization of messages that have semantic fragments inside them

I believe that the questions are related, because they relate to breaking with the idea that a message is a string and a UI is a list of messages.
In this model, UI is a tree (not list!) of compound widgets, each having multiple strings inside it, and each string may have its own UI fragment inside it.

Both of those issues are rooted in how UI is different from plain text, but we should imho treat those two questions separately and be open to having different solutions, or even considering one in scope, and another out of scope.

I'm bringing them up here because I want to challenge us with thinking about end-to-end localization of UI, and then you need to consider both.

How to design it?

Designing that system is actually very tricky if you stick to thinking of localization step of the UI toolkit as taking messages (strings), formatting them, and then applying in correct positions in the UI widget.
You need a lot of boilerplate code that has to either be controlled by the developer writing the code, or by the widget code, or by the toolkit and in each case is non trivial, hard to handle sync/async, limits fallbacks and, I will argue, ...

misses the point.

Localization Unit

Because you cannot localize compound, nested, rich User Interface widgets by formatting "messages".
You need a concept that is broader than a single string, something I started calling in my mind a "Localization Unit".
Think of all the data needed to localize the above example:

hello-prompt = {
    "meta": {
      "role": "modal window",
      "description": "..."
    },
    "elements": {
         "label": ["Hello, ", Element("strong", [Argument("userName")]), "!"],
         "button-ok": {
           "label": "Ok", //  In Welsh `[Reference("self", "label"), "lorem ipsum"]`
           "accesskey": "O",
           "tooltip": "Click to accept"
         },
         "button-cancel": {
           "label": "Cancel",
           "accesskey": "C",
           "tooltip": "Click to reject"
         },
         "close-icon": {
           "tooltip": "Close the prompt"
         },
         "main-icon": {
           "url": "@icon-path",
           "aria-label": "Question mark icon"
         }
    }
}
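
The pseudo-structure above could be given types along these lines. This is only a sketch under my own assumptions: the `kind` discriminator, the `arg`/`el` helpers (standing in for `Argument` and `Element` above), the choice to store every attribute as a pattern, and the `toPlainText` flattener are all hypothetical, and `Reference` is left out to keep it short.

```typescript
// A pattern is a list of literal text, argument placeholders and nested markup.
type PatternPart =
  | string
  | { kind: "argument"; name: string }
  | { kind: "element"; tag: string; children: PatternPart[] };

// Helpers standing in for Argument(...) and Element(...) from the pseudo-code.
const arg = (name: string): PatternPart => ({ kind: "argument", name });
const el = (tag: string, children: PatternPart[]): PatternPart =>
  ({ kind: "element", tag, children });

// An element of the unit is either a bare pattern or a set of named attributes.
type PartAttributes = Record<string, PatternPart[]>;

interface LocalizationUnit {
  meta: { role: string; description: string };
  elements: Record<string, PatternPart[] | PartAttributes>;
}

const helloPrompt: LocalizationUnit = {
  meta: { role: "modal window", description: "..." },
  elements: {
    "label": ["Hello, ", el("strong", [arg("userName")]), "!"],
    "button-ok": { label: ["Ok"], accesskey: ["O"], tooltip: ["Click to accept"] },
    "button-cancel": { label: ["Cancel"], accesskey: ["C"], tooltip: ["Click to reject"] },
    "close-icon": { tooltip: ["Close the prompt"] },
  },
};

// Flattens a pattern to plain text: substitutes arguments and drops the markup.
// Unknown arguments are kept visible as {$name} instead of failing.
function toPlainText(pattern: PatternPart[], args: Record<string, string>): string {
  return pattern
    .map((part): string => {
      if (typeof part === "string") return part;
      if (part.kind === "argument") {
        const value = args[part.name];
        return value !== undefined ? value : `{$${part.name}}`;
      }
      return toPlainText(part.children, args);
    })
    .join("");
}

const labelPattern = helloPrompt.elements["label"];
if (Array.isArray(labelPattern)) {
  console.log(toPlainText(labelPattern, { userName: "John" })); // "Hello, John!"
}
```

A UI toolkit would presumably walk the same tree and keep the `strong` element instead of flattening it; the plain-text path is just the simplest consumer.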

And once you have it, you can do the most natural thing: you can bind such UI element to a corresponding localization unit.

<prompt
  l10n-id="hello-prompt"
  l10n-args="{userName: 'John'}"
>

or:

prompt.l10n.id = "hello-prompt";
prompt.l10n.args.set("userName", "John");

Such binding is declarative, just like applying a CSS class onto an element is, and it allows the engine to understand that before layout and painting steps for this element some resources need to be retrieved, their Localization Units must be resolved and the combination of the element and its localization unit is what gets laid out and painted.
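
Here is a rough sketch of what such a binding could look like on the toolkit side. The `Widget`, `L10nBinding` and `translateBeforePaint` names are hypothetical, and the unit is reduced to a single pattern string with `{name}` placeholders purely to keep the example short.

```typescript
// Minimal argument store, mirroring the `l10n.args.set(...)` call above.
class L10nArgs {
  private values: Record<string, string> = {};
  set(name: string, value: string): void { this.values[name] = value; }
  get(name: string): string | undefined { return this.values[name]; }
}

// The annotation: it records *what* to localize, not the localized result.
class L10nBinding {
  id: string | null = null;
  readonly args = new L10nArgs();
}

class Widget {
  readonly l10n = new L10nBinding();
  text = ""; // what layout and paint would consume
}

type Resources = Record<string, string>; // l10n id -> pattern

// Hypothetical toolkit step that runs before layout/paint. Because the binding
// stays on the widget, the same step can run again for another locale.
function translateBeforePaint(widgets: Widget[], resources: Resources): void {
  for (const widget of widgets) {
    const id = widget.l10n.id;
    if (id === null) continue;
    const pattern = resources[id];
    if (pattern === undefined) continue; // a real toolkit would fall back here
    widget.text = pattern.replace(/\{(\w+)\}/g, (match: string, name: string) => {
      const value = widget.l10n.args.get(name);
      return value !== undefined ? value : match;
    });
  }
}

const helloWidget = new Widget();
helloWidget.l10n.id = "hello-prompt";
helloWidget.l10n.args.set("userName", "John");

translateBeforePaint([helloWidget], { "hello-prompt": "Hello, {userName}!" });
const englishText = helloWidget.text; // "Hello, John!"

// Retranslating needs no new widget and no rebinding: only the resources change.
translateBeforePaint([helloWidget], { "hello-prompt": "Witaj, {userName}!" });
const polishText = helloWidget.text; // "Witaj, John!"
```

The developer code only sets the id and the arguments; when and how often the translation is applied is entirely up to the toolkit.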

This model has a huge number of benefits:

  • Localization Unit may be nested
  • it may have multiple messages and icons and other data inside it
  • its shape corresponds to the shape of the UI element it is meant to localize
  • it has meta data associated with the unit
  • it has a fallback that is reasonable and operates per-widget
  • the toolkit can apply the localization unit, reapply it, remove it, and modify when it needs to because all the information needed to localize the element is in the annotation for that element
  • tooling understands that this unit is a compound structure that doesn't pretend to be "flat" and "printf with params"
  • CAT tools can reason about what the element looks like, and even pull such an element and apply the translation onto it WYSIWYG as the localizer is translating it.
  • data and contexts are relevant to the widget, rather than pretending that a tooltip of button-ok is a standalone message
  • the developer writes synchronous code that just annotates the UI, and lets the UI toolkit react to changes in the animation frame cycle

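One possible reading of "fallback that operates per-widget" is sketched below: the unit is taken as a whole from the first locale in the chain that has every required part, so a widget never ends up with a label in one language and buttons in another. The `Unit`/`Bundle` shapes and the `resolveUnit` function are assumptions of mine, not a proposal for the actual algorithm.

```typescript
type Unit = Record<string, string>; // part name -> localized text
type Bundle = Record<string, Unit>; // unit id -> unit

interface LocaleBundle {
  locale: string;
  bundle: Bundle;
}

// Returns the first complete unit in the fallback chain, or null if no locale has one.
function resolveUnit(
  id: string,
  requiredParts: string[],
  chain: LocaleBundle[]
): { locale: string; unit: Unit } | null {
  for (const { locale, bundle } of chain) {
    const unit = bundle[id];
    if (unit === undefined) continue;
    const complete = requiredParts.every((part) => unit[part] !== undefined);
    if (complete) return { locale, unit };
  }
  return null;
}

// The Polish unit is missing "button-cancel", so the whole widget falls back to English.
const fallbackChain: LocaleBundle[] = [
  { locale: "pl", bundle: { "hello-prompt": { "label": "Witaj!", "button-ok": "OK" } } },
  {
    locale: "en",
    bundle: { "hello-prompt": { "label": "Hello!", "button-ok": "Ok", "button-cancel": "Cancel" } },
  },
];

const resolvedUnit = resolveUnit(
  "hello-prompt",
  ["label", "button-ok", "button-cancel"],
  fallbackChain
);
console.log(resolvedUnit !== null ? resolvedUnit.locale : "none"); // "en"
```

Whether an incomplete unit should be rejected outright or merged part by part is exactly the kind of policy that becomes expressible once the unit, rather than the string, is what falls back.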
LocalizationUnitFormatter

In ICU we actually already have a notion of such intermediate representation of data - FormattedX. For example, DateTimeFormatter produces FormattedDateTime which has a lot of information allowing users to introspect, operate and maybe even manipulate formatted data. The user can also just toString() it to get the result.

What if we had LocalizationFormatter which has a format method that returns FormattedLocalizationUnit which has all the information needed for a UI toolkit to combine it with Label, MenuItem or Button or any other widget and produce a LocalizedElement or LocalizedWidget that will be then laid out and painted?

And for the imperative case, we could still have toString which would take the value of the LocalizationUnit if it has one, and just print it as a string for the familiar printf experience.
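
As a very rough sketch of that shape: the names `LocalizationFormatter`, `format`, `FormattedLocalizationUnit` and `toString` come from the paragraphs above, but the fields, the `{name}` placeholder syntax and the `UnitSource` type are assumptions, and this is not how ICU's FormattedX classes are implemented.

```typescript
// Hypothetical source shape: an optional value plus named attributes.
interface UnitSource {
  value: string | null;
  attributes: Record<string, string>;
}

// Naive placeholder substitution, used only for this sketch.
function substitute(pattern: string, args: Record<string, string>): string {
  return pattern.replace(/\{(\w+)\}/g, (match: string, name: string) => {
    const value = args[name];
    return value !== undefined ? value : match;
  });
}

// Intermediate result: introspectable by a toolkit, printable for the simple case.
class FormattedLocalizationUnit {
  constructor(
    readonly id: string,
    readonly value: string | null,
    readonly attributes: Record<string, string>
  ) {}

  // Imperative escape hatch: the value if there is one, otherwise an empty string.
  toString(): string {
    return this.value !== null ? this.value : "";
  }
}

class LocalizationFormatter {
  constructor(private readonly units: Record<string, UnitSource>) {}

  format(id: string, args: Record<string, string>): FormattedLocalizationUnit | null {
    const source = this.units[id];
    if (source === undefined) return null;
    const attributes: Record<string, string> = {};
    for (const key of Object.keys(source.attributes)) {
      const pattern = source.attributes[key];
      if (pattern !== undefined) attributes[key] = substitute(pattern, args);
    }
    const value = source.value !== null ? substitute(source.value, args) : null;
    return new FormattedLocalizationUnit(id, value, attributes);
  }
}

const unitFormatter = new LocalizationFormatter({
  "hello-label": { value: "Hello, {userName}!", attributes: {} },
  "button-ok": { value: "Ok", attributes: { accesskey: "O", tooltip: "Click to accept" } },
});

const formattedLabel = unitFormatter.format("hello-label", { userName: "John" });
const formattedButton = unitFormatter.format("button-ok", {});
console.log(String(formattedLabel)); // "Hello, John!"
```

A toolkit would read `formattedButton.attributes` to set the tooltip and accesskey on a Button, while a command-line program would just print the unit.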

What's in scope?

I don't know yet. It's kind of a fresh realization and I'm not sure if my recommendation for the group is to:

a) Consider Localization Unit in scope as a level above MessageFormatter.
b) Consider Localization Unit out of scope, but the right paradigm for UI localization and therefore work on having MessageFormat 2.0 be a good lower level API for it
c) Consider Localization Unit one of many paradigms for UI localization and not tie our work to it
d) Consider Localization Unit a bad paradigm and design a better one

Why am I raising it?

The reason I think it is important is that we need to decide early on whether our target is:

printf("Hello, { $user }");
Label.textContent = format("Hello, { $user } ");

and we are ok thinking of the receiving end as flat textual strings, or whether we want to embrace the fact that this is not how UI localization works today.
That Label may have multiple attributes, and icons, and other values and each one may be a nested structure of data and localization may bring its own UI fragments that need to be overlapped with source fragments.
That the function in which you call printf is not the right place to synchronously annotate the UI with a string, because then the toolkit doesn't know that the UI is localized, cannot retranslate, cannot cache, cannot invalidate that cache, and cannot have responsive localization.

I think that decisions around it will have deep consequences for our thinking about many items on our wishlist (#3)

  • what is the data model of the single Message (Extendable inline markup #26)
  • how/if we want to allow inter-message references, inter-message data, inter-message meta-data (Support variable info not in message patterns #98)
  • do we want to allow association of non-text with our Messages (or Localization Unit containing Messages and NonMessages)
  • how does fallback work in a UI setup? Is it performed synchronously at the callsite? Is every callsite asynchronous? Do we need to resolve all messages before we begin applying translations? (Specify error fallbacking #45)
  • Can we have responsive localization (localization that reacts to locale changes, or argument changes) (Responsive localization #65)

I wrote a separate comment for Raph's new UI toolkit paradigm over the last day of wrangling with this concept. If you're interested in a more tangible application of how it may look, consider reading raphlinus/crochet#7
