diff --git a/docs/ldml/tr35-personNames.md b/docs/ldml/tr35-personNames.md index 6f68e0b7f2c..2b345d00a50 100644 --- a/docs/ldml/tr35-personNames.md +++ b/docs/ldml/tr35-personNames.md @@ -67,14 +67,19 @@ The LDML specification is divided into the following parts: * 6 [Formatting Process](#formatting-process) * 6.1 [Derive the name locale](#derive-the-name-locale) * 6.2 [Derive the formatting locale](#derive-the-formatting-locale) + * 6.2.1 [Switch the formatting locale if necessary]("switch-formatting-locales) * 6.3 [Derive the name order](#derive-the-name-order) * 6.4 [Choose a personName](#choose-a-personname) - * 6.5 [Choose a namePattern](#choose-a-namepattern) - * 6.6 [Examples of choosing a namePattern](#examples-of-choosing-a-namepattern) - * 6.6.1 [Examples for rules 1 and 2](#examples-for-rules-1-and-2) - * 6.6.2 [Examples for rule 3 and the interaction between the rules](#examples-for-rule-3-and-the-interaction-between-the-rules) - * 6.7 [Deriving initials](#deriving-initials) - * 6.8 [Handling foreign names](#handling-foreign-names) + * 6.5 [Choose a namePattern](#choose-a-namepattern") + * 6.6 [Access PersonName object](#access-personname") + * 6.6.1 [Handle missing surname](#missing-surname") + * 6.6.2 [Handle core and prefix](#core-and-prefix) + * 6.6.3 [Derive initials](#derive-initials) + * 6.7 [Process a namePattern](#process-the-namepattern) + * 6.7.1 [Handling foreign names](#handling-foreign-names) + * 6.7.2 [Setting the spaceReplacement](#spaceReplacement) + * 6.7.3 [Examples of space replacement](#spaceReplacement-examples) + * 6.8 [Formatting examples](#formatting-examples) * 7 [Sample Name](#sample-name) * 7.1 [Syntax](#syntax) * 7.2 [Expected values](#expected-values) @@ -256,7 +261,7 @@ The `` element is optional, and contains information about sel ``` -* `#PCDATA `is a space delimited list of one or more [unicode_locale_id](https://unicode.org/reports/tr35/#unicode_locale_id)s. Normally they are limited to language, script, and region. The _und_ locale ID may only occur once, either in _surnameFirst_ or _givenFirst_, but not both, and matches all base locales not explicitly listed. +* `#PCDATA `is a space delimited list of one or more [unicode_locale_id](https://unicode.org/reports/tr35/#unicode_locale_id)s. Normally each locale is limited to language, script, and region. The _und_ locale ID may only occur once, either in _surnameFirst_ or _givenFirst_, but not both, and matches all base locales not explicitly listed. An example from English may look like the following @@ -267,7 +272,7 @@ This would tell the formatting code, when handling person name data from an Engl ### 2.4 foreignSpaceReplacement Element -The `` element is used to specify how delimiters should appear between name parts when the name data (name locale) is different from the requested locale (formatting locale)., but they both use the same script. +The `` element is used to specify how spaces should be handled when the name language is different from the formatting language. ```xml @@ -275,7 +280,7 @@ The `` element is used to specify how delimiters should ``` * `xml:space` must be set to `'preserve'` so that actual spaces in the pattern are preserved. See [W3C XML White Space Handling](https://www.w3.org/TR/xml/#sec-white-space). -* The `#PCDATA `is the character sequence used to replace spaces between fields for name data from a name locale that is different from the formatting locale, but are in the same script. +* The `#PCDATA `is the character sequence used to replace spaces when postprocessing a pattern. ### 2.5 initialPattern Element @@ -475,41 +480,46 @@ Examples: ## 6 Formatting Process -The patterns are in personName elements, which are themselves in a personNames container element. The following describes how these patterns are chosen. If the name locale is different than the formatting locale, then additional processing needs to take place: see [Handling foreign names](#handling-foreign-names). +The patterns are in personName elements, which are themselves in a personNames container element. The following describes how the the formatter's locale interacts with the personName's locale, how the name patterns are chosen, and how they are processed. The details of the XML structure behind the data referenced here are in [XML Structure](#xml-structure). +The formatting process may be refined in the future. In particular, additional data may be added to allow further customization. + ### 6.1 Derive the name locale -Create a **full name locale** as follows. +Construct the **name script** in the following way. +1. Iterate through the characters of the surname, then through the given name. + 1. Find the script of that character using the Script property. + 2. If the script is not Common, Inherited, nor Unknown, return that script as the **name script** +2. If nothing is found during the iteration, return Zzzz (Unknown Script) -1. First, let the **full formatting locale** be the fully-fleshed-out formatting locale using likely subtags. -2. If there is a name locale available via the PersonName data interface, obtain the full name locale from the name locale using likely subtags. Thus de ⇒ de_Latn_DE. -3. Otherwise the full name locale is created based on the characters in the name and the full formatting locale, as follows: - 1. Find the predominant script for the name in the following way. - 1. For each character in the given and surname, find the script(s) of the character using the Script_Extensions property. - 2. For each of those scripts, increment a counter for that script, and record the position of the first character encountered in that script. - 2. The predominant script is the script with the highest counter value. - 1. In the rare case that there are multiple counters with the highest counter value, take the one with the lowest first position. - 2. In the even rarer case that there is still more than one, use the script whose script code is alphabetically lowest. (These two steps are simply to guarantee a determinant result.) - 3. If the predominant script is the same as the script of the full formatting locale, then let the full name locale be the full formatting locale. - 4. Otherwise, find the likely locale for the predominant script, as specified by the likely subtags. (This will add a language and region.) Let the full name locale be that likely locale. +Construct the **name base language** in the following way. +1. If the PersonName object can provide a name locale, return its language. +2. Otherwise, find the maximal likely locale for the name script, using Likely Subtags, and return its base language (first subtag). -In all steps below, the "name locale" is the full name locale. +Construct the **name locale** in the following way: +1. If the PersonName object can provide a name locale, return a locale formed from it by replacing its script by the name script. +2. Otherwise, return the locale formed from the name base langauge plus name script. ### 6.2 Derive the formatting locale -If the full name locale is different from the full formatting locale, and the predominant script of the name is different from the script of the formatting locale, then let the full formatting locale be the full name locale. +Let the **full formatting locale** be the maximal likely locale for the formatter's locale, using Likely Subtags. The **formatting base language** is the base language (first subtag) of the full formatting locale, and the **formatting script** is the script code of the full formatting locale. + +#### 6.2.1 Switch the formatting locale if necessary -In all steps below, the "formatting locale" is the full formatting locale. +A few script values represent a set of scripts, such as Jpan = {Hani, Kana, Hira}. Two script codes are said to _match_ when they are either identical, or one represents a set which contains the other, or they both represent sets which intersect. For example, Hani and Jpan match, because {Hani, Kana, Hira} contains Hani. + +If the name script doesn't match the formatting script, then the name is formatted with the name locale, not the originally requested formatting locale. For example, when a Hindi formatter (hi-Deva) is called upon to format a name with the Ukrainian (Cyrillic) locale (uk-Cyrl), under the covers a Ukrainian (Cyrillic) formatter should be instantiated and used to format that name. ### 6.3 Derive the name order A PersonName object’s fields are used to derive an order, as follows: -1. If the PersonName object to be formatted has a `preferredOrder` field, then return that field’s value -2. Otherwise use the nameOrderLocales elements to find the most best match for the full name locale, as follows. - 1. For each locale L1 in the parent locale lookup chain* for the full name locale, do the following +1. If the calling API requests sorting order, that is used. +2. Otherwise, if the PersonName object to be formatted has a `preferredOrder` field, then return that field’s value +2. Otherwise, use the nameOrderLocales elements to find the most best match for the name locale, as follows. + 1. For each locale L1 in the parent locale lookup chain* for the name locale, do the following 1. Create a locale L2 by replacing the language subtag by 'und'. (Eg, 'de_DE' ⇒ 'und_DE') 2. For each locale L in {L1, L2}, do the following 1. If there is a precise match among the givenFirst nameOrderLocales for L, then let the nameOrder be givenFirst, and stop. @@ -523,14 +533,14 @@ In other words, you'll check the givenFirst and surnameFirst resources for the f de_Latin_DE, und_Latn_DE, de_Latn, und_Latn, de_DE, und_DE, de, und -This process will always terminate, because there is always a und value in one of the two nameOrderLocales elements. +This process will always terminate, because there is always a und value in one of the two nameOrderLocales elements. Remember that the inheritance chain requires use of the parentLocales elements: it is not just truncation. For example, the data for a particular locale might look like the following: ```xml zh ja und-CN und-TW und-SG und-HK und-MO und-HU und-JP ``` -The nameOrderLocales will match any locale with a zh or ja [unicode_language_subtag](https://unicode.org/reports/tr35/#unicode_language_subtag) and any locale with a CN, TW, SG, HK MO, HU, or JP [unicode_region_subtag](https://unicode.org/reports/tr35/#unicode_region_subtag). +These nameOrderLocales will match any locale with a zh or ja [unicode_language_subtag](https://unicode.org/reports/tr35/#unicode_language_subtag) and any locale with a CN, TW, SG, HK MO, HU, or JP [unicode_region_subtag](https://unicode.org/reports/tr35/#unicode_region_subtag). Here are some more examples. Note that if there is no order field or locale field in the PersonName object to be formatted, and the script of the PersonName data is different from that of the formatting locale, then the default result is givenFirst. @@ -589,9 +599,7 @@ To match a personName, all four attributes in the personName must match (a missi To find the matching personName element, traverse all the personNames in order until the first one is found. This will always terminate since the data is well-formed in CLDR. -### 6.5 Choose and process a namePattern - -#### 6.5.1 Choose a namePattern +### 6.5 Choose a namePattern To format a name, the fields in a namePattern are replaced with fields fetched from the PersonName Data Interface. The personName element can contain multiple namePattern elements. Choose one based on the fields in the input PersonName object that are populated: 1. Find the set of patterns with the most populated fields. @@ -607,144 +615,36 @@ For example: 3. Pattern C is discarded, because it has the least number of populated name fields. 4. Out of the remaining patterns A and B, pattern B wins, because it has only 3 unpopulated fields compared to pattern A. -#### 6.5.1 Choose a namePattern - -If the “winning” namePattern still has fields that are unpopulated in the PersonName object, we process the pattern algorithmically with the following steps: - -1. If one or more fields at the start of the pattern are empty, all fields and literal text before the **first** populated field are omitted. -2. If one or more fields at the end of the pattern are empty, all fields and literal text after the **last** populated field are omitted. -3. Processing from the start of the remaining pattern: - 1. If there are two or more empty fields separated only by literals, the fields and the literals between them are removed. - 2. If there is a single empty field, it is removed. -4. If the processing from step 3 results in two adjacent literals (call them A and B), they are coalesced into one literal as follows: - 1. If either is empty the result is the other one. - 3. If B matches the end of A, then the result is A. So xyz + yz ⇒ xyz, and xyz + xyz ⇒ xyz. - 4. Otherwise the result is A + B, but with any sequence of two or more white space characters replaced by the first whitespace character. - -### 6.6 Examples of choosing a namePattern - -#### 6.6.1 Examples for rules 1 and 2 - -The personName element contains: - - -> `{title} {given} {given2} {surname}, {credentials}` - - -The input PersonName object contains: - -| `title` | `given` | `given2` | `surname` | `generation` | -| -------- | ------- | -------- | --------- | -------- | -| | Raymond | J. | Johnson | Jr. | - -The output is: - -> Raymond J. Johnson, Jr. - -The “title” field is empty, and so both it and the space that follows it are omitted from the output, according to rule 1 above. - -If, instead, the input PersonName object contains: - -| `title` | `given` | `given2` | `surname` | `generation` | -| -------- | ------- | -------- | --------- | -------- | -| | Raymond | J. | Johnson | | - -The output is: - -> Raymond J. Johnson - -The “title” field is empty, and so both it and the space that follows it are omitted from the output, according to rule 1 above. - -The “generation” field is also empty, so it and both the comma and the space that precede it are omitted from the output, according to rule 2 above. - -#### 6.6.2 Examples for rule 3 and the interaction between the rules - -To see how rule 3 interacts with the other rules, consider an imaginary language in which people generally have given and given2 (or middle) names, and the given2 name is always written with parentheses around it, and the given name is usually written as an initial with a following period. - -The personName element contains: - -> `{given-initial}. ({given2}) {surname}` - +### 6.6 Access PersonName object +### 6.6.1 Handle missing surname -The input PersonName object contains: +All PersonName objects will have a given name (for mononyms the given name is used). However, there may not be a surname. In that case, the following process is followed so that formatted patterns produce reasonable results. -| `given` | `given2` | `surname` | -| ------- | -------- | --------- | -| Foo | Bar | Baz | +1. If there is no surname from a PersonName P1 _and_ the pattern either doesn't include the given name or only shows an initial for the given name, then: + 2. Construct and use a derived PersonName P2, whereby P2 behaves exactly as P1 except that: + 1. any request for a surname field (with any modifiers) returns P1's given name (with the same modifers) + 2. any request for a given name field (with any modifiers) returns "" (empty string) -The output is: +As always, this is a logical description and may be optimized in implementations. For example, an implemenation may use an interface for P2 that just delegates calls to P1, with some redirection for accesses to surname and given name. -> F. (Bar) Baz +### 6.6.2 Handle core and prefix -If, instead, the input PersonName object contains: +A given field may have a core value, a prefix value, and/or a ‘plain’ value (neither core nor prefix). If one or more of them are missing, then the returned values should be adjusted according to the table below. In each cell, a ✓ indicates that a value is available, an ✖️ if there is none, and a → indicates when a value is substituted. -| `given` | `given2` | `surname` | -| ------- | -------- | --------- | -| Foo | | Baz | +| prefix | core | plain | +| ------ | ---- | ----- | +| ✓ | ✓ | ✓ | +| ✓ | ✓ | ✖️ → prefix + " " + core | +| ✓ → ✖️ | ✖️ → plain | ✓ | +| ✓ → ✖️ | ✖️ | ✖️ | +| ✖️ | ✓ → plain | ✓ | +| ✖️ | ✓ | ✖️ → core | +| ✖️ | ✖️ → plain | ✓ | +| ✖️ | ✖️ | ✖️ | -The output is: +For example, if the surname-prefix is "von und zu" and the surname-core is "Stettbach" and there is no surname (plain), then the derived value for the surname is "von und zu Stettbach". (The cases where existing values are changed for prefix and core (ie, ✓ → …) should not be necessary with well-formed PersonName data.) -> F. Baz - -The “given2” field is empty, so it and the surrounding parentheses are omitted from the output, as is one of the surrounding spaces, according to rule 3. The period after “{given-initial}” remains, because it is separated from the “{given2}” element by space-- punctuation around a missing field is only deleted up until the closest space in each direction. - -If there were no space between the period and the parentheses, as might happen if our hypothetical language didn’t use spaces: - -> `{given-initial}.({given2}) {surname}` - -The input PersonName object still contains: - -| `given` | `given2` | `surname` | -| ------- | -------- | --------- | -| Foo | | Baz | - -The output is: - -> F Baz - -Both the period after “{given-initial}” _and_ the parentheses around “{given2}” are omitted from the output, because there was no space between them — instead, we delete punctuation all the way up to the neighboring field. To solve this (making sure the “{given-initial}” field always has a period after it), you would add another namePattern: - -> `{given-initial}.({given2}) {surname}`
-> `{given-initial}. {surname}` - -The first pattern would be used when the “given2” field is populated, and the second pattern would be used when the “given2” field is empty. - -Rules 1 and 3 can conflict in similar ways. If the personName element contains (there’s a space between the period and the opening parenthesis again): - -> `{given-initial}. ({given2}) {surname}` - -And the input PersonName object contains: - -| `given` | `given2` | `surname` | -| ------- | -------- | --------- | -| | Bar | Baz | - -The output is: - -> Bar) Baz - -Because the “given” field is empty, rule 1 not only has us delete it, but also all punctuation up to “{given2}”. This includes _both_ the period _and_ the opening parenthesis. Again, to solve this, you’d supply two namePatterns: - -> `{given-initial}. ({given2}) {surname}`
-> ` ({given2}) {surname}` - -The output would then be: - -> (Bar) Baz - -The first namePattern would be used if the “given” field was populated, and the second would be used if it was empty. - -If, instead, the input PersonName object contains: - -| `given` | `given2` | `surname` | -| ------- | -------- | --------- | -| Foo | | Baz | - -The output is: - -> F. Baz - -### 6.7 Deriving initials +### 6.6.3 Derive initials The following process is used to produce initials when they are not supplied by the PersonName object. Assuming the input example is “Mary Beth”: @@ -757,47 +657,68 @@ The following process is used to produce initials when they are not supplied by See the “initial” modifier in the [Modifiers](#modifiers) section for more details. -### 6.8 Handling foreign names +#### 6.7 Process a namePattern -There are two main challenges in dealing with foreign name formatting that needs to be considered. One is the ordering, which is dealt with under the section [[2.3 nameOrderLocales Element](#nameorderlocales-element)]. The other is spacing. +The “winning” namePattern may still have fields that are unpopulated (empty) in the PersonName object. That namePattern is populated with field values with the following steps: -Some writing systems require spaces (or some other non-letters) to separate words. For example, [Hayao Miyazaki](https://en.wikipedia.org/wiki/Hayao_Miyazaki) is written in English with given name first and with a space between the two name fields, while in Japanese there is no space with surname first: [宮崎駿](https://ja.wikipedia.org/wiki/%E5%AE%AE%E5%B4%8E%E9%A7%BF) +1. If one or more fields at the start of the pattern are empty, all fields and literal text before the **first** populated field are omitted. +2. If one or more fields at the end of the pattern are empty, all fields and literal text after the **last** populated field are omitted. +3. Processing from the start of the remaining pattern: + 1. If there are two or more empty fields separated only by literals, the fields and the literals between them are removed. + 2. If there is a single empty field, it is removed. +4. If the processing from step 3 results in two adjacent literals (call them A and B), they are coalesced into one literal as follows: + 1. If either is empty the result is the other one. + 2. If B matches the end of A, then the result is A. So xyz + yz ⇒ xyz, and xyz + xyz ⇒ xyz. + 3. Otherwise the result is A + B, further modified by replacing any sequence of two or more white space characters by the first whitespace character. +5. All of the fields are replaced by the corresponding values from the PersonName object. -Those locales are determined using the Script Metadata as follows. -1. Get the likely script of the formatting locale, using LikelySubtags -2. If the likely script is Thai, it needs spaces. -3. Otherwise, if the likely script is Jpan, Hant, or Hans, it doesn't need spaces -4. Otherwise, if the likely script has the script metadata property lbLetters = YES, it doesn't need spaces -5. Otherwise, it needs spaces +### 6.7.1 Handling foreign names + +There are two main challenges in dealing with foreign name formatting that needs to be considered. One is the ordering, which is dealt with under the section [[2.3 nameOrderLocales Element](#nameorderlocales-element)]. The other is spacing. -Then: +Some writing systems require spaces (or some other non-letters) to separate words. For example, [Hayao Miyazaki](https://en.wikipedia.org/wiki/Hayao_Miyazaki) is written in English with given name first and with a space between the two name fields, while in Japanese there is no space with surname first: [宮崎駿](https://ja.wikipedia.org/wiki/%E5%AE%AE%E5%B4%8E%E9%A7%BF) -1. If a locale requires spaces between words, the normal patterns for the formatting locale are used. On Wikipedia, for example, note the space within the Japanese name on pages from English and Korean (an ideographic space is used here for emphasis). +If a locale requires spaces between words, the normal patterns for the formatting locale are used. On Wikipedia, for example, note the space within the Japanese name on pages from English and Korean (an ideographic space is used here for emphasis). * “​​[Hayao Miyazaki (宮崎 駿, Miyazaki Hayao](https://en.wikipedia.org/wiki/Hayao_Miyazaki)…” or * “[미야자키 하야오(일본어: 宮﨑 駿 Miyazaki Hayao](https://ko.wikipedia.org/wiki/%EB%AF%B8%EC%95%BC%EC%9E%90%ED%82%A4_%ED%95%98%EC%95%BC%EC%98%A4)…”. -2. If a locale **doesn’t** require spaces between words, there are two cases, based on whether the name is foreign or not (based on the PersonName objects explicit or calculated locale's language subtag). For example, the formatting locale might be Japanese, and the locale of the PersonName object might be de_CH, German (Switzerland), such as Albert Einstein. +If a locale **doesn’t** require spaces between words, there are two cases, based on whether the name is foreign or not (based on the PersonName objects explicit or calculated locale's language subtag). For example, the formatting locale might be Japanese, and the locale of the PersonName object might be de_CH, German (Switzerland), such as Albert Einstein. When the locale is foreign, the **foreignSpaceReplacement** is substituted for each space in the formatted name. When the name locale is native, a **nativeSpaceReplacement** is substituted for each space in the formatted name. The precise algorithm is given below. - 1. **The name is native** In that case, the patterns are used as is. - 2. **The name is foreign** In that case, the **foreignSpaceReplacement** is substituted for each space formatted name. Here are examples for Albert Einstein in Japanese and Chinese: +Here are examples for Albert Einstein in Japanese and Chinese: * [アルベルトアインシュタイン](https://ja.wikipedia.org/wiki/%E3%82%A2%E3%83%AB%E3%83%99%E3%83%AB%E3%83%88%E3%83%BB%E3%82%A2%E3%82%A4%E3%83%B3%E3%82%B7%E3%83%A5%E3%82%BF%E3%82%A4%E3%83%B3) * [阿尔伯特·爱因斯坦](https://zh.wikipedia.org/wiki/%E9%98%BF%E5%B0%94%E4%BC%AF%E7%89%B9%C2%B7%E7%88%B1%E5%9B%A0%E6%96%AF%E5%9D%A6) +#### 6.7.2 Setting the spaceReplacement + +1. The foreignSpaceReplacement is provided by the value for the `foreignSpaceReplacement` element; the default value is " ". +2. The nativeSpaceReplacement is determined by the following algorithm, chosing between " " and "". + 1. Get the script of the formatting locale + 2. If the likely script is Thai, let nativeSpaceReplacement = " " (space) + 3. Otherwise let nativeSpaceReplacement = "" (empty string) if either of the following applies: + 1. The script is Jpan, Hant, or Hans + 2. The script has the script metadata property lbLetters = YES (this can also be algorithmically derived from the LineBreak property data). + 4. Otherwise, let nativeSpaceReplacement = " " (space) +3. If the formatter base language matches the name base language, then let spaceReplacement = nativeSpaceReplacement, otherwise let spaceReplacement = foreignSpaceReplacement. +4. Replace all sequences of space in the resolved pattern string by the spaceReplacement. -In both cases, the ordering may be changed according to the **Name Order for Locales** settings that each locale provides. If the PersonName object does not supply a locale for a name, then a default locale will be derived based on other information (such as the script of the characters in the name fields). +For the purposes of this algorithm, two base languages are said to __match__ when they identical, or if both are in {ja, zh, yue}. -Remember that **a name in a different script** will use a different locale for formatting. +Remember that **a name in a different script** will use a different locale for formatting, as per Switch the formatting locale if necessary. For example, when formatting a name for Japanese, if the name is in the Latin script, a Latin based locale will be used to format it, such as when “Albert Einstein” appears in Latin characters on [Albert Einstein](https://ja.wikipedia.org/wiki/Albert_Einstein) +#### 6.7.3 Examples of space replacement To illustrate how foreign space replacement works, consider the following name data. For illustration, the name locale is given in the maximized form: in practice, `ja` would be used instead of `ja_Jpan_JP`, and so on.: For more information, see [Likely Subtags](tr35.html#Likely_Subtags). +*TBD Review all these examples to see if they need updating* + | name locale | given | surname | | ------------- | -------- | ------------- | | `de_Latn_CH` | Albert | Einstein | -| `de_Jpan_CH` | アルベルト | アインシュタイン | -| `ja_Jpan_JP` | Hayao | Miyazaki | +| `de_Kata_CH` | アルベルト | アインシュタイン | +| `ja_Kata_CH` | アルベルト | アインシュタイン | +| `ja_Latn_JP` | Hayao | Miyazaki | | `ja_Jpan_JP` | 駿 | 宮崎 | Suppose the PersonNames formatting patterns for `ja_JP` and `de_CH` contained the following: @@ -843,6 +764,7 @@ Note in the `de_CH` locale, _ja_ is not listed in nameOrderLocales, and would th The name data would resolve as follows: + @@ -948,6 +870,127 @@ The name data would resolve as follows:

+### 6.8 Formatting examples + +*TBD Review all these examples to see if they need updating* + +The personName element contains: + + +> `{title} {given} {given2} {surname}, {credentials}` + + +The input PersonName object contains: + +| `title` | `given` | `given2` | `surname` | `generation` | +| -------- | ------- | -------- | --------- | -------- | +| | Raymond | J. | Johnson | Jr. | + +The output is: + +> Raymond J. Johnson, Jr. + +The “title” field is empty, and so both it and the space that follows it are omitted from the output, according to rule 1 above. + +If, instead, the input PersonName object contains: + +| `title` | `given` | `given2` | `surname` | `generation` | +| -------- | ------- | -------- | --------- | -------- | +| | Raymond | J. | Johnson | | + +The output is: + +> Raymond J. Johnson + +The “title” field is empty, and so both it and the space that follows it are omitted from the output, according to rule 1 above. + +The “generation” field is also empty, so it and both the comma and the space that precede it are omitted from the output, according to rule 2 above. + +To see how rule 3 interacts with the other rules, consider an imaginary language in which people generally have given and given2 (or middle) names, and the given2 name is always written with parentheses around it, and the given name is usually written as an initial with a following period. + +The personName element contains: + +> `{given-initial}. ({given2}) {surname}` + + +The input PersonName object contains: + +| `given` | `given2` | `surname` | +| ------- | -------- | --------- | +| Foo | Bar | Baz | + +The output is: + +> F. (Bar) Baz + +If, instead, the input PersonName object contains: + +| `given` | `given2` | `surname` | +| ------- | -------- | --------- | +| Foo | | Baz | + +The output is: + +> F. Baz + +The “given2” field is empty, so it and the surrounding parentheses are omitted from the output, as is one of the surrounding spaces, according to rule 3. The period after “{given-initial}” remains, because it is separated from the “{given2}” element by space-- punctuation around a missing field is only deleted up until the closest space in each direction. + +If there were no space between the period and the parentheses, as might happen if our hypothetical language didn’t use spaces: + +> `{given-initial}.({given2}) {surname}` + +The input PersonName object still contains: + +| `given` | `given2` | `surname` | +| ------- | -------- | --------- | +| Foo | | Baz | + +The output is: + +> F Baz + +Both the period after “{given-initial}” _and_ the parentheses around “{given2}” are omitted from the output, because there was no space between them — instead, we delete punctuation all the way up to the neighboring field. To solve this (making sure the “{given-initial}” field always has a period after it), you would add another namePattern: + +> `{given-initial}.({given2}) {surname}`
+> `{given-initial}. {surname}` + +The first pattern would be used when the “given2” field is populated, and the second pattern would be used when the “given2” field is empty. + +Rules 1 and 3 can conflict in similar ways. If the personName element contains (there’s a space between the period and the opening parenthesis again): + +> `{given-initial}. ({given2}) {surname}` + +And the input PersonName object contains: + +| `given` | `given2` | `surname` | +| ------- | -------- | --------- | +| | Bar | Baz | + +The output is: + +> Bar) Baz + +Because the “given” field is empty, rule 1 not only has us delete it, but also all punctuation up to “{given2}”. This includes _both_ the period _and_ the opening parenthesis. Again, to solve this, you’d supply two namePatterns: + +> `{given-initial}. ({given2}) {surname}`
+> ` ({given2}) {surname}` + +The output would then be: + +> (Bar) Baz + +The first namePattern would be used if the “given” field was populated, and the second would be used if it was empty. + +If, instead, the input PersonName object contains: + +| `given` | `given2` | `surname` | +| ------- | -------- | --------- | +| Foo | | Baz | + +The output is: + +> F. Baz + ## 7 Sample Name The sampleName element is used for test names in the personNames LDML data for each locale to aid in testing and display in the CLDR Survey Tool. They are not intended to be used in production software as prompts or placeholders in a user interface and should not be displayed in a user interface.