
x/text/number: provide way to query number system #53872

Open
@golightlyb

Description


(edited to reduce scope)

/x/text/language lets you query Region and Script:

func (t Tag) Region() (Region, Confidence)
func (t Tag) Script() (Script, Confidence)

There should also be a way to query the Number System that Go has matched for the locale. Currently, this can be done with the Extensions method on a language.Tag, but only if the number system has been specified explicitly in the locale string. Otherwise, there is no easy way to know which default Number System was chosen.

This is probably best implemented by having golang.org/x/text/number export a function called "SystemFromTag". It should return a number.System, which should implement the fmt.Stringer interface.

The lookup already exists in /x/text/internal/number (InfoFromTag), so the exported function would be a trivial wrapper.

In the future, a number.System could export other useful information that the internal number.InfoFromTag exposes, but this is not necessary for now.

Use case

This would let client code match the number system that /x/text selects for a locale and, if it needs more information, look it up in the Unicode CLDR data, which for numbering systems is a single, simple file: supplemental/numberingSystems.xml.
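A sketch of that client-side lookup, using only the standard library's encoding/xml. The constant is a hand-abbreviated excerpt in the shape of CLDR's numberingSystems.xml (four entries only), and the type and function names are hypothetical. With an exported SystemFromTag, the id used to index the map would come from the returned System's String method; here "tamldec" is hard-coded as a stand-in.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// cldrExcerpt is an abbreviated excerpt in the shape of CLDR's
// supplemental/numberingSystems.xml; the real file lists every system.
const cldrExcerpt = `<supplementalData>
	<numberingSystems>
		<numberingSystem id="arab" type="numeric" digits="٠١٢٣٤٥٦٧٨٩"/>
		<numberingSystem id="latn" type="numeric" digits="0123456789"/>
		<numberingSystem id="taml" type="algorithmic" rules="tamil"/>
		<numberingSystem id="tamldec" type="numeric" digits="௦௧௨௩௪௫௬௭௮௯"/>
	</numberingSystems>
</supplementalData>`

// NumberingSystem mirrors the attributes of one <numberingSystem> element.
type NumberingSystem struct {
	ID     string `xml:"id,attr"`
	Type   string `xml:"type,attr"`   // "numeric" or "algorithmic"
	Digits string `xml:"digits,attr"` // set for numeric systems
	Rules  string `xml:"rules,attr"`  // set for algorithmic systems
}

type supplementalData struct {
	Systems []NumberingSystem `xml:"numberingSystems>numberingSystem"`
}

// loadSystems indexes numbering systems by id (e.g. "latn").
func loadSystems(data []byte) (map[string]NumberingSystem, error) {
	var sd supplementalData
	if err := xml.Unmarshal(data, &sd); err != nil {
		return nil, err
	}
	m := make(map[string]NumberingSystem, len(sd.Systems))
	for _, s := range sd.Systems {
		m[s.ID] = s
	}
	return m, nil
}

func main() {
	systems, err := loadSystems([]byte(cldrExcerpt))
	if err != nil {
		panic(err)
	}
	// Stand-in for the id a future number.SystemFromTag might report.
	s := systems["tamldec"]
	fmt.Println(s.ID, s.Type, s.Digits)
}
```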

Without this, client code must go through the much more involved process of re-implementing locale-string parsing and re-implementing the mapping from a locale to its default number system, including the hierarchy of parent locales.
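To illustrate why re-implementing the default is involved, here is a deliberately simplified sketch of the fallback walk. The table is a hypothetical three-entry subset of CLDR's defaultNumberingSystem data, and parent uses plain truncation. Real CLDR data also defines explicit parentLocales exceptions (for example, en-GB's parent is en-001, not en), which a faithful implementation would have to load and honor, on top of handling an explicit -u-nu- override.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultSystems is a tiny, hypothetical subset of CLDR's
// <defaultNumberingSystem> data keyed by locale ID.
var defaultSystems = map[string]string{
	"root": "latn",
	"ar":   "arab",
	"ta":   "latn",
}

// parent returns the truncation parent of a locale ID ("en-GB" -> "en",
// "en" -> "root"). CLDR's explicit parentLocales exceptions are ignored.
func parent(id string) string {
	if i := strings.LastIndexByte(id, '-'); i >= 0 {
		return id[:i]
	}
	return "root"
}

// defaultNumberingSystem walks up the parent chain until it finds a locale
// with a known default numbering system.
func defaultNumberingSystem(id string) string {
	for {
		if nu, ok := defaultSystems[id]; ok {
			return nu
		}
		if id == "root" {
			return "" // guards against looping forever if "root" is missing
		}
		id = parent(id)
	}
}

func main() {
	for _, id := range []string{"en-GB", "ar-EG", "ta-IN"} {
		fmt.Println(id, defaultNumberingSystem(id))
	}
}
```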

Example Usage

package main

import (
    "fmt"

    "golang.org/x/text/language"
    "golang.org/x/text/message"
    "golang.org/x/text/number"
)

func main() {
    ts := []language.Tag{
        language.MustParse("en-GB"),
        language.MustParse("en-GB-u-nu-fullwide"),
        language.MustParse("ar"),
        language.MustParse("ar-u-nu-latn"),
        language.MustParse("ta"),
        language.MustParse("ta-u-nu-taml"),
        language.MustParse("ta-u-nu-tamldec"),
    }

    for _, t := range ts {
        fmt.Printf("%s\n", t.String())

        r, _ := t.Region()
        fmt.Printf("%s, %s\n", r.String(), r.ISO3())

        s, _ := t.Script()
        fmt.Printf("%s\n", s.String())

        message.NewPrinter(t).Println(number.Decimal(123456789))

        // PROPOSED:
        // n, _ := number.SystemFromTag(t)
        // fmt.Printf("%s\n", n.String())

        fmt.Println("---")
    }


    // Expected Outputs:
    // en-GB
    // GB, GBR
    // Latn
    // 123,456,789
    // latn
    // ---
    // en-GB-u-nu-fullwide
    // GB, GBR
    // Latn
    // 123,456,789
    // fullwide
    // ---
    // ar
    // EG, EGY
    // Arab
    // ١٢٣٬٤٥٦٬٧٨٩
    // arab
    // ---
    // ar-u-nu-latn
    // EG, EGY
    // Arab
    // 123,456,789
    // latn
    // ---
    // ta
    // IN, IND
    // Taml
    // 12,34,56,789
    // latn
    // ---
    // ta-u-nu-taml
    // IN, IND
    // Taml
    // 12,34,56,789 // taml is not a decimal system, so ignore this line
    // taml
    // ---
    // ta-u-nu-tamldec
    // IN, IND
    // Taml
    // ௧௨,௩௪,௫௬,௭௮௯
    // tamldec
    // ---
}

Example implementation

/x/text/number should change as follows:

import (
    "golang.org/x/text/internal/number"
    "golang.org/x/text/language"
)

// System holds information about a numbering system.
type System struct {
    info number.Info // from /x/text/internal/number
}

// SystemFromTag returns the numbering system for the given language tag. If
// one was not given explicitly (e.g. "en-u-nu-mathbold"), it infers the most
// likely candidate. This is subject to change.
func SystemFromTag(t language.Tag) (System, language.Confidence) {
    // TODO: select a Confidence; the zero value is language.No.
    var confidence language.Confidence
    return System{info: number.InfoFromTag(t)}, confidence
}

// String returns the BCP 47 U extension representation of the numbering
// system identifier (e.g. "latn" or "tamldec").
func (s System) String() string {
    // ...
}

Open questions

  1. The documentation for Region.String says it returns "ZZ" for an unspecified region, and Script.String returns "Zzzz" for an unspecified script. Could SystemFromTag ever fail to return a numbering system, now or in the future? If so, what should that system's string representation be? Probably just the default, with a Confidence of No?

  2. /x/text/number currently doesn't support number system categories at all (e.g. "ta-u-nu-native", "ta-u-nu-traditio" or "zh-u-nu-finance"), only explicit systems such as "ta-u-nu-tamldec". Should this be implemented first? It would probably affect the returned Confidence value. (See x/text/number: understands specific BCP-47 u-nu-extensions, but not general categories #54090)
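For question 2, resolving a category keyword means consulting a per-language table before any concrete system id exists. A minimal sketch, assuming a hypothetical hand-written excerpt of CLDR's otherNumberingSystems data for Tamil; the names are illustrative, and it does not implement the UTS #35 fallback rules for missing categories (it just reports ok=false). Note that "traditio" is the 8-character BCP 47 form of the CLDR category "traditional".

```go
package main

import "fmt"

// otherSystems is a hypothetical excerpt of CLDR <otherNumberingSystems>
// data: per language, category name -> concrete numbering system id.
var otherSystems = map[string]map[string]string{
	"ta": {"native": "tamldec", "traditional": "taml"},
}

// bcp47Category maps -u-nu- category values to CLDR category names.
var bcp47Category = map[string]string{
	"native":   "native",
	"traditio": "traditional",
	"finance":  "finance",
}

// resolveNumberingSystem turns a -u-nu- value into a concrete id. Category
// keywords are looked up per language; any other value is assumed to be a
// concrete id already (e.g. "tamldec").
func resolveNumberingSystem(lang, nu string) (id string, ok bool) {
	cat, isCategory := bcp47Category[nu]
	if !isCategory {
		return nu, true
	}
	id, ok = otherSystems[lang][cat] // a missing lang yields "", false
	return id, ok
}

func main() {
	for _, nu := range []string{"native", "traditio", "finance", "tamldec"} {
		id, ok := resolveNumberingSystem("ta", nu)
		fmt.Printf("ta-u-nu-%s -> %q (found: %v)\n", nu, id, ok)
	}
}
```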

References

  1. https://cldr.unicode.org/translation/core-data/numbering-systems
  2. https://www.unicode.org/reports/tr35/tr35-numbers.html#Numbering_Systems

Metadata



    Status

    Accepted
