Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: Add calendar and numberingSystem options #175

Merged
merged 7 commits into from
Feb 27, 2020

Conversation

littledan
Copy link
Member

This patch allows calendar and numberingSystem to be specified in
the options bag of the DateTimeFormat and NumberFormat constructors.
One use case for these options would be, when working with locales
which have two numbering systems and calendars in use--the UA default
may be the non-Western one, but in some contexts, it may be appropriate
to use the Western one. Currently, without the patch, it would be
necessary to parse the BCP 47 language tag in the application, but Intl
provides no library to do so. For example, this all occurs in this bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1370086

Related bug: #105 . This patch leaves out "collation" because of a lack
of clear use cases.

littledan added a commit to littledan/test262 that referenced this pull request Sep 13, 2017
This patch implements tests for the ECMA 402 PR at
tc39/ecma402#175

It is based on the test test/intl402/Collator/10.1.1_19_c.js
@littledan
Copy link
Member Author

Test262 tests: tc39/test262#1225

V8 implementation: https://chromium-review.googlesource.com/c/v8/v8/+/664762

Copy link
Contributor

@caridy caridy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be very straight forward to add these new options.

@littledan
Copy link
Member Author

Any other opinions here? Can this be merged?

@littledan
Copy link
Member Author

In particular, do you think we should go to TC39 with this change, or is it minor enough to merge it without that presentation?

@caridy
Copy link
Contributor

caridy commented Oct 12, 2017

I think we can do this without any presentation, but let's loop in few more folks to get their feedback:
@zbraniecki @jungshik

@zbraniecki
Copy link
Member

I agree with @caridy

@caridy
Copy link
Contributor

caridy commented Oct 24, 2017

@littledan do you want to merge this?

1. If _calendar_ is not *undefined*, set _calendar_ to ? ToString(_calendar_).
1. Set _opt_.[[ca]] to _calendar_.
1. Let _numberingSystem_ be ? Get(_options_, *"numberingSystem"*).
1. If _numberingSystem_ is not *undefined*, set _numberingSystem_ to ? ToString(_numeringSystem_).
Copy link
Contributor

@anba anba Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: numeringSystem (and in the line below and in the equivalent lines for Intl.NumberFormat.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@littledan can you update the PR to fix these typos?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the typo, apologies for the delay.

@caridy caridy added the s: discuss Status: TG2 must discuss to move forward label Nov 16, 2017
@littledan
Copy link
Member Author

This PR reached consensus at the November 2017 TC39 meeting, see notes.

@anba
Copy link
Contributor

anba commented Jan 18, 2018

Nit: Instead of

  1. Let calendar be ? Get(options, "calendar").
  2. If calendar is not undefined, set calendar to ? ToString(calendar).
  3. Set opt.[[ca]] to calendar.

The GetOption operation should be used for consistency with access to other options.

  1. Let calendar be ? GetOption(options, "calendar", "string", undefined, undefined).
  2. Set opt.[[ca]] to calendar.

(And we should change Get(options, "timeZone") in InitializeDateTimeFormat to use GetOption, too.)

But probably more important than how to access and convert the new options: Currently all options-bag options seem to be validated in some way and invalid option values are rejected with an error, whereas invalid options from Unicode extensions are simply ignored (e.g. Intl.Collator("de", {caseFirst: "invalid"}) throws a RangeError, whereas Intl.Collator("de-u-kf-invalid") is silently accepted). This change introduces the first options-bag options which don't get any validation at all, which seems a bit unfortunate. And furthermore we should also formally define the allowed values for the "numberingSystem" and "calendar" options (BCP 47 vs. UTS 35 names).

For example let's consider the "calendar" option:
https://tc39.github.io/ecma402/#sec-properties-of-intl-datetimeformat-instances describes [[Calendar]] as:

[[Calendar]] is a String value with the "type" given in Unicode Technical Standard 35 for the calendar used for formatting.

But which "type" is meant here? The value of the "name" attribute of the <type> element for BCP 47 (https://unicode.org/repos/cldr/tags/latest/common/bcp47/calendar.xml) or the value of "type" attribute of the <calendar> element (https://unicode.org/repos/cldr/tags/latest/common/main/root.xml)? And is it then Intl.DateTimeFormat("de", {calendar: "gregory"}) or Intl.DateTimeFormat("de", {calendar: "gregorian"}) or maybe even both? Also what about canonicalizing the input, for example if the input is "GrEgOrIaN", does it get accepted and if it is accepted, does it get canonicalized to "gregorian"? And if we only accept known calendars per UTS 35, we probably also need to explicitly disallow the name "generic", don't we?

For example Intl.DateTimeFormat("de-u-ca-gregory").resolvedOptions().calendar prints "gregory" for me with V8 and SpiderMonkey, but "gregorian" with JSC (on Ubuntu16.04, IIRC JSC uses the system ICU which is ICU55 for Ubuntu16.04; V8 and SpiderMonkey both use ICU60). Right now I'd say only JSC is correct here and "gregorian" is the expected output. But WDYT?

Copy link
Member

@srl295 srl295 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me

@littledan
Copy link
Member Author

Unfortunately I don't think ICU has an API for validating these sorts of things. I was thinking of the validation logic a bit like, things which were based on an open-ended name would be checked through resolvedOptions.

Do you think it will be important for users to get a RangeError here? Does anyone remember the original intention with the current design for error checking? cc @NorbertLindenberg

We reviewed this patch at the ECMA 402 meeting. At that time, I'd only skimmed @anba's comment and didn't realize the validation issue. I guess this patch isn't ready to merge until we make a decision on that question.

@zbraniecki
Copy link
Member

As discussed on the ECMA402 call - we should only validate the options to be well-formed, just like we do for extension key values.

littledan added a commit to littledan/ecma402 that referenced this pull request Feb 27, 2018
This patch allows calendar and numberingSystem to be specified in
the options bag of the DateTimeFormat and NumberFormat constructors.
One use case for these options would be, when working with locales
which have two numbering systems and calendars in use--the UA default
may be the non-Western one, but in some contexts, it may be appropriate
to use the Western one. Currently, without the patch, it would be
necessary to parse the BCP 47 language tag in the application, but Intl
provides no library to do so. For example, this all occurs in this bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1370086

This patch validates the calendar and numbering system by comparing
them to the grammar allowed for Unicode extension tags, per the resolution
documented at tc39#175 (comment)

Related bug: tc39#105 . This patch leaves out "collation" because of a lack
of clear use cases.
@littledan
Copy link
Member Author

OK, the newly uploaded patch is based on validating the grammar, but not contents, of the calendar and numberFormat options. It does not address @anba's other canonicalization question, however. How does this seem?

littledan added a commit to littledan/ecma402 that referenced this pull request Feb 27, 2018
This patch allows calendar and numberingSystem to be specified in
the options bag of the DateTimeFormat and NumberFormat constructors.
One use case for these options would be, when working with locales
which have two numbering systems and calendars in use--the UA default
may be the non-Western one, but in some contexts, it may be appropriate
to use the Western one. Currently, without the patch, it would be
necessary to parse the BCP 47 language tag in the application, but Intl
provides no library to do so. For example, this all occurs in this bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1370086

This patch validates the calendar and numbering system by comparing
them to the grammar allowed for Unicode extension tags, per the resolution
documented at tc39#175 (comment)

Related bug: tc39#105 . This patch leaves out "collation" because of a lack
of clear use cases.
@caridy
Copy link
Contributor

caridy commented Feb 27, 2018

hmm, at first glance, "match type" feels weird to me considering that you will have to open the link, and try to get the glimpse of what type is. This seems to be simple enough that maybe we can specify it here.

This patch allows calendar and numberingSystem to be specified in
the options bag of the DateTimeFormat and NumberFormat constructors.
One use case for these options would be, when working with locales
which have two numbering systems and calendars in use--the UA default
may be the non-Western one, but in some contexts, it may be appropriate
to use the Western one. Currently, without the patch, it would be
necessary to parse the BCP 47 language tag in the application, but Intl
provides no library to do so. For example, this all occurs in this bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1370086

This patch validates the calendar and numbering system by comparing
them to the grammar allowed for Unicode extension tags, per the resolution
documented at tc39#175 (comment)

Related bug: tc39#105 . This patch leaves out "collation" because of a lack
of clear use cases.
- Convert calendar and numbering system to lowercase, analogously
  with canonicalizing language tags
- Reference the Unicode grammar, to avoid the need to keep up to
  date with notational changes from upstream
@littledan
Copy link
Member Author

I've rebased and updated the PR to follow @anba's suggestions above. The lowercase conversion is a really necessary bug fix IMO. Probably we need to add some additional test262 tests to verify.

@bkardell
Copy link

bkardell commented Jan 16, 2020

Ok, I am having some trouble understanding... Looking at the spec, the test and using the implementations I have a question. Here's a matrix of various inputs and qualities..

constructor args parts has year parts has yearName parts has relatedYear
"en" yes no no
"en", {year: "numeric"} yes no no
"en-u-ca-chinese" no no yes
"en-u-ca-chinese", {year: "numeric"} no yes yes

as a developer I find it hard to understand the logic behind these results.. why wouldn't you have a .year in some cases? It seems very strange to pass year numeric in the last case and have a relatedYear, but not a .year? I'm sure there's probably very good reasons - I'm just wondering if someone can help explain it to me so that I can document it?

@FrankYFTang
Copy link
Contributor

Ok, I am having some trouble understanding... Looking at the spec, the test and using the implementations I have a question. Here's a matrix of various inputs and qualities..

constructor args parts has year parts has yearName parts has relatedYear
"en" yes no no
"en", {year: "numeric"} yes no no
"en-u-ca-chinese" no no yes
"en-u-ca-chinese", {year: "numeric"} no yes yes
as a developer I find it hard to understand the logic behind these results.. why wouldn't you have a .year in some cases? It seems very strange to pass year numeric in the last case and have a relatedYear, but not a .year? I'm sure there's probably very good reasons - I'm just wondering if someone can help explain it to me so that I can document it?

The problem for the last part is the caller asked for a "numeric" for year, but in en-u-ca-chinese calendar, the year is not a numeric value (not 1 , 2, 3, 4, in any numbering system), but a string name
"jia-zi",
"yi-chou",
"bing-yin",
"ding-mao",
"wu-chen",
"ji-si",
"geng-wu",
"xin-wei",
"ren-shen",
"gui-you",
"jia-xu",
"yi-hai",
"bing-zi",
"ding-chou",
"wu-yin",
"ji-mao",
....
a list of 60 names of year.

@bkardell
Copy link

bkardell commented Jan 17, 2020

@FrankYFTang thanks... sorry, I still don't entirely understand. If there is a better venue to ask you these kinds of questions, let me know.. In the meantime, here are some more clarifying quesitons - sorry if they seem basic or something.. Does this mean that:

  1. "en-u-ca-chinese" will never have an entry called year - no matter what you put in year option?

  2. "en-u-ca-chinese" will always have an entry called yearName``, which is the year, but a string-oriented name and therefore doesn't just belong in year`?

  3. "en-u-ca-chinese" will always have a property called relatedYear which will contain.... what is normally in the year except is never affected by what you put in the opt for year?

@FrankYFTang
Copy link
Contributor

@FrankYFTang thanks... sorry, I still don't entirely understand. If there is a better venue to ask you these kinds of questions, let me know.. In the meantime, here are some more clarifying quesitons - sorry if they seem basic or something.. Does this mean that:

  1. "en-u-ca-chinese" will never have an entry called year - no matter what you put in year option?

As the current spec, YES

  1. "en-u-ca-chinese" will always have an entry called yearName, which is the year, but a string-oriented name and therefore doesn't just belong in ``year`?

YES, because the form is neither "numeric" nor "2-digit"

  1. "en-u-ca-chinese" will always have a property called relatedYear which will contain.... what is normally in the year except is never affected by what you put in the opt for year?

First, it may not ALWAYS output 'relatedYear'. It MAY output it, depending on the values in the options.
Second, a better way to say it is , if the formatter output 'relatedYear', the 'relatedYear' will contain what is the value in the year OF THE Gregorian calendar, not the Chinese calendar. That is why it is called "related"- a year value NOT in that request calendar, but in a different calendar system - the Gregorian calendar. The reason in the modern day, even when user want to see the date output in Chinese calendar (or some other non Gregorian calendar), they ALSO want to see the year of Gregorian calendar next to it. That is why it is also output in the POSITION where the year of that calendar is output.

@bkardell
Copy link

Thanks @FrankYFTang that's very helpful. I still find it confusing as someone not yet deeply into this problem, but at least I kind of understand how it winds up here now!

@littledan
Copy link
Member Author

@bkardell and I had talked about this before filing the issue, and the case that I found surprising was where we actually had a relatedYear but no yearName, which you can see in Brian's data above, e.g.,

var df = new Intl.DateTimeFormat("en-u-ca-chinese")
var parts = df.formatToParts(new Date())
console.log(JSON.stringify(parts))

[{"type":"month","value":"12"},{"type":"literal","value":"/"},{"type":"day","value":"21"},{"type":"literal","value":"/"},{"type":"relatedYear","value":"2019"}]

This works the same in V8 and SpiderMonkey.

@FrankYFTang wrote,

"en-u-ca-chinese" will never have an entry called year - no matter what you put in year option?

As the current spec, YES

This is the part where I didn't quite understand the current spec. I guess this is because in ToLocalTime, the calendar calculation is independent of the format being output? So, even though we're just using a month/day/year format and there's no yearName, we still use relatedYear. Is that right? The motivation here is that 2019 is not the year in the Chinese calendar? If that's the case, then I'm a bit confused about why this display format exists in the locale database...

@FrankYFTang
Copy link
Contributor

@bkardell and I had talked about this before filing the issue, and the case that I found surprising was where we actually had a relatedYear but no yearName, which you can see in Brian's data above, e.g.,

var df = new Intl.DateTimeFormat("en-u-ca-chinese")
var parts = df.formatToParts(new Date())
console.log(JSON.stringify(parts))

[{"type":"month","value":"12"},{"type":"literal","value":"/"},{"type":"day","value":"21"},{"type":"literal","value":"/"},{"type":"relatedYear","value":"2019"}]

This works the same in V8 and SpiderMonkey.

@FrankYFTang wrote,

"en-u-ca-chinese" will never have an entry called year - no matter what you put in year option?

As the current spec, YES

This is the part where I didn't quite understand the current spec. I guess this is because in ToLocalTime, the calendar calculation is independent of the format being output? So, even though we're just using a month/day/year format and there's no yearName, we still use relatedYear. Is that right? The motivation here is that 2019 is not the year in the Chinese calendar? If that's the case, then I'm a bit confused about why this display format exists in the locale database...

The skeleton it asked inside ICU is "yMd" for this case, and the pattern is
yMd{"M/d/r"}
notice, the 'r' which is the related year
see
https://cs.chromium.org/chromium/src/third_party/icu/source/data/locales/en.txt?q=locales/en.txt&sq=package:chromium&dr&l=269

Feel free to file a CLDR ticket ask CLDR linguist to clarify the rationale.
In Taiwan, for example, that is now how they print on the news paper. For example
https://chenminjen07.files.wordpress.com/2015/04/b329a-e59c96e789871.jpg
The top line (above the big bold title of the headline, left next to vertical red banner of the 3 characters title of the newspaper) is date in both 'roc' calendar and 'chinese' calendar

@littledan
Copy link
Member Author

Thanks for explaining, @FrankYFTang

Adding case to map values properly when mapping to lower case
@leobalter
Copy link
Member

My understanding is that we have tests for this. cc @caiolima, is this correct?

@leobalter leobalter merged commit 7ffc4dd into tc39:master Feb 27, 2020
@anba
Copy link
Contributor

anba commented Feb 27, 2020

This PR landed after Intl.Locale, but didn't fix the issue outlined in tc39/proposal-intl-locale#96. Maybe revert the PR for now?

@Ms2ger Ms2ger deleted the more-options branch February 27, 2020 11:05
leobalter added a commit to leobalter/ecma402 that referenced this pull request Mar 2, 2020
This is a temporary revert commit for tc39#175 until a canonicalization for Unicode extensions are
added through ResolveLocale.

Ref tc39/proposal-intl-locale#96

This commit should be reverted along the proposed solution. Changes are simplified in this single commit to avoid reverting all the 7 original commits from tc39#175.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: datetime Component: dates, times, timezones c: numbers Component: numbers, currency, units enhancement has consensus Has consensus from TC39-TG2 s: in progress Status: the issue has an active proposal Small Smaller change solvable in a Pull Request
Projects
None yet
Development

Successfully merging this pull request may close these issues.