Description
While working on #2493, I learned that the IANA Time Zone Database has been forked due to a disagreement between that database's maintainer and some prominent users of the database.
Background
The two forks differ as follows:
- "Primary" fork - Many time zones that have had the same rules since 1970 have been merged into one canonical identifier, with the old identifiers remaining as links. Examples include `Europe/Copenhagen` => `Europe/Berlin` and `Atlantic/Reykjavik` => `Africa/Abidjan`. There are many more examples like this. This fork is preferred by the TZDB maintainer, and is therefore what the official IANA downloads of TZDB releases expose.
- "Unmerged" fork - The merges described above are reverted. This fork is preferred by reps from Java, NetBSD, and probably others too. It's available via downloads from the fork repo (https://github.com/JodaOrg/global-tz), or by building TZDB from source using the new `PACKRATLIST` build option. That build option was added by the maintainer to ensure that both forks could be built out of the same repo. See discussion here and here.
You can read more about the fork in the TZDB mailing list archives. A few relevant threads:
- https://mm.icann.org/pipermail/tz/2022-July/031631.html
- https://mm.icann.org/pipermail/tz/2022-August/031752.html
The fork seems to represent a philosophical difference about the purpose of the TZDB. One camp (which includes the maintainer) sees the goal of TZDB as simply providing a way to convert post-1970 zoned timestamps into exact instants, and wants to reduce the TZDB size and maintenance hassle of dealing with pre-1970 data. The other camp (supporting the unmerged fork) adds additional use cases:
- a) Resolving pre-1970 zoned timestamps to instants, even if those pre-1970 data are known to be less reliable and more subject to revision.
- b) Providing metadata that may be useful in the future in case countries change their time zones or DST rules, even if no such changes have happened since 1970. The unmerged fork guarantees at least one canonical zone per ISO 3166-1 country code, which is sensible because time zone and DST changes typically happen at the country-code level except for the largest countries.
- c) Reducing "canonicalization confusion" where users set one zone and end up with a zone that seems completely different, and maybe even in a different continent like Iceland => Cote d'Ivoire. This seems particularly sensitive in the case of Denmark, Sweden, and other European countries being canonicalized to Germany, which for obvious reasons may trigger historical sensitivity.
I'm not sure how much Temporal cares about pre-1970 dates, but the latter two issues seem quite important to Temporal users. The second one will make calendaring apps more resilient to country-level timezone/DST changes, while the third will prevent developer confusion and consternation.
Also, given the complaints about the changes, it's possible that the TZDB may revert these changes in the future, which would cause further churn.
Options
Anyway, now that we know this fork exists, we need to figure out what to do about it in the Temporal spec. Options include:
1. Recommend that implementers use the Primary Fork
- Pro: this is the status quo, so probably easiest to do
- Con: breaks (b) and (c) above; risks more confusion if changes are reverted later
2. Recommend that implementers use the Unmerged Fork
- Pro: Better backwards compatibility with existing timestamps; less geopolitical confusion going forward
- Con: larger TZDB data because it includes pre-1970 rules for many more zones; requires changing implementer build processes; may cause bug reports because JS output will vary from other sources (e.g. Wikipedia) that use the Primary fork.
3. Don't recommend anything; implementers are free to choose.
- Pro: allows implementer flexibility
- Con: code will work differently across implementations, which already causes problems even before these controversial merges. See the comment on ecma402#272 ("IANA timezone db reference in the spec: should backzone be taken into account?"), https://bugs.chromium.org/p/chromium/issues/detail?id=580195, etc.
4. Stop canonicalizing time zones (thanks to @pipobscure for this suggestion)
- Pro: supports (b) and (c) above without needing to pick a fork; makes ISO strings round-trippable even if canonicalization has changed since the string was stored; avoids test breakages and other side effects of canonicalization changes; solves "wrong canonical spelling" bugs like this chromium bug; ensures that code works more similarly across implementations and across time; more consistent equality comparisons with other Temporal types that use an `equals` method; avoids triggering geopolitical sensitivities caused by modifying user input to point to an unexpected country or name.
- Con: Probably requires adding a `Temporal.TimeZone.equals` method to help users identify equivalent time zones like Asia/Calcutta vs. Asia/Kolkata; may require modifying existing ICU behavior (per this comment, it sounds like Firefox already does similar mods).
Discussion
Of the above options, my strong preference is for (4), because it solves both the forking issue as well as the existing canonicalization issues like Calcutta vs. Kolkata. Also, I think retaining user input as-is will be quite helpful to reduce confusion in cases where code takes input from some other source, modifies that data, and then sends or stores the modified data. If the time zone identifier varies a lot between the original and modified ZDT, I think that will generate user confusion that avoiding canonicalization would prevent.
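To see what canonicalization does to an identifier today, one can ask `Intl.DateTimeFormat` what it resolved. This is only a sketch: the helper name `resolvedTimeZoneId` is made up for illustration, and the value returned for a renamed or merged zone depends on the engine and on the TZDB/ICU data it ships, so no particular output is assumed here.

```javascript
// Hypothetical helper: returns the identifier the engine reports after
// resolving `id`. The result for links (renamed or merged zones) varies by
// engine and by the time zone data it bundles.
function resolvedTimeZoneId(id) {
  return new Intl.DateTimeFormat('en', { timeZone: id }).resolvedOptions().timeZone;
}

// Print each input next to what the engine resolved it to, flagging changes.
for (const id of ['Asia/Calcutta', 'Europe/Copenhagen', 'Atlantic/Reykjavik']) {
  const resolved = resolvedTimeZoneId(id);
  console.log(`${id} -> ${resolved}${resolved === id ? '' : ' (changed)'}`);
}
```

Any line marked "(changed)" is a case where code that stores the resolved identifier would hand back something different from what the user supplied.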
If we want to go with (4), here are a few questions to answer:
- i) How should `Intl.DateTimeFormat.p.resolvedOptions().timeZone` behave? Should it also stop canonicalizing? If yes, should it add a new `canonicalTimeZone` property?
- ii) Will there be any change to user-visible output of `Intl.DateTimeFormat.p.format` or `Date.p.toLocaleString`? I suspect that the answer is "no" because localized descriptions of time zones don't usually surface the IANA identifiers, but I'm not 100% sure about this.
- iii) What changes (if any) would be required to CLDR and/or ICU to support this change?
- iv) Even if we avoid the canonicalization mess, there's still the pre-1970-data question. The unmerged fork will have it; the merged fork will only have it for the merged zones. This would mean, for example, that Europe/Copenhagen pre-1970 results could vary by fork. So which fork should we recommend that implementers use? I don't have a strong opinion here. It'd be nice to understand the size of pre-1970 data to know how much smaller browser downloads would get if this data were removed.
- v) Should case differences still be canonicalized, e.g. `Europe/Paris` vs. `europe/paris`? My opinion: yes, we should canonicalize.
- vi) Should spelling differences due to renaming also be canonicalized, e.g. `Asia/Calcutta` vs. `Asia/Kolkata`? My opinion: no, because by not canonicalizing `id` in this case we can avoid user complaints like this chromium bug, and we can ensure future compatibility & round-trippability even if zones are renamed in the future. Note that `equals` should probably report these as `true` though. (See below.)
- vii) Should we add a `TimeZone.p.equals` method? I think we should, both for consistency across Temporal types and to help code be robust in the face of past or future renames of cities, which seem to happen fairly often globally. JS code should be able to ask "Is this date in the India time zone?" without having to worry that the code will be broken by a past or future rename.
- viii) If we add `equals`, should we also add a method that tests if all rules are the same across time zones, e.g. `Atlantic/Reykjavik` vs. `Africa/Abidjan`? I don't think this is needed. Userland code can always use `getNextTransition` in a loop to check for this kind of equality, and if there's user demand we could always add it in a later release.
- ix) How should the UTC zone be handled? I think this is straightforward: all zones whose canonical identifier is `Etc/UTC` should resolve to `UTC` in ECMAScript, matching current behavior. There's no value in changing this existing behavior.
- x) In order for the `PACKRATLIST` option to work, TZDB data must provide a way to differentiate "merged" links like `Atlantic/Reykjavik` => `Africa/Abidjan` from "renamed" links like `Asia/Calcutta` vs. `Asia/Kolkata`. How does this differentiation work, and does it work for all links or are there gaps? It sounds like @anba may know how this works.
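The userland "same rules" check mentioned in (viii) could look roughly like the sketch below. It assumes time zone objects shaped like the draft `Temporal.TimeZone` API, with `getOffsetNanosecondsFor(instant)` and `getNextTransition(instant)` (the latter returning an instant-like object with `epochNanoseconds`, or `null`). The function name `haveSameRules` and the `maxTransitions` bound are made up for illustration; some bound is needed because zones with ongoing DST rules never run out of transitions.

```javascript
// Sketch: walks two time zones transition by transition from `start` and
// reports whether offsets and transition times match along the way.
// Assumes objects shaped like the draft Temporal.TimeZone API (see lead-in).
function haveSameRules(zoneA, zoneB, start, maxTransitions = 1000) {
  let instant = start;
  for (let i = 0; i <= maxTransitions; i++) {
    // Offsets must agree at the current point in time.
    if (zoneA.getOffsetNanosecondsFor(instant) !== zoneB.getOffsetNanosecondsFor(instant)) {
      return false;
    }
    const nextA = zoneA.getNextTransition(instant);
    const nextB = zoneB.getNextTransition(instant);
    if (nextA === null || nextB === null) {
      return nextA === nextB; // equal only if both ran out of transitions
    }
    // Transitions must happen at the same exact time (BigInt comparison).
    if (nextA.epochNanoseconds !== nextB.epochNanoseconds) return false;
    instant = nextA;
  }
  return true; // no difference found within the bound
}
```

The answer depends on the starting point and on which fork the implementation's data comes from: two zones merged in the Primary fork would compare as identical from any start, while the Unmerged fork could report a difference if `start` is early enough to reach pre-1970 data.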
If we add `equals`, here's a suggestion for its behavior:
- It should accept objects or strings.
- If the receiver and/or the argument is a custom zone, use its `id` property.
- Treat different casings as equal, e.g. `Europe/Paris` vs. `europe/paris`.
- Treat different spellings of the same location as equal, e.g. `Asia/Calcutta` vs. `Asia/Kolkata`, because they represent the same thing with different spelling.
- If both receiver and argument canonicalize to `Etc/UTC` then treat them as equal.
- DO NOT treat different locations (like `Atlantic/Reykjavik` vs. `Africa/Abidjan`) as equal, even if all their time zone transitions are the same, because future changes could make those locations have different time zone rules. Per above, if users want to evaluate "all rules are the same" they can do this in userland by comparing time zone transitions in a loop. Although honestly I'm skeptical that this will be a popular use case. Who cares if the rules are equal?
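As a rough illustration of the rules above, here is a userland sketch of the suggested semantics. Everything in it is hypothetical: the names `timeZoneEquals` and `comparisonKey`, the one-entry `RENAMES` table, and the `UTC_IDS` subset are stand-ins for data a real implementation would derive from TZDB, and nothing here is a proposed spec algorithm.

```javascript
// Hypothetical rename table: old spelling -> current spelling of the SAME
// location (keys and values lowercased). Only the example from this issue.
const RENAMES = new Map([['asia/calcutta', 'asia/kolkata']]);

// Illustrative subset of identifiers that canonicalize to Etc/UTC.
const UTC_IDS = new Set(['utc', 'etc/utc', 'etc/universal', 'etc/zulu']);

// Reduces a string or a time zone object to a key used only for comparison.
function comparisonKey(zone) {
  // Objects (including custom zones) contribute their `id` property.
  const id = typeof zone === 'string' ? zone : zone.id;
  const lower = id.toLowerCase(); // casing differences are ignored
  if (UTC_IDS.has(lower)) return 'utc';
  // Renamed locations collapse to one key; merged locations are NOT in the
  // table, so different places stay unequal even if their rules match.
  return RENAMES.get(lower) ?? lower;
}

function timeZoneEquals(a, b) {
  return comparisonKey(a) === comparisonKey(b);
}

console.log(timeZoneEquals('Asia/Calcutta', { id: 'Asia/Kolkata' }));
console.log(timeZoneEquals('Atlantic/Reykjavik', 'Africa/Abidjan'));
```

Note that the identifiers themselves are never rewritten; only the comparison key is normalized, which is what lets `id` keep the user's original input.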
Pinging @jasonwilliams @ptomato @sffc @gibson042 @pipobscure for your opinions.