Reconsider time zone canonicalization behavior given forking of IANA Time Zone Database #2509

Closed

Description

While working on #2493, I learned that the IANA Time Zone Database has been forked due to a disagreement between that database's maintainer and some prominent users of the database.

Background

The two forks differ as follows:

  • "Primary" fork - Many time zones that have had the same rules since 1970 have been merged into one canonical identifier, with the old identifiers remaining as links. Examples include: Europe/Copenhagen => Europe/Berlin and Atlantic/Reykjavik => Africa/Abidjan. There are many more examples like this. This fork is preferred by the TZDB maintainer, and therefore is exposed by the official IANA downloads of TZDB releases.
  • "Unmerged" fork - The merges described above are reverted. This fork is preferred by reps from Java, NetBSD, and probably others too. It's available via downloads from the fork repo (https://github.com/JodaOrg/global-tz), or by building TZDB from source using the new PACKRATLIST build option. That build option was added by the maintainer to ensure that both forks could be built out of the same repo. See discussion here and here.

You can read more about the fork in the TZDB mailing list archives. A few relevant threads:

The fork seems to represent a philosophical difference about the purpose of the TZDB. One camp (which includes the maintainer) sees the goal of TZDB as simply providing a way to convert post-1970 zoned timestamps into exact instants, and wants to reduce the TZDB size and maintenance hassle of dealing with pre-1970 data. The other camp (supporting the unmerged fork) adds additional use cases:

  • a) Resolving pre-1970 zoned timestamps to instants, even if those pre-1970 data are known to be less reliable and more subject to revision.
  • b) Providing metadata that may be useful in the future in case countries change their time zones or DST rules, even if no such changes have happened since 1970. The unmerged fork guarantees at least one canonical zone per ISO 3166-1 country code, which is sensible because time zone and DST changes typically happen at the country-code level except for the largest countries.
  • c) Reducing "canonicalization confusion" where users set one zone and end up with a zone that seems completely different, maybe even on a different continent, like Iceland => Côte d'Ivoire. This seems particularly sensitive in the case of Denmark, Sweden, and other European countries being canonicalized to Germany, which for obvious reasons may trigger historical sensitivity.

I'm not sure how much Temporal cares about pre-1970 dates, but the latter two issues seem quite important to Temporal users. The second one will make calendaring apps more resilient to country-level timezone/DST changes, while the third will prevent developer confusion and consternation.

Also, given the complaints about the changes, it's possible that the TZDB may revert these changes in the future, which would cause further churn.

Options

Anyway, now that we know this fork exists, we need to figure out what to do about it in the Temporal spec. Options include:

1. Recommend that implementers use the Primary Fork

  • Pro: this is the status quo, so probably easiest to do
  • Con: breaks (b) and (c) above; risks more confusion if changes are reverted later

2. Recommend that implementers use the Unmerged Fork

  • Pro: Better backwards compatibility with existing timestamps; less geopolitical confusion going forward
  • Con: larger TZDB data because it includes pre-1970 rules for many more zones; requires changing implementer build processes; may cause bug reports because JS output will vary from other sources (e.g. Wikipedia) that use the main fork.

3. Don't recommend anything; implementers are free to choose.

4. Stop canonicalizing time zones (thanks to @pipobscure for this suggestion)

  • Pro: supports (b) and (c) above without needing to pick a fork; makes ISO strings round-trippable even if canonicalization has changed since the string was stored; avoids test breakages and other fallout from canonicalization changes; solves "wrong canonical spelling" bugs like this chromium bug; ensures that code works more similarly across implementations and across time; more consistent equality comparisons with other Temporal types that use an equals method; avoids triggering geopolitical sensitivities caused by modifying user input to point to an unexpected country or name.
  • Con: Probably requires adding a Temporal.TimeZone.equals method to help users identify equivalent time zones like Asia/Calcutta vs. Asia/Kolkata; may require modifying existing ICU behavior (per this comment, it sounds like Firefox already does similar mods).
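
To make the behavior that option 4 would change more concrete, here is a minimal sketch that asks the current engine what it does to an identifier today. The helper name describeCanonicalization is hypothetical; the only real API used is Intl.DateTimeFormat.prototype.resolvedOptions(). The resolved identifier depends on the engine and its time zone data, so no particular output is assumed.

```javascript
// Hypothetical helper (not part of any spec): reports how the current engine
// resolves a time zone identifier passed to Intl.DateTimeFormat.
function describeCanonicalization(inputId) {
  const resolvedId = new Intl.DateTimeFormat('en', { timeZone: inputId })
    .resolvedOptions().timeZone;
  return {
    inputId,
    resolvedId,
    // True when at most the letter case differs, i.e. the engine kept the
    // zone the user asked for instead of swapping in another identifier.
    preserved: resolvedId.toLowerCase() === inputId.toLowerCase(),
  };
}

// Results vary by engine and tzdata version.
console.log(describeCanonicalization('Asia/Calcutta'));
console.log(describeCanonicalization('Europe/Copenhagen'));
```

Under option 4, preserved would be expected to be true for any valid input, with only the letter case normalized.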

Discussion

Of the above options, my strong preference is for (4), because it solves both the forking issue and the existing canonicalization issues like Calcutta vs. Kolkata. Also, I think retaining user input as-is will be quite helpful to reduce confusion in cases where code takes input from some other source, modifies that data, and then sends or stores the modified data. If the time zone identifier varies a lot between the original and modified ZDT, I think that will generate user confusion that avoiding canonicalization would prevent.

If we want to go with (4), here are a few questions to answer:

  • i) How should Intl.DateTimeFormat.p.resolvedOptions().timeZone behave? Should it also stop canonicalizing? If yes, should it add a new canonicalTimeZone property?
  • ii) Will there be any change to user-visible output of Intl.DateTimeFormat.p.format or Date.p.toLocaleString? I suspect that the answer is "no" because localized descriptions of time zones don't usually surface the IANA identifiers, but not 100% sure about this.
  • iii) What changes (if any) would be required to CLDR and/or ICU to support this change?
  • iv) Even if we avoid the canonicalization mess, there's still the pre-1970-data question. The unmerged fork will have this data for every zone; the primary fork will only have it for the zones that remain canonical after merging (e.g. Europe/Berlin but not Europe/Copenhagen). This would mean, for example, that Europe/Copenhagen pre-1970 results could vary by fork. So which fork should we recommend that implementers use? I don't have a strong opinion here. It'd be nice to understand the size of pre-1970 data to know how much smaller browser downloads would get if this data were removed.
  • v) Should case differences still be canonicalized, e.g. Europe/Paris vs. europe/paris? My opinion: yes, we should canonicalize.
  • vi) Should spelling differences due to renaming also be canonicalized, e.g. Asia/Calcutta vs. Asia/Kolkata? My opinion: no, because by not canonicalizing id in this case we can avoid user complaints like this chromium bug, and we can ensure future compatibility & round-trippability even if zones are renamed in the future. Note that equals should probably report these as equal though. (See below.)
  • vii) Should we add a TimeZone.p.equals method? I think we should, both for consistency across Temporal types and to help code be robust in the face of past or future renames of cities, which seem to happen fairly often globally. JS code should be able to ask "Is this date in the India time zone?" without worrying that the code will be broken by a past or future rename.
  • viii) If we add equals, should we also add a method that tests whether all rules are the same across time zones, e.g. Atlantic/Reykjavik vs. Africa/Abidjan? I don't think this is needed. Userland code can always use getNextTransition in a loop to check for this kind of equality, and if there's user demand we could always add it in a later release.
  • ix) How should UTC zone be handled? I think this is straightforward: all zones whose canonical identifier is Etc/UTC should resolve to UTC in ECMAScript, matching current behavior. There's no value in changing this existing behavior.
  • x) In order for the PACKRATLIST option to work, TZDB data must provide a way to differentiate "merged" links like Atlantic/Reykjavik => Africa/Abidjan from "renamed" links like Asia/Calcutta => Asia/Kolkata. How does this differentiation work, and does it work for all links or are there gaps? It sounds like @anba may know how this works.
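
For question (viii), here is a rough userland sketch of a "same rules" check. It deliberately swaps in a cruder technique than the getNextTransition loop mentioned above: it samples UTC offsets at a fixed step using Intl.DateTimeFormat, so it runs on engines without Temporal but could miss differences that fall entirely between two samples. The names makeOffsetReader and sameSampledOffsets, and the default range and step, are assumptions made for illustration.

```javascript
// Hypothetical helper: returns a function that reports the UTC offset
// (in minutes) of `timeZone` at a given epoch-millisecond instant.
function makeOffsetReader(timeZone) {
  const fmt = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23',
    year: 'numeric', month: 'numeric', day: 'numeric',
    hour: 'numeric', minute: 'numeric', second: 'numeric',
  });
  return (epochMs) => {
    const f = {};
    for (const { type, value } of fmt.formatToParts(epochMs)) {
      f[type] = Number(value); // literal separators become NaN and are ignored
    }
    // Reinterpret the zone's wall-clock fields as if they were UTC; the
    // difference from the real instant is the zone's offset.
    const wallAsUtc = Date.UTC(f.year, f.month - 1, f.day, f.hour, f.minute, f.second);
    return Math.round((wallAsUtc - epochMs) / 60000);
  };
}

// Hypothetical helper: compares offsets at sampled instants only.
// Assumes whole-second instants and a post-1970 range by default.
function sameSampledOffsets(zoneA, zoneB, { fromYear = 1970, toYear = 2030, stepDays = 7 } = {}) {
  const readA = makeOffsetReader(zoneA);
  const readB = makeOffsetReader(zoneB);
  const stepMs = stepDays * 24 * 60 * 60 * 1000;
  const end = Date.UTC(toYear, 0, 1);
  for (let t = Date.UTC(fromYear, 0, 1); t < end; t += stepMs) {
    if (readA(t) !== readB(t)) return false;
  }
  return true;
}

console.log(sameSampledOffsets('Atlantic/Reykjavik', 'Africa/Abidjan'));
console.log(sameSampledOffsets('Europe/Paris', 'Asia/Tokyo'));
```

Passing an earlier fromYear would probe pre-1970 data, where results may depend on which fork the engine's time zone data was built from.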

If we add equals, here's a suggestion for its behavior:

  • It should accept objects or strings.
  • If the receiver and/or the argument is a custom zone, use its id property.
  • Treat different casings as equal, e.g. Europe/Paris vs. europe/paris.
  • Treat different spellings of the same location as equal, e.g. Asia/Calcutta vs. Asia/Kolkata, because they represent the same thing with different spelling.
  • If both receiver and argument canonicalize to Etc/UTC then treat them as equal.
  • DO NOT treat different locations (like Atlantic/Reykjavik vs. Africa/Abidjan) as equal, even if all their time zone transitions are the same, because future changes could make those locations have different time zone rules. Per above, if users want to evaluate "all rules are the same" they can do this in userland by comparing time zone transitions in a loop. Although honestly I'm skeptical that this will be a popular use case. Who cares if the rules are equal?
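
The rules above could be sketched in userland roughly as follows. This is not a proposed implementation: timeZoneEquals, comparisonKey, RENAMED_IDS, and UTC_LINKS are hypothetical names, and the two tables are tiny hand-written samples. A real implementation would need to derive "renamed" links (as opposed to "merged" links) from TZDB or CLDR data, which is exactly the open question in (x).

```javascript
// Illustrative, hand-written sample data (an assumption, not real engine data).
// Maps lowercased old spellings to the current spelling of the same location.
const RENAMED_IDS = new Map([
  ['asia/calcutta', 'Asia/Kolkata'],
  ['asia/saigon', 'Asia/Ho_Chi_Minh'],
  ['europe/kiev', 'Europe/Kyiv'],
]);

// Illustrative subset of identifiers that canonicalize to Etc/UTC.
const UTC_LINKS = new Set(['utc', 'etc/utc', 'etc/zulu', 'zulu']);

// Reduces a zone (string, or object with an `id` property such as a custom
// zone) to a key that is equal exactly when the zones should compare equal.
function comparisonKey(zone) {
  const id = typeof zone === 'string' ? zone : zone.id;
  const lower = id.toLowerCase();           // casing differences are ignored
  if (UTC_LINKS.has(lower)) return 'utc';   // all UTC spellings are equal
  const renamed = RENAMED_IDS.get(lower);   // renames of one location are equal
  return (renamed ?? id).toLowerCase();
}

// Note: merged links (different locations with identical rules) are
// intentionally NOT followed, so they compare as different.
function timeZoneEquals(a, b) {
  return comparisonKey(a) === comparisonKey(b);
}

console.log(timeZoneEquals('Europe/Paris', 'europe/paris'));          // true
console.log(timeZoneEquals('Asia/Calcutta', 'Asia/Kolkata'));         // true
console.log(timeZoneEquals('UTC', { id: 'Etc/UTC' }));                // true
console.log(timeZoneEquals('Atlantic/Reykjavik', 'Africa/Abidjan'));  // false
```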

Pinging @jasonwilliams @ptomato @sffc @gibson042 @pipobscure for your opinions.
