Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Etc/Unknown (like ICU does) to handle the case when the OS's time zone is unrecognized by ECMAScript #25

Closed
wants to merge 1 commit into from

Conversation

justingrant
Copy link
Collaborator

@justingrant justingrant commented May 29, 2023

This PR specifies what to do in SystemTimeZoneIdentifier (DefaultTimeZone until tc39/ecma262#3035 renames it) when ECMAScript cannot determine or doesn't recognize the OS's time zone ID. The most common cause is when new IDs are added to the IANA Time Zone Database, and the OS is updated to use this new ID, but the ECMAScript implementation has an older version of TZDB that doesn't include this ID.

Examples include renames, like "Europe/Kiev" => "Europe/Kyiv", or brand-new Zones, like "America/Ciudad_Juarez" that was recently added.

Currently Chromium solves this problem by exposing a special ICU ID "Etc/Unknown" back to userland. (This ID is defined in UTS 35.) This Chromium behavior is problematic because users cannot use "Etc/Unknown" as input. For example, Temporal.TimeZone.from(Temporal.Now.timeZoneId()) would throw.

Non-Chromium implementations return "UTC" in this case. This behavior is also problematic because it masks the problem; programs can't detect that anything is broken because "UTC" can mean either "this computer is set to UTC" or "this computer's time zone is unknown". This makes it impossible for programs to notify users that action is needed to resolve the issue, e.g. by upgrading the ECMAScript implementation to a later version that includes the unrecognized time zone IDs.

Here are a few possible solutions. This seems like a problem with no good solutions, only bad and worse solutions. Which of below do you think would be least bad for users? Why?

  1. Return "UTC" from Intl.DTF / Temporal.Now (current non-Chromium behavior). Recommend that a warning message be logged to the console.
  2. Return "Etc/Unknown" from Intl.DTF / Temporal.Now, but throw if this ID is used as input to TimeZone / ZDT / Intl.DTF (current Chromium behavior)
  3. Return "Etc/Unknown" from Intl.DTF / Temporal.Now and allow it as input to TimeZone / ZDT / Intl.DTF (proposed in this PR)
  4. Have SystemTimeZoneIdentifier throw, and recommend a clear and google-able error message.

I'd like to discuss these solutions to figure out which one may be best. To help with discussion, this PR proposes (3) above: doing what ICU does when the OS's time zone is unrecognized, which is to return "Etc/Unknown" as the identifier, and also to accept this ID as input.

This PR makes the following spec changes:

  • A one-line change to SystemTimeZoneIdentifier (which lives only in 262) to assert that it returns "Etc/Unknown" if the host environment's current time zone is unrecognized by the ECMAScript implementation. This matches the current Chromium behavior, but not other implementations.
  • A new "Unknown Time Zone" section in 262 explaining what "Etc/Unknown" is and how it behaves, which is just like the UTC zone except it has a distinct (and easily Google-able) identifier and (in 402 implementations) distinct localized text. This ID and presumably the localized text too will be easily Google-able so that programmers and end users can diagnose the problem and find resolution steps.
  • Two lines of changes to 262's and 402's AvailableNamedTimeZoneIdentifiers (see Editorial: simplify and clarify time zone identifier spec text ecma262#3035) to ensure that "Etc/Unknown" is an available named time zone identifier and is primary. Unlike in any implementation today, ECMAScript would accept Etc/Unknown as input wherever time zones are accepted, e.g. the Temporal.TimeZone.from and that type's constructor, Temporal.ZonedDateTime.from and that type's constructor, the Intl.DateTimeFormat constructor, etc.

@justingrant justingrant changed the title Use the ID Etc/Unknown (like ICu does) to handle the case when the OS's time zone is unrecognized by ECMAScript Use the ID Etc/Unknown (like ICU does) to handle the case when the OS's time zone is unrecognized by ECMAScript May 29, 2023
@justingrant justingrant changed the title Use the ID Etc/Unknown (like ICU does) to handle the case when the OS's time zone is unrecognized by ECMAScript Use Etc/Unknown (like ICU does) to handle the case when the OS's time zone is unrecognized by ECMAScript May 29, 2023
@anba
Copy link
Contributor

anba commented May 29, 2023

I propose to specify doing what ICU (and hence V8 and AFAIK, others too) does currently, which is to return Etc/Unknown as the identifier in these cases.

SpiderMonkey doesn't ever return Etc/Unknown, instead UTC is returned for unrecognised time zones. And it looks like JSC also defaults to UTC, but I wasn't able to determine where this happens.

Evaluating

let dtf = new Intl.DateTimeFormat("en", {timeZoneName: "long"});
print(dtf.resolvedOptions().timeZone, ":", dtf.format());

with TZ=BAD returns for me (on Ubuntu):

Engine Output
SpiderMonkey / Firefox UTC : 5/29/2023, Coordinated Universal Time
JSC UTC : 5/29/2023, Coordinated Universal Time
d8 / node undefined : 5/29/2023, GMT
Chromium Etc/Unknown : 5/29/2023, GMT

Etc/Unknown is only returned by Chromium, but when testing with either the v8-shell (d8) or with node-js, the resolved time zone is set to the undefined value. (Internally Etc/Unknown is used d8/node, which can be seen by the formatted time zone identifier, which is "GMT".)

@justingrant
Copy link
Collaborator Author

Etc/Unknown is only returned by Chromium, but when testing with either the v8-shell (d8) or with node-js, the resolved time zone is set to the undefined value. (Internally Etc/Unknown is used d8/node, which can be seen by the formatted time zone identifier, which is "GMT".)

@anba Thanks for the info, this is good to know. I'll update the OP to clarify that only Chromium has this behavior.

What do you think about the proposed behavior in this PR? Do you think it'd be an improvement over the current behaviors of engines which either return "Etc/Unknown" (which cannot then be used as input) or "UTC" (which masks the problem from programmers)?

Copy link
Collaborator

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support codifying this use of Etc/Unknown to match CLDR, and this looks like mostly reasonably text.

spec/timezone.html Outdated Show resolved Hide resolved
spec/timezone.html Outdated Show resolved Hide resolved
spec/timezone.html Outdated Show resolved Hide resolved
spec/intl.html Outdated Show resolved Hide resolved
spec/intl.html Show resolved Hide resolved
spec/timezone.html Outdated Show resolved Hide resolved
spec/timezone.html Outdated Show resolved Hide resolved
spec/timezone.html Outdated Show resolved Hide resolved
@anba
Copy link
Contributor

anba commented May 30, 2023

What do you think about the proposed behavior in this PR? Do you think it'd be an improvement over the current behaviors of engines which either return "Etc/Unknown" (which cannot then be used as input) or "UTC" (which masks the problem from programmers)?

"Etc/Unknown" isn't necessarily supported by non-CLDR/ICU programs, though. For example zammad/zammad#4320 where "Etc/Unknown" (and "Europe/Kyiv") when returned from browsers caused issues. (I've only found that issue, because it's the fourth top entry when googling for "Etc/Unknown" time zone for me.)

Similar issues may arise for Java:

shell> java.time.ZoneId.of("Etc/Unknown")
|  Exception java.time.zone.ZoneRulesException: Unknown time-zone ID: Etc/Unknown
|        at ZoneRulesProvider.getProvider (ZoneRulesProvider.java:281)
|        at ZoneRulesProvider.getRules (ZoneRulesProvider.java:236)
|        at ZoneRegion.ofId (ZoneRegion.java:121)
|        at ZoneId.of (ZoneId.java:411)
|        at ZoneId.of (ZoneId.java:359)
|        at (#1:1)

C++20's <chrono> aborts with tzdb: cannot locate zone: Etc/Unknown. (Tested with gcc-13.)

#include <chrono>

int main() {
  std::chrono::get_tzdb().locate_zone("Etc/Unknown");
}

Python's zoneinfo also only supports IANA time zone names:

>>> import zoneinfo
>>> zoneinfo.ZoneInfo("Etc/Unknown")
...
zoneinfo._common.ZoneInfoNotFoundError: 'No time zone found with key Etc/Unknown'

Also found this when searching for "Etc/Unknown":

Support for "Etc/Unknown" was originally added to ICU because they couldn't change their API to return nullptr for invalid time zone identifiers: https://sourceforge.net/p/icu/mailman/message/27085500/

@ljharb
Copy link
Member

ljharb commented May 30, 2023

That kind of sounds like we should accept it as input but not produce it as output.

@justingrant
Copy link
Collaborator Author

That kind of sounds like we should accept it as input but not produce it as output.

This is an interesting idea. My main concern would be unexpected impact of throwing on output. Would too many programs unexpectedly crash even if they weren't sending the IDs to some other system, but were just using the string for serialization between ECMAScript code? Would it degrade debugging tools, logging, Sentry, etc.? The latter is a particular concern because one of the main goals of Etc/Unknown is for it to show up in logs and debugging tools so that developers can figure out what went wrong.

@ljharb
Copy link
Member

ljharb commented May 30, 2023

If a number of other tools will throw, shouldn't we prefer the error to happen in JS, closer to the problem, than after traveling over the wire?

@justingrant
Copy link
Collaborator Author

justingrant commented May 30, 2023

"Etc/Unknown" isn't necessarily supported by non-CLDR/ICU programs, though. For example zammad/zammad#4320 where "Etc/Unknown" (and "Europe/Kyiv") when returned from browsers caused issues. (I've only found that issue, because it's the fourth top entry when googling for "Etc/Unknown" time zone for me.)

Yep, this seems like a problem with no good solutions, only bad and worse solutions. Which of below do you think would be least bad for users?

  1. Return UTC from Intl.DTF / Temporal.Now (current non-Chromium behavior). Recommend that a warning message be logged to the console.
  2. Return Etc/Unknown from Intl.DTF / Temporal.Now, but don't permit this ID as input to TimeZone / ZDT / Intl.DTF (current Chromium behavior)
  3. Return Etc/Unknown from Intl.DTF / Temporal.Now and allow it as input to TimeZone / ZDT / Intl.DTF (proposed in this PR)
  4. Have SystemTimeZoneIdentifier (aka DefaultTimeZone today) throw, presumably with a useful and google-able error message.

@justingrant
Copy link
Collaborator Author

If a number of other tools will throw, shouldn't we prefer the error to happen in JS, closer to the problem, than after traveling over the wire?

Yeah, it's a tough balance between keeping programs running (even if the results may be unexpected) vs. crashing and alerting developers and/or end users that something is wrong.

My main concern would be crashing in a way that's difficult to reason about, because callers of toString may be pretty distant from the actual code that fetches the bad ID. And because it might impede loggers and debug tools which are a developer's best chance of discovering this problem.

Are there any other built-ins with similar throw-on-toString behavior? The closest I can think of is `${Symbol('foo')}` that throws, but even there Symbol('foo').toString() will succeed.

If we were to pick a place to crash that's in between "crash always" (in SystemTimeZoneIdentifier) and "crash never", then I guess another possible place to crash would be when actually using the time zone info for anything but its ID. So you could get Etc/Unknown and test the ID, but using it in any way e.g. zdt.hour or zdt.until would crash. This would prevent a known-bad time zone from producing bad calculations, but at the cost of breaking many ECMAScript programs where use of SystemTimeZoneIdentifier is not critical, e.g. used for logging only. I'll add this as (6) to the list in my comment above.

@ljharb
Copy link
Member

ljharb commented May 30, 2023

To clarify, I didn't necessarily mean that toString or toJSON should throw - that was your inference. I just said we shouldn't output Etc/Unknown. We could also output null or the empty string for the timezone.

@justingrant
Copy link
Collaborator Author

To clarify, I didn't necessarily mean that toString or toJSON should throw - that was your inference. I just said we shouldn't output Etc/Unknown. We could also output null or the empty string for the timezone.

Oh, sorry for misunderstanding. Could I ask a few questions to clarify what you have in mind when the system time zone is unrecognized?

  • What should be returned or thrown if I call Temporal.Now.timeZoneId()?
  • What should be returned or thrown if I call Temporal.Now.zonedDateTimeISO()?
  • What should be returned or thrown if I call Temporal.Now.zonedDateTimeISO().timeZoneId?
  • What should be returned or thrown if I call Temporal.Now.zonedDateTimeISO().getTimeZone()?
  • What should happen if I call Temporal.TimeZone.from('Etc/Unknown') ?
  • What should be returned or thrown if I call Temporal.TimeZone.from('Etc/Unknown').toString() ?

@ljharb
Copy link
Member

ljharb commented May 30, 2023

Certainly if the system timezone is unknown, I wouldn't be surprised by any of the Now methods either throwing, or, logging a warning to the console (which we could only suggest, not mandate) and returning something in UTC.

What should happen if I call Temporal.TimeZone.from('Etc/Unknown')

I think it's fine if that throws, since you're explicitly requesting a TimeZone object for which no time zone is known to exist.

@justingrant justingrant force-pushed the etc-unknown branch 4 times, most recently from 7fcd3ce to 75ad963 Compare May 31, 2023 05:14
@justingrant
Copy link
Collaborator Author

I updated the text in this PR to resolve @gibson042's and @anba's review feedback. Thanks for the reviews!

Also, to guide tomorrow's TG2 discussion, I added four choices of how this problem could be resolved: one that's the "OK for input and output" proposed in this PR, and three others.

@anba
Copy link
Contributor

anba commented May 31, 2023

Throwing for unrecognised system time zone identifiers isn't something we should do, because it will break browsing experience for end users. And at least in the case of Firefox, it'll break the whole browser, because browser internals are also using Intl.DateTimeFormat. (If SystemTimeZoneIdentifier throws for unrecognised system time zone identifiers, then new Intl.DateTimeFormat() no longer works, too.)

@justingrant justingrant force-pushed the etc-unknown branch 2 times, most recently from b5e3aa2 to e69f3cf Compare June 1, 2023 05:19
@sffc
Copy link
Contributor

sffc commented Jun 2, 2023

@tukusejssirs
Copy link

How about using UTC (or Etc/Unknown) with a special TZ object property with the unknown TZ name (a string) we acquired from the OS? This property would be defined both in TZ object and Temporal date objects. Also a warning should be output when such TZ object or date object is used.

Does that makes sense to you? 🤔

@sffc
Copy link
Contributor

sffc commented Jun 30, 2023

Further TG2 discussion: https://github.com/tc39/ecma402/blob/master/meetings/notes-2023-06-29.md#use-etcunknown-like-icu-does-to-handle-the-case-when-the-oss-time-zone-is-unrecognized-by-ecmascript-25

We failed to reach consensus on this issue so we'll stick with the status quo (unspecified behavior) for the time being and revisit this in the future as a standalone PR.

@justingrant
Copy link
Collaborator Author

We failed to reach consensus on this issue so we'll stick with the status quo (unspecified behavior) for the time being and revisit this in the future as a standalone PR.

Closing. Looking forward to solving this in concert with TG2 in the future.

@justingrant justingrant closed this Jul 5, 2023
justingrant added a commit to justingrant/ecma402 that referenced this pull request Mar 31, 2024
This PR adds explicit support for an Etc/Unknown time zone ID, which
ICU returns when the system time zone is not recognized, usually because
the host environment's OS uses an ID from a newer version of the IANA
Time Zone Database that the ECMAScript engine has not yet been
upgraded to.

This time zone should behave identically to UTC, except it should have
a different localized name to help programmers and end users realize
that there's a problem that requires someone to resolve.

See tc39/proposal-canonical-tz#25 for a previous
attempt to address this issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants