Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider not using Etc/Unknown time zone #5653

Open
robertbastian opened this issue Oct 4, 2024 · 15 comments
Open

Consider not using Etc/Unknown time zone #5653

robertbastian opened this issue Oct 4, 2024 · 15 comments
Labels
2.0-breaking Changes that are breaking API changes C-time-zone Component: Time Zones discuss-priority Discuss at the next ICU4X meeting S-small Size: One afternoon (small bug fix or enhancement)

Comments

@robertbastian
Copy link
Member

The Etc/Unknown timezone is not a TZDB timezone. It was invented by CLDR to signal the absence of time zone information, and this is causing confusion when surfaced to users: https://www.google.com/search?q=etc%2Funknown, tc39/proposal-temporal#2510.

Using Etc/Unknown means that every API that handles a time zone input has to special-case this input, which is easy to miss. A more solid API would use Option<TimeZoneBcp47Id> with None instead of Etc/Unknown.

Our TimeZoneMapper already returns Option<TimeZoneBcp47Id>, and our formatter input already uses Option<TimeZoneBcp47Id> so we effectively have two ways of representing unknown time zones: None and Some("unk"). During formatting, None will fall back to the offset (good), and then to {GMT+?} (good), whereas Some("unk") will format as Unknown City Daylight Time or heure: ville inconnue under location formats even when there is an offset that could be formatted.1

My concrete proposal is:

  • Remove the Etc/Unknown key (and its alias, Factory) from the IANA->BCP47 mapping.
    • Asking for the BCP47 ID for the Etc/Unknown IANA ID will fail, because Etc/Unknown is not a valid IANA ID
    • Asking for the BCP47 ID for the Factory IANA ID will fail because we say so (and document it)
  • Remove the unk key from the BCP47->IANA mapping
    • It's incorrect to have it in there, because Etc/Unknown is not a canonical IANA ID
  • Make the location formats fail if there's no exemplar city for a time zone, so that the next format can be tried. The location formats currently produce very awkward strings for the Etc/Unknown time zone, even when an offset could be formatted.

Footnotes

  1. I understand UTS-35 wants it like this, but I don't see why.

@robertbastian robertbastian added S-small Size: One afternoon (small bug fix or enhancement) C-time-zone Component: Time Zones discuss-priority Discuss at the next ICU4X meeting 2.0-breaking Changes that are breaking API changes labels Oct 4, 2024
@robertbastian robertbastian added this to the ICU4X 2.0 ⟨P1⟩ milestone Oct 4, 2024
@sffc
Copy link
Member

sffc commented Oct 4, 2024

@justingrant

@sffc
Copy link
Member

sffc commented Oct 4, 2024

We've discussed Etc/Unknown a number of times in the ECMAScript context, but I'm not finding the conclusion of those conversations.

I know that @justingrant got more aliases added to 2024b. I thought Factory was part of that but maybe not.

As far as time zone display goes, there's a difference between "unk" and None, as you noted, which is whether "Unknown City" is displayed instead of the UTC offset time zone. It is likely desirable if there is no offset available: Unknown City is better than GMT+? when starting from a location format. Maybe this could be handled internally by sticking Unknown City as the final fallback for location-based formats (after trying offset).

And, as you also noted, we're doing this because UTS 35 says we should. So this seems like material we maybe should first discuss with them.

@robertbastian
Copy link
Member Author

It is likely desirable if there is no offset available: Unknown City is better than GMT+? when starting from a location format.

This won't happen under the plans in #5533, as an input with no time zone and no offset won't be possible.

I also think it should say something like Unknown timezone, Unknown City Time is weird because normal people don't think of time zones as being associated with cities, but regions.

@sffc
Copy link
Member

sffc commented Oct 4, 2024

Wait: if we move in the direction of being super strict about fields being present or not in the time zone, as you propose in #5533, then it wouldn't be possible to construct a TimeZone::<Location> with an unknown city. That's a problem, because cities get added from time to time and ICU4X should have fallback behavior for them (as prescribed by both UTS-35 and ECMA-402). For example, even if we don't return Etc/Unknown from the parse function, it should be possible to write code such as

// Europe/Kyiv was not added until 2022b; it was previously Europe/Kiev.
// Older installations might not know about the updated name yet.
let location_time_zone = TimeZone::<Location>::try_from_str("Europe/Kyiv")
    .unwrap_or(TimeZone::<Location>::unknown());

@justingrant
Copy link

I agree with @robertbastian that the current state is bad and annoying for both implementers and end-users.

FWIW, the current state in ECMAScript is that when the engine doesn't recognize the OS's time zone, ICU4C returns Etc/Unknown when asked for the system time zone, and at least in Chrome and Safari AFAIK will return Etc/Unknown to the ECMAScript userland caller. But if you try to use Etc/Unknown as input, V8/JSC/SM engines will all throw. Example:

new Intl.DateTimeFormat("en", {timeZone: 'Etc/Unknown'})
// => Uncaught RangeError: invalid time zone in DateTimeFormat(): Etc/Unknown

I believe that this behavior means that new Intl.DateTimeFormat() or Temporal.now.zonedDateTimeISO() can throw on a system whose OS's TZDB is newer than the browser's TZDB. This is not generally an issue for Safari because on Apple platforms JSC will use the OS's TZDB so they'll never get out of sync. But it's a problem for Chrome and Edge for users who install into a newly-added time zone. I'm not sure about FF.

We've discussed Etc/Unknown a number of times in the ECMAScript context, but I'm not finding the conclusion of those conversations.

The conclusion of those previous discussions was that there was no consensus on what to do about it. ☹️ See the 2023-06-01 and 2023-06-29 TG2 meeting notes where we discussed this.

I know that @justingrant got more aliases added to 2024b. I thought Factory was part of that but maybe not.

What you might be thinking of is a change made in CLDR where Factory is now an alias for Etc/Unknown. So if the system's time zone is Factory, then the latest ICU4C reports the canonical ID as Etc/Unknown. AFAIK, systems with Factory as their time zone are very rare because modern OS's now make time zone setting part of OS installation or IT provisioning.

But there was no upstream change in how Factory is handled in IANA. Note that this change wasn't rejected by IANA. I just didn't ask them for any changes.

I could ask the IANA TZ list if they'd be willing to add Etc/Unknown, which currently doesn't exist in IANA, and make Factory a Link to Etc/Unknown. Should I do this? Based on how surprisingly easy it was to get them to update those legacy aliases, I think there's a reasonable chance of success with this change in IANA.

At least then we'd have a universal name across IANA and CLDR for the "I have no idea what this time zone is" case. But it still leaves the question of how engines (or ICU4X) should handle that unknown time zone case. Should they:

  • Throw when asked for the system time zone ID?
  • Not throw when Etc/Unknown is provided as a time zone ID input, but instead treat Etc/Unknown as UTC for calculations and as "Unknown time zone" for localization?
  • Not throw when asked for the system time zone ID, but continue throwing when using that ID as input even though it's valid in IANA? (This is AFAIK the current behavior of engines today.)

I also think it should say something like Unknown timezone, Unknown City Time is weird because normal people don't think of time zones as being associated with cities, but regions.

I strongly agree with this. Only people who know about the IANA Time Zone Database will associate cities with time zones. I think "Unknown time zone" is the right English localized name for Etc/Unknown and Factory.

@anba
Copy link

anba commented Oct 7, 2024

I'm not sure about FF.

If the system time zone identifier is unknown, Firefox first checks if there's a Etc/GMT±XX zone which matches the system time zone offset. If there's no such replacement time zone, Firefox falls back to UTC.

@robertbastian
Copy link
Member Author

This seems similar to the behaviour we are discussing in #5533, i.e. if there's no time zone but an offset, we can work with the offset. I feel like the key questions are

  • Are Some("Etc/Unknown") and None the same time zone?
  • Are Some(0) and None the same offset?

It sounds like FF's answer to those is yes and yes, ICU4X's current answer is no and no.

@anba if FF fails to find a GMT+XX zone for the system offset and falls back to UTC, which offset does it then use, system or UTC?

@anba
Copy link

anba commented Oct 7, 2024

@anba if FF fails to find a GMT+XX zone for the system offset and falls back to UTC, which offset does it then use, system or UTC?

It falls back to UTC offset. (This is for the Intl.DateTimeFormat code path. I wouldn't be surprised if other parts in Firefox use a different fallback mechanism, because this situation when the system time zone isn't recognised doesn't happen too often, so it's not well tested. Bug 1856428 and Bug 1679532 are two actual examples when this fallback path was used.)

@robertbastian
Copy link
Member Author

#5661 #5656

@robertbastian
Copy link
Member Author

This zone seems to exist in ICU only because they weren't able to make the API return null due to stability concerns: https://sourceforge.net/p/icu/mailman/icu-design/thread/OF8057666A.AF51DAC4-ON8525783C.00236E03-8525783C.00238179%40lotus.com/#msg27085500

We are already returning an Option, so we should just do None.

robertbastian added a commit that referenced this issue Oct 15, 2024
@sffc
Copy link
Member

sffc commented Oct 17, 2024

WG discussion:

Full: (SpecificNonLocation)
    LocationOrGarbage
    Option<Offset>
    ZoneVariant
    LocalTime

AtTime: (GenericLocation, GenericNonLocation)
    LocationOrGarbage
    LocalTime
    Option<Offset>

LocationWithOffset: (GenericLocation)
    LocationOrGarbage
    Option<Offset>

OffsetOnly: (Offset)
    Option<Offset>

LocationWithVariant:
    LocationOrGarbage
    ZoneVariant

LocationOnly:
    LocationOrGarbage

OffsetAtTime:
    Option<Offset>
    LocalTime

IXDTF cases:

Parse errors:

  • Z[America/Chicago] -> parse error
  • Z[Foo/Bar] -> parse error

Z:

  • Z -> utc, 0 (anything)

OffsetOnly:

  • -0500 -> -5 (OffsetOnly)
  • +? -> (OffsetOnly)

LocationOnly:

  • [America/Chicago] -> uschi (LocationOnly)
  • [Foo/Bar] -> unk (LocationOnly)

Full:

  • -0500[America/Chicago] -> uschi, -5 (LocationWithOffset)
  • -0500[Foo/Bar] -> unk, -5 (LocationWithOffset)
  • +?[Foo/Bar] -> unk, unk (LocationWithOffset)

Discussion:

  • @sffc I think that OffsetOnly is a unique use case from offset-with-Etc/Unknown. If you have an offset only, you should never be allowed to pass it into a format that needs a location. It might also be the input to some hypothetical future algorithm that infers a location. If the location is specified but unknown (like Europe/Kyiv), it's fine to use a location format (when you update your data, you might start getting a match), and such a time zone should never be passed into a location inference algorithm.
  • @robertbastian - This algorithm cannot ever happen. You cannot guess the time zone from a single timestamp and a locale. If you know the location then you already know the location. Europe/Kyiv can already become Etc/Unknown or unk way before hitting ICU4X.
  • @sffc Given a TZDB, it would be possible to write an algorithm that picks the best time zone for a given offset and local time. It doesn't need to depend on the locale's region, though the region could be used as a hint.
  • @robertbastian - No
  • @robertbastian - An offset only is a poor man's timezone, we should not encourage that as an i18n library. We should force the user to give us offset and location, and if they're parsing it from an ixdtf string that's offset only, we can make them explicitly add unknown. An offset is a property of a time zone but not a time zone itself.

More scratch pad:

TimeZoneInfo<OffsetOnly>

UtcOffset

ZonedDateTime<Iso, TimeZoneInfo<LocationOnly>>

ZonedDateTime<Iso, UtcOffset>

(DateTime<Iso>, UtcOffset)
    
UtcOffset::with_unknown_location() -> TimeZoneInfo<LocationWithOffset>

ZonedDateTime::<UtcOffset>::with_unknown_location() -> ZonedDateTime<TimeZoneInfo<LocationWithOffset>>

pub struct ZonedDateTime<A, Z> {
    date: Date<A>,
    time: Time,
    zone: Z,
}

Conclusions:

  • TimeZoneInfo requires a location or unk
  • UtcOffset is supported in formatting (only for the LocalizedOffset format) and parsing
  • Don't advertise offset-only formatting in docs and tutorials except when specifically discussing offset-only formats
  • ZonedDateTime, if we have such a type, can accept either a TimeZoneInfo or a UtcOffset

LGTM: @sffc @robertbastian @Manishearth

@sffc
Copy link
Member

sffc commented Oct 17, 2024

To add to my previous comment: I can envision some data that maps an offset to a "primary metazone": all 24+ of the most common offsets have a primary metazone. This requires TZDB since UTC+1 could be either "Western European Summer Time" or "Central European Standard Time". This operation should only be performed if there is no named time zone. I am not necessarily advocating for this, but I am using it as evidence that "offset only" and "offset with time zone unk" are two different concepts.

@robertbastian
Copy link
Member Author

all 24+ of the most common offsets have a primary metazone.

I don't think that's correct. UTC+1 could, on the same day (in the northern summer), be Western European Summer Time or West Africa Time; two metazones with different standard offsets that match because one observes DST and the other doesn't.

@sffc
Copy link
Member

sffc commented Oct 18, 2024

I posted my idea in https://unicode-org.atlassian.net/browse/CLDR-18037.

@sffc
Copy link
Member

sffc commented Oct 18, 2024

This is mostly implemented in #5700. We haven't handled the Z case in parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.0-breaking Changes that are breaking API changes C-time-zone Component: Time Zones discuss-priority Discuss at the next ICU4X meeting S-small Size: One afternoon (small bug fix or enhancement)
Projects
Status: No status
Development

No branches or pull requests

4 participants