From c44edd4e96b556929d5c64c15e8f7acb5749ed48 Mon Sep 17 00:00:00 2001
From: Justin Grant
Date: Wed, 22 May 2024 17:47:57 -0700
Subject: [PATCH] Normative: Revert U+2212 in timezone offsets

Following ISO 8601, #2781 introduced U+2212 (Unicode minus) as an alias for
the regular ASCII minus sign for use in time zone offsets. Two new pieces of
information lead me to believe that this was a mistake, and that we should
revert this change.

The first is that the newly released RFC 9557 (the string format standard
that Temporal uses) disallows non-ASCII characters, as does its predecessor,
RFC 3339. So strings that follow the current (since 2022) ECMAScript spec
could be rejected by RFC 9557 clients.

The second is feedback from implementers of a Rust version of Temporal that
this single obscure character in the grammar will incur a performance cost,
because they must now use Rust strings instead of plain `u8` ASCII data.
See https://github.com/tc39/proposal-temporal/issues/2843#issuecomment-2119724671

This performance issue doesn't seem to be limited to Rust. Any native
implementation would likely benefit from knowing that valid date/time input
(both Date and Temporal) is always ASCII-only.

I don't know whether all engines have actually implemented this 2022 grammar
change, but real-world usage of this Unicode character is very likely
minimal, so the web-compat risk seems small.

If this PR is accepted, then we'll follow up with a normative Temporal PR to
remove this character from Temporal as well.
---
 spec.html | 45 ++++++---------------------------------------
 1 file changed, 6 insertions(+), 39 deletions(-)

diff --git a/spec.html b/spec.html
index 6afb05c757..fa709daa3f 100644
--- a/spec.html
+++ b/spec.html
@@ -32731,46 +32731,14 @@

Time Zone Offset String Format

ECMAScript defines a string interchange format for UTC offsets, derived from ISO 8601. The format is described by the following grammar.
-  The usage of Unicode code points in this grammar is listed in .

-  Code Point | Unicode Name | Abbreviation
-  `U+2212`   | MINUS SIGN   | <MINUS>

Syntax

UTCOffset :::
-   TemporalSign Hour
-   TemporalSign Hour HourSubcomponents[+Extended]
-   TemporalSign Hour HourSubcomponents[~Extended]
-
- TemporalSign :::
-   ASCIISign
-   <MINUS>
+   ASCIISign Hour
+   ASCIISign Hour HourSubcomponents[+Extended]
+   ASCIISign Hour HourSubcomponents[~Extended]

ASCIISign ::: one of
  `+` `-`
@@ -32844,9 +32812,9 @@

1. Let _parseResult_ be ParseText(StringToCodePoints(_offsetString_), |UTCOffset|).
1. Assert: _parseResult_ is not a List of errors.
- 1. Assert: _parseResult_ contains a |TemporalSign| Parse Node.
- 1. Let _parsedSign_ be the source text matched by the |TemporalSign| Parse Node contained within _parseResult_.
- 1. If _parsedSign_ is the single code point U+002D (HYPHEN-MINUS) or U+2212 (MINUS SIGN), then
+ 1. Assert: _parseResult_ contains a |ASCIISign| Parse Node.
+ 1. Let _parsedSign_ be the source text matched by the |ASCIISign| Parse Node contained within _parseResult_.
+ 1. If _parsedSign_ is the single code point U+002D (HYPHEN-MINUS), then
  1. Let _sign_ be -1.
1. Else,
  1. Let _sign_ be 1.
@@ -48565,7 +48533,6 @@

Number Conversions

Time Zone Offset String Format

-
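For illustration, the post-patch |UTCOffset| grammar and the sign step of the parsing algorithm above can be sketched in TypeScript. This is a hypothetical sketch, not engine or spec code. The function name `parseUTCOffsetNanoseconds` is made up, and the rules it assumes for the parts of the grammar this diff does not show should be checked against the full spec: a two-digit hour 00-23, minutes and seconds 00-59, and an optional 1-9 digit fraction after `.` or `,` that may only follow the seconds. It also shows the cheap up-front rejection of non-ASCII input that an ASCII-only grammar permits.

```typescript
// Hypothetical sketch (not engine or spec code) of UTC offset parsing after this
// patch: only the ASCII signs "+" (U+002B) and "-" (U+002D) are accepted, so an
// offset written with U+2212 (MINUS SIGN) is rejected.

// Assumed rules for the parts of the grammar this diff does not show:
// Hour is 00-23, minutes and seconds are 00-59, and an optional fraction of
// 1-9 digits introduced by "." or "," may only follow the seconds.
// The extended form uses ":" separators, the basic form uses none.
const EXTENDED_OFFSET =
  /^([+-])([01][0-9]|2[0-3])(?::([0-5][0-9])(?::([0-5][0-9])(?:[.,]([0-9]{1,9}))?)?)?$/;
const BASIC_OFFSET =
  /^([+-])([01][0-9]|2[0-3])(?:([0-5][0-9])(?:([0-5][0-9])(?:[.,]([0-9]{1,9}))?)?)?$/;

const NS_PER_HOUR = 3.6e12;
const NS_PER_MINUTE = 6e10;
const NS_PER_SECOND = 1e9;

/** Returns the offset in nanoseconds, or null if offsetString is not a valid offset. */
function parseUTCOffsetNanoseconds(offsetString: string): number | null {
  // The regexes below only match ASCII anyway; this loop just shows the cheap
  // up-front rejection that an ASCII-only grammar makes possible.
  for (let i = 0; i < offsetString.length; i++) {
    if (offsetString.charCodeAt(i) > 0x7f) return null;
  }
  const match = EXTENDED_OFFSET.exec(offsetString) ?? BASIC_OFFSET.exec(offsetString);
  if (match === null) return null;
  // Absent minute/second/fraction groups are undefined and fall back to defaults.
  const [, parsedSign, hours, minutes = "0", seconds = "0", fraction = ""] = match;
  // Mirrors the patched step: -1 only for U+002D (HYPHEN-MINUS), otherwise 1.
  const sign = parsedSign === "-" ? -1 : 1;
  // Right-pad the fraction to 9 digits so that it counts nanoseconds.
  const fractionNs = Number((fraction + "000000000").slice(0, 9));
  return (
    sign *
    (Number(hours) * NS_PER_HOUR +
      Number(minutes) * NS_PER_MINUTE +
      Number(seconds) * NS_PER_SECOND +
      fractionNs)
  );
}

console.log(parseUTCOffsetNanoseconds("-08:00")); // -28800000000000
console.log(parseUTCOffsetNanoseconds("\u221208:00")); // null
```

Keeping the extended (`+05:30`) and basic (`+0530`) forms in separate anchored patterns prevents mixing them (for example, `+05:3000` is rejected), which is what the `[+Extended]`/`[~Extended]` grammar parameter expresses.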