Normative: Revert U+2212 in timezone offsets

Following ISO 8601, #2781 introduced U+2212 (MINUS SIGN) as an alias for the ASCII hyphen-minus (U+002D) in time zone offsets. Two new pieces of information have led me to believe that this was a mistake and that we should revert the change.

First, the newly released RFC 9557 (the string format standard that Temporal uses) disallows non-ASCII characters, as does its predecessor, RFC 3339. Strings that follow the current ECMAScript spec (in place since 2022) could therefore be rejected by RFC 9557 clients.

Second, implementers of a Rust version of Temporal report that this single obscure character in the grammar incurs a performance cost, because they must now handle Rust strings instead of plain u8 ASCII bytes. See tc39/proposal-temporal#2843 (comment). This cost is probably not limited to Rust: any native implementation would likely benefit from knowing that valid date/time input (for both Date and Temporal) is always ASCII-only.

I don't know whether all engines have actually implemented the 2022 grammar change, but real-world usage of this Unicode character is very likely minimal, so the web-compat risk seems small. If this PR is accepted, we'll follow up with a normative Temporal PR to remove this character from Temporal as well.
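To make the performance argument concrete, here is a minimal JavaScript sketch. The helper names are hypothetical, and this is not code from any engine or from the Rust Temporal implementation. With an ASCII-only grammar, a parser can validate its input once and then treat byte i as character i. Under the old grammar that shortcut fails, because U+2212 encodes as the three UTF-8 bytes E2 88 92.

```javascript
// Hypothetical helper: returns the UTF-8 bytes of `input` if it is pure ASCII, else null.
// Once this check passes, byte i is character i, so later parsing can index bytes directly.
function asciiBytesOrNull(input) {
  const bytes = new TextEncoder().encode(input); // UTF-8 encoding
  for (const b of bytes) {
    if (b > 0x7f) return null; // every non-ASCII code point encodes to bytes >= 0x80
  }
  return bytes;
}

// Hypothetical helper: reads the sign from the first byte of an ASCII-only offset string.
function signFromBytes(bytes) {
  if (bytes[0] === 0x2b) return 1;  // '+'
  if (bytes[0] === 0x2d) return -1; // '-' (U+002D HYPHEN-MINUS)
  return null;                      // not an ASCIISign
}
```

A native parser would apply the same idea to its own byte buffers; TextEncoder only stands in for that step here.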
tc39 · May 23, 2024 · c44edd4
1 parent 53454a9
Showing 1 changed file with 6 additions and 39 deletions.
diff --git a/spec.html b/spec.html
@@ -32731,46 +32731,14 @@ <h1>Time Zone Offset String Format</h1>
         <p>
           ECMAScript defines a string interchange format for UTC offsets, derived from ISO 8601.
           The format is described by the following grammar.
-          The usage of Unicode code points in this grammar is listed in <emu-xref href="#table-time-zone-offset-string-code-points"></emu-xref>.
         </p>
 
-        <emu-table id="table-time-zone-offset-string-code-points" caption="Time Zone Offset String Code Points">
-          <table>
-            <tr>
-              <th>
-                Code Point
-              </th>
-              <th>
-                Unicode Name
-              </th>
-              <th>
-                Abbreviation
-              </th>
-            </tr>
-            <tr>
-              <td>
-                `U+2212`
-              </td>
-              <td>
-                MINUS SIGN
-              </td>
-              <td>
-                &lt;MINUS>
-              </td>
-            </tr>
-          </table>
-        </emu-table>
-
         <h2>Syntax</h2>
         <emu-grammar type="definition">
           UTCOffset :::
-            TemporalSign Hour
-            TemporalSign Hour HourSubcomponents[+Extended]
-            TemporalSign Hour HourSubcomponents[~Extended]
-
-          TemporalSign :::
-            ASCIISign
-            &lt;MINUS&gt;
+            ASCIISign Hour
+            ASCIISign Hour HourSubcomponents[+Extended]
+            ASCIISign Hour HourSubcomponents[~Extended]
 
           ASCIISign ::: one of
             `+` `-`
@@ -32844,9 +32812,9 @@ <h1>
           <emu-alg>
             1. Let _parseResult_ be ParseText(StringToCodePoints(_offsetString_), |UTCOffset|).
             1. Assert: _parseResult_ is not a List of errors.
-            1. Assert: _parseResult_ contains a |TemporalSign| Parse Node.
-            1. Let _parsedSign_ be the source text matched by the |TemporalSign| Parse Node contained within _parseResult_.
-            1. If _parsedSign_ is the single code point U+002D (HYPHEN-MINUS) or U+2212 (MINUS SIGN), then
+            1. Assert: _parseResult_ contains a |ASCIISign| Parse Node.
+            1. Let _parsedSign_ be the source text matched by the |ASCIISign| Parse Node contained within _parseResult_.
+            1. If _parsedSign_ is the single code point U+002D (HYPHEN-MINUS), then
               1. Let _sign_ be -1.
             1. Else,
               1. Let _sign_ be 1.
@@ -48565,7 +48533,6 @@ <h1>Number Conversions</h1>
   <emu-annex id="sec-time-zone-offset-string-format">
     <h1>Time Zone Offset String Format</h1>
     <emu-prodref name="UTCOffset"></emu-prodref>
-    <emu-prodref name="TemporalSign"></emu-prodref>
     <emu-prodref name="ASCIISign"></emu-prodref>
     <emu-prodref name="Hour"></emu-prodref>
     <emu-prodref name="HourSubcomponents"></emu-prodref>
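The revised UTCOffset grammar and the sign step of the algorithm can be sketched in JavaScript as follows. This is a hypothetical helper, not engine code. The diff does not show the Hour and HourSubcomponents productions, so the regular expression assumes the shape of the full ECMA-262 grammar: hours 00 to 23, minutes and seconds 00 to 59, an optional 1 to 9 digit fraction after the seconds introduced by `.` or `,`, and `:` separators in the extended form only. Returning the offset in nanoseconds is also a choice of this sketch.

```javascript
// Assumed shape of UTCOffset after this change (see lead-in): ASCIISign Hour, then either
// extended ":MM[:SS[.fraction]]" or basic "MM[SS[.fraction]]". The sign accepts only ASCII + or -.
const UTC_OFFSET =
  /^([+-])([01]\d|2[0-3])(?::([0-5]\d)(?::([0-5]\d)([.,]\d{1,9})?)?|([0-5]\d)(?:([0-5]\d)([.,]\d{1,9})?)?)?$/;

// Hypothetical helper: returns the offset in nanoseconds, or throws SyntaxError.
function parseUTCOffset(offsetString) {
  const m = UTC_OFFSET.exec(offsetString);
  if (m === null) {
    // A leading U+2212 MINUS SIGN ends up here: it no longer matches ASCIISign.
    throw new SyntaxError(`Invalid UTC offset: ${offsetString}`);
  }
  // Sign step: only U+002D (HYPHEN-MINUS) means negative.
  const sign = m[1] === "-" ? -1 : 1;
  const hours = Number(m[2]);
  // Groups 3-5 come from the extended form, groups 6-8 from the basic form.
  const minutes = Number(m[3] ?? m[6] ?? 0);
  const seconds = Number(m[4] ?? m[7] ?? 0);
  const fraction = m[5] ?? m[8];
  // Drop the "." or "," and right-pad to nine digits to get nanoseconds.
  const nanoseconds = fraction === undefined ? 0 : Number(fraction.slice(1).padEnd(9, "0"));
  const total = ((hours * 60 + minutes) * 60 + seconds) * 1e9 + nanoseconds;
  return total === 0 ? 0 : sign * total; // avoid returning -0 for "-00:00"
}
```

With this change, `parseUTCOffset("\u221205:00")` throws, whereas `parseUTCOffset("-05:00")` yields -18000000000000.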