From c44edd4e96b556929d5c64c15e8f7acb5749ed48 Mon Sep 17 00:00:00 2001
From: Justin Grant <justingrant@users.noreply.github.com>
Date: Wed, 22 May 2024 17:47:57 -0700
Subject: [PATCH] Normative: Revert U+2212 in timezone offsets

Following ISO 8601, #2781 introduced U+2212 (MINUS SIGN) as an alias
for the ASCII hyphen-minus (U+002D) in time zone offsets.

Two new pieces of information lead me to believe that this was a
mistake and that we should revert it.

The first is that the newly released RFC 9557 (the string format
standard that Temporal uses) disallows non-ASCII characters, as does
its predecessor, RFC 3339. So strings that conform to the current
(since 2022) ECMAScript spec but use U+2212 could be rejected by
RFC 9557 clients.

The second is feedback from implementers of a Rust version of
Temporal: this single obscure character in the grammar will incur a
performance cost, because they must now use Rust strings instead
of plain u8 ASCII data. See
https://github.com/tc39/proposal-temporal/issues/2843#issuecomment-2119724671

This performance issue probably isn't limited to Rust. Any native
implementation would likely benefit from knowing that valid date/time
input (for both Date and Temporal) is always ASCII-only.
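
To illustrate why an ASCII-only grammar helps native parsers, here is a
minimal Rust sketch. It is not taken from any engine or from the Rust
Temporal implementation, and the names parse_utc_offset and two_digits
are hypothetical. It handles only a simplified subset of UTCOffset
(sign, hour, optional minutes) and omits seconds and fractions. Because
every valid character is one byte, it indexes the input directly with
no UTF-8 decoding, and a U+2212 sign (three bytes in UTF-8) fails the
very first byte check.

```rust
/// Hypothetical parser for a simplified subset of UTCOffset:
/// sign, two-digit hour (00-23), and optional minutes (00-59) in
/// extended ("+05:30") or basic ("+0530") form.
/// Seconds and fractional seconds are intentionally omitted.
/// Returns the offset in minutes, or None if the input doesn't match.
fn parse_utc_offset(input: &[u8]) -> Option<i32> {
    // ASCII-only grammar: look at raw bytes, no UTF-8 decoding needed.
    // U+2212 encodes as E2 88 92, so its first byte is not '+' or '-'.
    let (&sign_byte, rest) = input.split_first()?;
    let sign = match sign_byte {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };

    // Hour: exactly two ASCII digits, 00 through 23.
    let hour = two_digits(rest.get(0..2)?)?;
    if hour > 23 {
        return None;
    }

    // Minutes: absent, ":MM" (extended form), or "MM" (basic form).
    let minute = match &rest[2..] {
        [] => 0,
        [b':', m @ ..] if m.len() == 2 => two_digits(m)?,
        m if m.len() == 2 => two_digits(m)?,
        _ => return None,
    };
    if minute > 59 {
        return None;
    }

    Some(sign * (hour * 60 + minute))
}

/// Converts exactly two ASCII digit bytes into their numeric value.
fn two_digits(bytes: &[u8]) -> Option<i32> {
    match bytes {
        [a, b] if a.is_ascii_digit() && b.is_ascii_digit() => {
            Some(i32::from(*a - b'0') * 10 + i32::from(*b - b'0'))
        }
        _ => None,
    }
}
```

With this sketch, parse_utc_offset(b"+05:30") returns Some(330), while
parse_utc_offset("\u{2212}05:00".as_bytes()) returns None.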

I don't know whether all engines have actually implemented this 2022
grammar change, but real-world usage of this Unicode character is very
likely minimal, so the web-compat risk seems small.

If this PR is accepted, then we'll follow up with a normative Temporal
PR to remove this character from Temporal as well.
---
 spec.html | 45 ++++++---------------------------------------
 1 file changed, 6 insertions(+), 39 deletions(-)
diff --git a/spec.html b/spec.html
index 6afb05c757..fa709daa3f 100644
--- a/spec.html
+++ b/spec.html
@@ -32731,46 +32731,14 @@ <h1>Time Zone Offset String Format</h1>
         <p>
           ECMAScript defines a string interchange format for UTC offsets, derived from ISO 8601.
           The format is described by the following grammar.
-          The usage of Unicode code points in this grammar is listed in <emu-xref href="#table-time-zone-offset-string-code-points"></emu-xref>.
         </p>
 
-        <emu-table id="table-time-zone-offset-string-code-points" caption="Time Zone Offset String Code Points">
-          <table>
-            <tr>
-              <th>
-                Code Point
-              </th>
-              <th>
-                Unicode Name
-              </th>
-              <th>
-                Abbreviation
-              </th>
-            </tr>
-            <tr>
-              <td>
-                `U+2212`
-              </td>
-              <td>
-                MINUS SIGN
-              </td>
-              <td>
-                &lt;MINUS>
-              </td>
-            </tr>
-          </table>
-        </emu-table>
-
         <h2>Syntax</h2>
         <emu-grammar type="definition">
           UTCOffset :::
-            TemporalSign Hour
-            TemporalSign Hour HourSubcomponents[+Extended]
-            TemporalSign Hour HourSubcomponents[~Extended]
-
-          TemporalSign :::
-            ASCIISign
-            &lt;MINUS&gt;
+            ASCIISign Hour
+            ASCIISign Hour HourSubcomponents[+Extended]
+            ASCIISign Hour HourSubcomponents[~Extended]
 
           ASCIISign ::: one of
             `+` `-`
@@ -32844,9 +32812,9 @@ <h1>
           <emu-alg>
             1. Let _parseResult_ be ParseText(StringToCodePoints(_offsetString_), |UTCOffset|).
             1. Assert: _parseResult_ is not a List of errors.
-            1. Assert: _parseResult_ contains a |TemporalSign| Parse Node.
-            1. Let _parsedSign_ be the source text matched by the |TemporalSign| Parse Node contained within _parseResult_.
-            1. If _parsedSign_ is the single code point U+002D (HYPHEN-MINUS) or U+2212 (MINUS SIGN), then
+            1. Assert: _parseResult_ contains an |ASCIISign| Parse Node.
+            1. Let _parsedSign_ be the source text matched by the |ASCIISign| Parse Node contained within _parseResult_.
+            1. If _parsedSign_ is the single code point U+002D (HYPHEN-MINUS), then
               1. Let _sign_ be -1.
             1. Else,
               1. Let _sign_ be 1.
@@ -48565,7 +48533,6 @@ <h1>Number Conversions</h1>
   <emu-annex id="sec-time-zone-offset-string-format">
     <h1>Time Zone Offset String Format</h1>
     <emu-prodref name="UTCOffset"></emu-prodref>
-    <emu-prodref name="TemporalSign"></emu-prodref>
     <emu-prodref name="ASCIISign"></emu-prodref>
     <emu-prodref name="Hour"></emu-prodref>
     <emu-prodref name="HourSubcomponents"></emu-prodref>