Editorial: some document changes #21

Open: 1 commit to be merged into base: master

47 changes: 22 additions & 25 deletions README.md
@@ -8,32 +8,31 @@ ECMAScript proposal at stage 3 of the process, see https://github.com/tc39/propo

This is a specification draft for the legacy (deprecated) RegExp features in JavaScript, i.e., static properties of the constructor like `RegExp.$1` as well as the `RegExp.prototype.compile` method.

This does not reflect what the implementations do, but what the editor thinks to be the least bad thing they ought to do in order to maintain web compatibility.
This does not reflect what the implementations do, but what the editor thinks to be the least bad thing they ought to do to maintain web compatibility.

RegExp static properties (currently not part of ECMA 262,see [tc39/ecma262#137](https://github.com/tc39/ecma262/issues/137)) are specified such that:
RegExp static properties (currently not part of ECMA 262, see [tc39/ecma262#137](https://github.com/tc39/ecma262/issues/137)) are specified such that:

* The values returned by those properties are updated each time a successful match is done.
* They may be deleted. (This is important for secured environments that want to avoid global side-effects.)
* They may be deleted. (This is important for secured environments that want to avoid global side effects.)
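
As a minimal illustration of the first bullet, the sketch below reads a few legacy static properties right after a successful match. It assumes an engine that exposes them (the mainstream browser engines and Node.js do at the time of writing); the sample pattern, the input string, and the names `dateMatch` and `legacySnapshot` are made up for the example.

```javascript
// Hypothetical sample input; any pattern with capture groups would do.
const dateMatch = /(\d{4})-(\d{2})/.exec("released 2016-05 (draft)");

// Read the statics immediately: they describe the most recent successful match.
const legacySnapshot = {
  lastMatch: RegExp.lastMatch,       // text matched by the whole pattern
  group1: RegExp.$1,                 // first capture group
  group2: RegExp.$2,                 // second capture group
  leftContext: RegExp.leftContext,   // text before the match
  rightContext: RegExp.rightContext, // text after the match
};

console.log(legacySnapshot);
```

The snapshot is taken eagerly because any later successful match, including one performed by unrelated library code, overwrites the values.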

The proposal includes another feature that needs consensus and implementation experience before being specced:

* RegExp legacy static properties as well as RegExp.prototype.compile are disabled for instances of proper subclasses of RegExp as well as for cross-realm regexps. [See the detailed motivation here.](subclass-restriction-motivation.md)

We have attempted to [identify potential risks](web-breaking-hazards.md) induced by the the backward-compatibility break introduced by that feature.

See also [the differences between this spec and the current implementations](changes.md).
We have attempted to [identify potential risks](web-breaking-hazards.md) induced by the backward compatibility break introduced by that feature.

See also [the differences between this spec and the current implementations](changes.md).

----

The amendments are relative to the last ECMAScript specification draft found at: https://tc39.github.io/ecma262/
Changes relative to existing algorithms are marked in **bold**.
Changes relative to existing algorithms are marked in **bold**.

All the amendments are part of Annex B, including those that modify objects or algorithm defined in other parts of the spec.
All the amendments are part of Annex B, including those that modify objects or algorithms defined in other parts of the spec.

## [%RegExp%](https://tc39.github.io/ecma262/#sec-regexp-constructor)

The %RegExp% intrinsic object, which is the builtin RegExp constructor, has the following additional internal slots:
The %RegExp% intrinsic object, which is the built-in RegExp constructor, has the following additional internal slots:

* [[RegExpInput]]
* [[RegExpLastMatch]]
@@ -52,10 +51,9 @@ The %RegExp% intrinsic object, which is the builtin RegExp constructor, has the

The initial value of all these internal slots is the empty String.


## [RegExpAlloc ( _newTarget_ )](https://tc39.github.io/ecma262/#sec-regexpalloc)

RegExp instances have an additional slot which optionally keeps a reference to its constructor. It is used for deciding whether a nonstandard legacy feature is enabled for that regexp. The RegExpAlloc abstract operation is modified as follows:
RegExp instances have an additional slot that optionally keeps a reference to its constructor. It is used for deciding whether a nonstandard legacy feature is enabled for that regexp. The RegExpAlloc abstract operation is modified as follows:

1. Let _obj_ be ? OrdinaryCreateFromConstructor(_newTarget_, "%RegExpPrototype%", «[[RegExpMatcher]], [[OriginalSource]], [[OriginalFlags]], **[[Realm]]**, **[[LegacyFeaturesEnabled]]**»).
1. **Let _thisRealm_ be the current Realm Record.**
@@ -67,18 +65,17 @@ RegExp instances have an additional slot which optionally keeps a reference to i
1. Perform ! DefinePropertyOrThrow(_obj_, "lastIndex", PropertyDescriptor {[[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false}).
1. Return _obj_.


## [RegExpBuiltInExec ( _R_, _S_ )](https://tc39.github.io/ecma262/#sec-regexpbuiltinexec)

In the RegExpBuiltInExec abstract operation, a hook is added for updating the static properties of %RegExp% after a successful match. The last three steps of the algorithm are modified as follows:

1. ...
1. (current step 23) Perform ! CreateDataProperty(_A_, "0", _matchedSubstr_).
1. **Let _capturedValues_ be an new empty List.**
1. **Let _capturedValues_ be a new empty List.**
1. (current step 24) For each integer _i_ such that _i_ > 0 and _i_ ≤ _n_
1. ...
1. (current step 24.e) Perform ! CreateDataProperty(_A_, ToString(_i_) , _capturedValue_).
1. **Append _capturedValue_ to the end of _capturedValues_.**
1. **Append _capturedValue_ to the end of _capturedValues_.**
1. **Let _thisRealm_ be the current Realm Record.**
1. **Let _rRealm_ be the value of _R_’s [[Realm]] internal slot.**
1. **If SameValue(_thisRealm_, _rRealm_) is true, then**
@@ -88,7 +85,6 @@ In the RegExpBuiltInExec abstract operation, a hook is added for updating the st
1. **Perform InvalidateLegacyRegExpStaticProperties(%RegExp%).**
1. (current step 25) Return _A_.


## UpdateLegacyRegExpStaticProperties ( _C_, _S_, _startIndex_, _endIndex_, _capturedValues_ )

The abstract operation UpdateLegacyRegExpStaticProperties updates the values of the static properties of %RegExp% after a successful match.
@@ -109,7 +105,7 @@ The abstract operation UpdateLegacyRegExpStaticProperties updates the values of
1. If _i_ ≤ _n_, set the value of _C_’s [[RegExpParen<i>i</i>]] internal slot to the <i>i</i>th element of _capturedValues_.
1. Else, set the value of _C_’s [[RegExpParen<i>i</i>]] internal slot to the empty String.
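
The two visible steps above, which fill the nine paren slots, can be sketched in plain JavaScript. This is only a model: the internal slots are represented as properties of an ordinary object, and `setParenSlots` and `slots` are hypothetical names that do not appear in the spec text. The steps hidden by the collapsed part of the diff are not modelled.

```javascript
// Models [[RegExpParen1]] .. [[RegExpParen9]] as properties of a plain object.
function setParenSlots(slots, capturedValues) {
  const n = capturedValues.length;
  for (let i = 1; i <= 9; i++) {
    // The i-th element of the spec List lives at array index i - 1.
    // Slots beyond the number of captures are reset to the empty String.
    slots[`RegExpParen${i}`] = i <= n ? capturedValues[i - 1] : "";
  }
  return slots;
}

const parenSlots = setParenSlots({}, ["2016", "05"]);
console.log(parenSlots.RegExpParen1, parenSlots.RegExpParen2, parenSlots.RegExpParen3 === "");
```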

## InvalidateLegacyRegExpStaticProperties ( _C_)
## InvalidateLegacyRegExpStaticProperties ( _C_ )

The abstract operation InvalidateLegacyRegExpStaticProperties marks the values of the static properties of %RegExp% as non-available.

@@ -132,13 +128,14 @@ The abstract operation InvalidateLegacyRegExpStaticProperties marks the values o

## Additional properties of the RegExp constructor

All the below properties are accessor properties who have the attributes { [[Enumerable]]: false, [[Configurable]]: true }. Moreover, for the properties whose setter is not explicitely defined, the [[Set]] attribute is set to undefined.
All the below properties are accessor properties that have the attributes { [[Enumerable]]: false, [[Configurable]]: true }. Moreover, for the properties whose setter is not explicitly defined, the [[Set]] attribute is set to undefined.

The accessors check for their this value, so that the properties do not appear to be inherited by subclasses.
The accessors check for their this value so that the properties do not appear to be inherited by subclasses.

### Abstract operations

#### GetLegacyRegExpStaticProperty( _C_, _thisValue_, _internalSlotName_ ).
#### GetLegacyRegExpStaticProperty( _C_, _thisValue_, _internalSlotName_ )

The abstract operation GetLegacyRegExpStaticProperty is used when retrieving a value from a legacy RegExp static property.

1. Assert _C_ is an object that has an internal slot named _internalSlotName_.
@@ -147,7 +144,8 @@ The abstract operation GetLegacyRegExpStaticProperty is used when retrieving a v
4. If _val_ is **empty**, throw a TypeError exception.
3. Return _val_.

#### SetLegacyRegExpStaticProperty( _C_, _thisValue_, _internalSlotName_, _val_ ).
#### SetLegacyRegExpStaticProperty( _C_, _thisValue_, _internalSlotName_, _val_ )

The abstract operation SetLegacyRegExpStaticProperty is used when assigning a value to a legacy RegExp static property.

1. Assert _C_ is an object that has an internal slot named _internalSlotName_.
@@ -156,24 +154,25 @@ The abstract operation SetLegacyRegExpStaticProperty is used when assigning a va
4. Set the value of the internal slot of _C_ named _internalSlotName_ to _strVal_.
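
A rough model of the two abstract operations may help when reading the accessor definitions that follow. It is a sketch under stated assumptions: the internal slots are kept in a `Map`, the `EMPTY` symbol stands in for the spec value **empty**, `String()` approximates ToString, and the receiver check is inferred from the remark that the accessors check their this value (the corresponding steps are hidden in the collapsed part of this diff). All names are hypothetical.

```javascript
// Sentinel standing in for the spec value "empty".
const EMPTY = Symbol("empty");

// A constructor-like object whose "internal slots" are kept in a Map.
function makeModelConstructor() {
  return { slots: new Map([["RegExpInput", ""]]) };
}

function getLegacyStatic(C, thisValue, slotName) {
  if (!C.slots.has(slotName)) throw new Error("assertion failed: missing slot");
  // Receiver check: an object that merely inherits the accessor fails here.
  if (thisValue !== C) throw new TypeError("legacy static read on wrong receiver");
  const val = C.slots.get(slotName);
  if (val === EMPTY) throw new TypeError("legacy static value is not available");
  return val;
}

function setLegacyStatic(C, thisValue, slotName, val) {
  if (!C.slots.has(slotName)) throw new Error("assertion failed: missing slot");
  if (thisValue !== C) throw new TypeError("legacy static write on wrong receiver");
  C.slots.set(slotName, String(val)); // approximation of ToString(val)
}

const modelRegExp = makeModelConstructor();
const modelSubclass = Object.create(modelRegExp); // inherits, but is a different object

setLegacyStatic(modelRegExp, modelRegExp, "RegExpInput", 42);
console.log(getLegacyStatic(modelRegExp, modelRegExp, "RegExpInput")); // prints "42"
```

Calling `getLegacyStatic(modelRegExp, modelSubclass, "RegExpInput")` throws a TypeError in this model, which is the effect that keeps the properties from appearing to be inherited.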

### RegExp.input

#### get RegExp.input

1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpInput]]).

#### set RegExp.input = _val_
#### set RegExp.input = _val_

1. Perform ? SetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpInput]], _val_).

### RegExp.$_

#### get RegExp.$_

1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpInput]]).

#### set RegExp.$_ = _val_
#### set RegExp.$_ = _val_

1. Perform ? SetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpInput]], _val_).


### get RegExp.lastMatch

1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpLastMatch]]).
@@ -242,7 +241,6 @@ The abstract operation SetLegacyRegExpStaticProperty is used when assigning a va

1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen9]]).


## [RegExp.prototype.compile ( _pattern_, _flags_ )](https://tc39.github.io/ecma262/#sec-regexp.prototype.compile)

The modification below will disable RegExp.prototype.compile for objects that are not direct instances of RegExp as well as in case of mismatch between realms.
@@ -262,4 +260,3 @@ The modification below will disable RegExp.prototype.compile for objects that ar
1. Let _P_ be _pattern_.
1. Let _F_ be _flags_.
1. Return ? RegExpInitialize(_O_, _P_, _F_).
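
The intended effect of the restriction can be modelled without touching the real `RegExp`. The sketch below assumes that the [[LegacyFeaturesEnabled]] flag is set only when the constructor is invoked directly rather than through a proper subclass; the steps that set it are hidden in the collapsed part of this diff, and the realm comparison is not modelled at all. `ModelRegExp` and `ExtendedRegExp` are hypothetical stand-ins.

```javascript
// Hypothetical stand-in for RegExp; it models only the [[LegacyFeaturesEnabled]] flag.
class ModelRegExp {
  constructor(pattern) {
    this.pattern = pattern;
    // Assumption: enabled only for direct construction, not for subclasses.
    this.legacyFeaturesEnabled = new.target === ModelRegExp;
  }

  compile(pattern) {
    if (!this.legacyFeaturesEnabled) {
      throw new TypeError("compile is disabled for this object");
    }
    this.pattern = pattern;
    return this;
  }
}

class ExtendedRegExp extends ModelRegExp {}

// Direct instance: compile succeeds and replaces the pattern.
const direct = new ModelRegExp("a+").compile("b+");

// Subclass instance: compile is rejected.
let subclassError = null;
try {
  new ExtendedRegExp("a+").compile("b+");
} catch (err) {
  subclassError = err;
}

console.log(direct.pattern, subclassError instanceof TypeError);
```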

20 changes: 9 additions & 11 deletions changes.md
@@ -18,7 +18,7 @@ As a consequence, `RegExp.$1 = val` fails silently in sloppy mode but loudly in

Because loud failure is easier to debug.

Implemented by Firefox. Functionaly equivalent semantics (nonwritable properties) implemented by Safari.
Implemented by Firefox. Functionally equivalent semantics (nonwritable properties) implemented by Safari.

## The legacy static properties of RegExp are configurable and deletable

@@ -47,24 +47,23 @@ where “cross-realm calls” mean things such as:
* `RegExp.prototype.compile.call(otherRealm_regexp, ...)`
* `RegExp.prototype.exec.call(otherRealm_regexp, ...)`
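
For readers who want to reproduce such a call outside a browser, here is a minimal sketch that assumes Node.js, whose `vm` module can evaluate code in a separate context (a separate realm). The variable names are illustrative. The sketch deliberately does not read `RegExp.$1` afterwards, since that is the behavior on which implementations diverge.

```javascript
const vm = require("vm");

// A regexp created by code running in a separate context (realm).
const otherRealmRegexp = vm.runInNewContext("/b(c)/");

// Its prototype chain belongs to the other realm's intrinsics,
// so it is not an instance of this realm's RegExp.
const isLocalInstance = otherRealmRegexp instanceof RegExp;

// A cross-realm call: this realm's exec applied to the other realm's regexp.
const crossRealmMatch = RegExp.prototype.exec.call(otherRealmRegexp, "abcd");

console.log(isLocalInstance, crossRealmMatch[0], crossRealmMatch.index);
```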

So that realms do not pollute each others. Or, so that if those features are removed in one realm using `delete RegExp.$1`, etc., they are *really* removed for that realm.
So that realms do not pollute each other. Or, so that if those features are removed in one realm using `delete RegExp.$1`, etc., they are *really* removed for that realm.

Note that this is a corner case; in particular, it does *not* concern `otherRealm_regexp.compile()`, because the `compile` method is from the same realm as `otherRealm_regexp`.

Currently, Firefox and Chrome have divergent semantics in that situation.

## All nonstandard legacy features of RegExp are disabled for proper subsclasses of RegExp

In short, if the subclass does non-trivial transformations, the legacy features, as currently implemented, have good chances not to work as expected.
## All nonstandard legacy features of RegExp are disabled for proper subclasses of RegExp

In short, if the subclass does non-trivial transformations, the legacy features, as currently implemented, are likely not to work as expected.

# Features not described here and implemented by some browsers only

Those features are considered as not needed for web compatibility and therefore are not part of the proposal.
Those features are considered as unneeded for web compatibility and therefore are not part of the proposal.

## RegExp.index, RegExp.lastIndex

Respectively the start and the end position in the string of the last succesful match.
Respectively the start and the end position in the string of the last successful match.

Implemented by Edge.

@@ -78,9 +77,8 @@ Implemented by Firefox, but intended to be removed in v48. The property is prese

* Old versions of Firefox (until v44) had a mechanism to restore previous values of RegExp static properties in some situations.
* For some methods (e.g., String#split), the RegExp static properties are typically not updated.
* For some methods (e.g., String#replace used with a callback), the moment when the RegExp static properties are updated is observably different accross implementations.

See [bugzilla@mozilla bug:1208835#c1](https://bugzilla.mozilla.org/show_bug.cgi?id=1208835#c1) for a testcase illustrating the behaviour of different implementations w.r.t. RegExp#replace. (For this particular testcase, our proposal will lead to the same result as Chrome and Safari.)
* For some methods (e.g., String#replace used with a callback), the moment when the RegExp static properties are updated is observably different across implementations.

We have preferred to keep the spec simple rather than trying to be smart or to mimic some implementation: that is to say, we have just specified what is observable for RegExp#exec; and for other methods working with regexps we have relied on the fact that they are specified in terms of RegExp#exec since ES 2015.
See [bugzilla@mozilla bug:1208835#c1](https://bugzilla.mozilla.org/show_bug.cgi?id=1208835#c1) for a test case illustrating the behavior of different implementations w.r.t. RegExp#replace. (For this particular test case, our proposal will lead to the same result as Chrome and Safari.)

We have preferred to keep the spec simple rather than trying to be smart or to mimic some implementation: that is to say, we have just specified what is observable for RegExp#exec, and for other methods working with regexps we have relied on the fact that they are specified in terms of RegExp#exec since ES 2015.
21 changes: 9 additions & 12 deletions subclass-restriction-motivation.md
@@ -1,6 +1,6 @@
## Why disable legacy RegExp features for proper subclasses of RegExp?

Basically, because it breaks encapsulation.
Because it breaks encapsulation.

In ES 2015 + web reality, there are two ways to set the semantics of a regexp:

@@ -9,16 +9,16 @@ In ES 2015 + web reality, there are two ways to set the semantics of a regexp:

Similarly, there are two ways to get information about a successful match:

1. reading the returned value of the `RegExp.prototype.exec()` method. (Recall that all other methods that needs to execute a regexp are written in terms of `RegExp.prototype.exec()`);
1. reading the returned value of the `RegExp.prototype.exec()` method. (Recall that all other methods that need to execute a regexp are written in terms of `RegExp.prototype.exec()`);
2. reading the deprecated `RegExp.$1`, etc. static properties.

A subclass of `RegExp` may want to redefine the constructor and the `exec()` method, without caring about the legacy features, leaving them as potentially broken.

Below are concrete illustrations of what could be wrong.

### Legacy static properties (RegExp.$1, etc.)

Suppose you write a subclass of RegExp that allows to apply a regular expression simultaneously (to be more correct: successively) to each elements of an array of strings. Then, the RegExp static properties will likely give information about the match against the last string only, which is probably not what is intended.
Suppose you write a subclass of RegExp that allows you to apply a regular expression simultaneously (to be more correct: successively) to each element of an array of strings. Then, the RegExp static properties will likely give information about the match against the last string only, which is probably not what is intended.
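
The "last string only" effect can be observed with a small sketch. For simplicity it uses a plain helper function instead of a subclass, and it assumes an engine that exposes `RegExp.$1`; `execAll` and the sample strings are hypothetical.

```javascript
// Hypothetical helper: runs one regexp successively against each string.
function execAll(regexp, strings) {
  return strings.map((s) => regexp.exec(s));
}

const results = execAll(/([a-z])(\d)/, ["a1", "b2", "c3"]);

// Three matches were produced, but the statics hold a single snapshot:
// the one left behind by the most recent successful match.
const staticGroup1 = RegExp.$1;

console.log(results.map((m) => m[1]), staticGroup1);
```

The per-string capture groups are all available from the returned match arrays, whereas the static property only reflects the match against the final element.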

Another issue with those static properties is their lack of encapsulation. For example, suppose that we evaluate the three following expressions in order:

@@ -28,23 +28,20 @@ Object.keys(bar)
RegExp.$1
```

Suppose now that you are running in an environment incorporating a polyfill that emulates symbols using strings of shape `/^symbol-[0-9]{20}-/`; that polyfill could monkey-patch `Object.keys` in order to filter out strings that are of the form of emulated symbols. Then, `RegExp.$1` will likely leak implementation details from that polyfill instead of returning the desired result.
Suppose now that you are running in an environment incorporating a polyfill that emulates symbols using strings of shape `/^symbol-[0-9]{20}-/`; that polyfill could monkey-patch `Object.keys` to filter out strings that are of the form of emulated symbols. Then, `RegExp.$1` will likely leak implementation details from that polyfill instead of returning the desired result.

Although that problem is not specific to proper subclasses of RegExp, it may become worse, because the offending expression could be hidden in the implementation of the subclass, e.g., in an `exec()` or `@@replace()` method of the subclass.


### RegExp.prototype.compile()

Suppose you have a subclass of RegExp that supports the `x` flag. Likely, the constructor will rewrite the provided pattern by removing spaces and comments, and forward the modified pattern to the base constructor. However, as currently specified, the `.compile()` method will not give the opportunity to rewrite its provided pattern, as it will not call the constructor. Since it is a nonstandard and deprecated feature, attempting to fix it properly is not worth the trouble.


## Why disable those features for cross-realm regexps

That is, if you apply `RegExp.prototype.exec()`, respectively `RegExp.prototype.compile()`, to a regexp constructed in another realm, then the static properties of RegExp won’t be updated, respectively a TypeError will be thrown.

First, this is really an edge case. Code like `otherRealm_regexp.exec()` is *not* affected, because `otherRealm_regexp` is from the same realm as `otherRealm_regexp.exec`. The issue arises, e.g., in `RegExp.prototype.exec.call(otherRealm_regexp)`.
That is, if you apply `RegExp.prototype.exec()` or `RegExp.prototype.compile()` to a regexp constructed in another realm, then, respectively, the static properties of RegExp won’t be updated or a TypeError will be thrown.

Now, concerning the static properties of the RegExp constructor: The constructor of which realm should be affected? The realm of the regexp (as thinks Firefox) or the current realm—i.e., the realm of the `.exec()` method—(as think other browsers)? Well, we don’t need to decide: the test that disables the feature for proper subclasses of RegExp will naturally disable it for RegExp objects from other realms. This has the further advantage to prevent different realms from polluting each other.
First, this is an edge case. Code like `otherRealm_regexp.exec()` is *not* affected, because `otherRealm_regexp` is from the same realm as `otherRealm_regexp.exec`. The issue arises, e.g., in `RegExp.prototype.exec.call(otherRealm_regexp)`.

Now, concerning the static properties of the RegExp constructor: The constructor of which realm should be affected? The realm of the regexp (as thinks Firefox) or the current realm—i.e., the realm of the `.exec()` method—(as think other browsers)? Well, we don’t need to decide: the test that disables the feature for proper subclasses of RegExp will naturally disable it for RegExp objects from other realms. This has the further advantage of preventing different realms from polluting each other.

About the `.compile()` method: The restriction enables one to *really* protect all regexps of a given realm from tampering by doing `delete RegExp.prototype.compile`, because you couldn’t recover a working method from another realm.
About the `.compile()` method: The restriction enables one to *really* protect all regexps of a given realm from tampering by doing `delete RegExp.prototype.compile` because you couldn’t recover a working method from another realm.