Support `y` sticky flag for regular expressions #904
Comments
Hm, not sure how this would be implemented. It would have to be used in conjunction with a polyfill/stdlib like core-js that shims the sticky behaviour into the regex instance methods, and Babel would then need a way to tell it that the regex is sticky. /cc @zloirock Also, yes, your interpretation of sticky regexes is correct. MDN is a bit vague sometimes and not very user-friendly.
XRegExp has an implementation:
@developit Their sticky regex detection is bad and will never work correctly.
Could probably just do something like compiling
var foo = /foo/y;
to
var _regex;
var foo = (_regex = /foo/, _regex.sticky = true, _regex); // might have to be `_sticky` or something, not sure how it'll play across environments
and then leave it up to …
If efficiency can be sacrificed, we can do some small (maybe not so small) evil:
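The snippet from this comment did not survive. As one hypothetical illustration of an "inefficient" emulation (not necessarily what was originally proposed), a global regex can stand in for a sticky one by rejecting any match that does not begin at the starting `lastIndex`. `stickyExec` is a made-up name, and the sketch assumes the regex passed in has the `g` flag, since `exec` ignores `lastIndex` otherwise.

```javascript
// Hypothetical fallback for engines without `y`: run a global search
// from lastIndex and accept the result only if it starts exactly there.
// Wasteful, because a failed attempt still scans the rest of the string.
function stickyExec(re, str) {
  // Assumption: `re` has the `g` flag; its lastIndex is the sticky position.
  var start = re.lastIndex;
  var m = re.exec(str);
  if (m === null || m.index !== start) {
    re.lastIndex = 0; // mirror sticky failure: reset and report no match
    return null;
  }
  return m; // re.lastIndex already points just past the match
}
```

The cost is the point of the "evil" remark: a sticky miss should be decided at one position, but this version may scan all the way to the end of the string before giving up.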
@neVERberleRfellerER You can pass a regex to …
FYI: the spec says that …
@getify Ah, great, I haven't had a need to read that part yet. Still need to take into consideration …
@neVERberleRfellerER Neither do I; I was just throwing around methods that can accept regexes.
I'm not positive my interpretations here are entirely correct, but I wrote up some details about how I think sticky works: https://github.com/getify/You-Dont-Know-JS/blob/master/es6%20&%20beyond/ch2.md#sticky-flag Help vetting that info would be appreciated as you explore this possible fix for Babel.
I have just tested this in io.js (according to MDN, V8 supports the 'y' flag when it is enabled via a command-line flag; mind blown, no comment) and Firefox (supports 'y', but its '^' is buggy with it, which doesn't matter here). Both give the same results:
Neither Firefox nor io.js works with the samples on @getify's page. I don't know why yet, but I am still testing. Update:
To make the next search work, I have to set lastIndex again, to the index of the next expected match. Update 2: According to the description on MDN, this behavior is correct.
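A small sketch of the behavior described here, assuming an engine that implements the ES6 `y` flag (the string and pattern are made up for illustration): a sticky `exec` succeeds only when the match starts exactly at `lastIndex`, so reaching a non-adjacent match means assigning `lastIndex` by hand.

```javascript
var re = /o./y;
var str = "foo boo";

re.lastIndex = 0;
var first = re.exec(str);  // null: "f" at index 0 cannot start a match

re.lastIndex = 1;          // the index of the next expected match
var second = re.exec(str); // ["oo"], lastIndex is now 3
// exec from index 3 would fail here: index 3 holds a space

re.lastIndex = 5;          // jump ahead to the "oo" in "boo"
var third = re.exec(str);  // ["oo"], lastIndex is now 7
```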
I don't trust the MDN page at all. I also don't trust V8's implementation. Also, I am not particularly interested in whether … What's interesting is whether it works with all the other regex-using methods. The spec says that all of them should funnel through the internal exec algorithm, so they should all work (as I was depicting, I think). But I don't trust that engines are actually sharing that code.
@getify
@getify I believe the implementations are correct, with the behavior described in my comment with two updates.
I thought about it. Consider these proposals. Please open an issue in … @developit XRegExp uses the native implementation in FF, but does not add its own implementation:
XRegExp('.', 'y')
// SyntaxError: invalid regular expression flag y
@sebmck, it would be better to select an approach first. Maybe replace it with a constructor call, …
@zloirock Ah yeah, that's actually a much better way.
I apologize if I'm being dense and missing something, but my reading of the ES6 spec draft does not match (pun intended) your interpretations, if I understand them. Let me observe some things about how I'm reading it:
What am I missing here? I sure read that to be that … [Edit]: Just remembered I had, on an entirely different topic, recently asked @allenwb this question, and he'd confirmed that …
Added the sticky regex desugaring suggested by @zloirock so it can potentially be delegated to core-js etc., i.e.
var foo = /bar/y;
turns into
var foo = new RegExp("bar", "y");
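For context on why the literal has to go: an engine without `y` support rejects `/bar/y` when parsing (compare the XRegExp `SyntaxError` quoted earlier in the thread), whereas the constructor form fails only at runtime, after a polyfill has had a chance to patch `RegExp`. A hypothetical feature check along those lines (`supportsSticky` is a made-up name, not a Babel or core-js API):

```javascript
// Hypothetical helper: detects native support for the `y` flag.
// Engines without it throw a SyntaxError for the unknown flag.
function supportsSticky() {
  try {
    new RegExp("", "y");
    return true;
  } catch (e) {
    return false;
  }
}

// The desugared form defers flag parsing to runtime, so a polyfill
// loaded beforehand can intercept it; unpolyfilled old engines still throw here.
var foo = new RegExp("bar", "y");
```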
Step 21.2.5.2.2.14, probably? matchSucceeded is set to false, and the following steps are as I have said.
As stated in 21.1.3.11, String.prototype.match uses @@match, which is defined in 6.1.5.1 as
So yes, you are right with this. However:
is not correct in FF and V8 (warning: source is my own research). Update: Sorry for tagging the user "match". Update 2: Everything was updated. Sorry for any misunderstanding and confusion. Update 3: For example, this works in both FF and V8:
Swapping the literal for a constructor looks perfect; I wish I'd thought of that.
Perhaps I've misunderstood even the premise of your discussion. Is your position:
This. However, the most important part of the discussion is the behavior of sticky regexps (for example, when using exec directly). My stance and review of the problem under discussion (to make sure we are discussing the same thing): implementations are correct when they return null and set lastIndex to 0 when a match does not occur exactly at lastIndex. This conflicts with your samples (https://github.com/getify/You-Dont-Know-JS/blob/master/es6%20&%20beyond/ch2.md#sticky-flag). Your stance:
But matchSucceeded is set to false in step 14 and step 15 is evaluated. Then, if matchSucceeded is set, it will get to step 18 (through step 16 and eventually 17). When the match in step 15 fails, step 15.c.i is taken, which returns null and sets lastIndex to zero. This is consistent with the behavior of both tested implementations (FF, V8).
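To make the failure path concrete, a minimal sketch assuming an ES6-conformant engine (the pattern and string are made up): when nothing matches at `lastIndex`, the sticky regex returns `null` and resets `lastIndex` to 0, even though a match exists later in the string, while a plain regex simply searches forward.

```javascript
var re = /b/y;
var str = "xab";

re.lastIndex = 1;              // points at "a", not at "b"
var result = re.exec(str);     // null: no scanning ahead to index 2
var indexAfter = re.lastIndex; // 0: reset by the failed sticky match

// A regex with neither `g` nor `y` ignores lastIndex and searches forward.
var plain = /b/.exec(str);     // ["b"] at index 2
```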
I'm glad I clarified, because I thought you were saying something entirely different and I was arguing that diff point. :)
Wait, how is it in conflict with my examples? Both my ...
re.exec( str ); // null -- no more matches!
re.lastIndex; // 0 -- starts over now!
and ...
re.test( str ); // false -- no more matches!
re.lastIndex; // 0 -- starts over now!
Isn't that indeed in agreement with both you and the spec? [Edit]: All the stuff in your previous message, where you quote "my stance"... none of that was arguing against …
@getify The problem is in this (taken from the 'y' samples):
When the match does not occur at lastIndex, it should set lastIndex to 0 (which it already is in this case) and return null. So I expect the following output:
Because 15.c.i is taken in this case. Update, in reaction to the [edit]ed part: No, but it was (I think, at least) arguing against …
Ahh, further clarification of which example you were concerned about! :)
I don't interpret that at all. Let me explain. From 21.2.5.2.2, we get:
So I guess where we differ is 15.b... I see it succeeding, since at that point … What am I missing?
Indeed, this is the only difference. It should not succeed, and the reason non-sticky regexps succeed is that they start again from the next position, since 15.c.ii is taken:
Update: Some samples (Update 2, warning: untested)
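A side-by-side sketch of that difference, assuming an ES6-conformant engine (`runFrom` is a hypothetical helper): with the same pattern and the same starting `lastIndex`, the global regex keeps retrying at later positions (15.c.ii in the draft numbering used here), while the sticky one fails immediately (15.c.i).

```javascript
// Hypothetical helper: run exec once from a given lastIndex and report the outcome.
function runFrom(re, str, start) {
  re.lastIndex = start;
  var m = re.exec(str);
  return { match: m && m[0], index: m && m.index, lastIndex: re.lastIndex };
}

var str = "a1b2";
var g = runFrom(/\d/g, str, 0); // { match: "1", index: 1, lastIndex: 2 }
var y = runFrom(/\d/y, str, 0); // { match: null, index: null, lastIndex: 0 }
```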
I still don't understand why. My understanding is that a regular expression engine does its own internal forward- and back-tracking looking for suitable matches, so even though it doesn't match … I don't see that the looping (and incrementing of … Am I fundamentally misunderstanding how the internal regex matcher engine works?
According to 21.2.3.2.2 (RegExpInitialize):
According to 21.2.2.2 (Pattern):
According to 21.2.5.2, exec uses only RegExpBuiltinExec (21.2.5.2, step 7), and thus I believe that the internal procedure mentioned in 21.2.2.2 is [[RegExpMatcher]]. Since the internal procedure from 21.2.2.2 (Pattern) matches exactly at the specified offset, step 15.b of 21.2.5.2.2 should indeed fail. Warning: this is a minefield.
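This reading can be modeled in a few lines. It is a simplified sketch, not the spec algorithm verbatim: `matchAt` stands in for [[RegExpMatcher]] (emulated with a sticky regex, so it assumes native `y` support), flags other than `y` are ignored, `lastIndex` is not written back, and both function names are hypothetical. The only thing `sticky` changes is whether a failed attempt is retried at the next index.

```javascript
// Stand-in for [[RegExpMatcher]]: tries the pattern at exactly one index.
function matchAt(source, str, index) {
  var matcher = new RegExp(source, "y");
  matcher.lastIndex = index;
  return matcher.exec(str); // null unless a match starts at `index`
}

// Simplified model of the RegExpBuiltinExec matching loop.
function builtinExecSketch(source, str, lastIndex, sticky) {
  while (lastIndex <= str.length) {
    var m = matchAt(source, str, lastIndex);
    if (m) return m;         // success: the match starts at lastIndex
    if (sticky) return null; // sticky: give up (the spec also sets lastIndex to 0)
    lastIndex++;             // non-sticky: retry one position later
  }
  return null;               // ran past the end of the input
}
```

For example, `builtinExecSketch("b", "xab", 1, false)` finds "b" at index 2, while the same call with `sticky` set to `true` returns `null`.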
So if I'm getting an understanding of the intent of "sticky" (I guess I haven't yet, to this point), it takes a pattern like … Is that correct? That seems to imply that "sticky" is useless for repeatedly stepping through matches (unless they're always entirely contiguous/adjacent), as is typically done with … Because wouldn't this then fail?
var re = /h./y, str = "hello, how are you?";
re.lastIndex; // 0
re.exec( str ); // ["he"]
re.lastIndex; // 2
re.exec( str ); // null
If that's how it should work, I'm completely lost on its use cases. Also, every example (MDN, etc.) I've found implies that it's closely related to the repeated …
First, NOTEs are non-normative, which means they aren't supposed to add anything (except clarification) that isn't in the normative text. The matcher "internal procedure" produced by 21.2.2.2 matches at the offset passed to it via its index parameter. A different index can be supplied each time it is called at 15.b in 21.2.5.2.2, but each such call matches only at the index supplied in that call.
yes
It's an adjustable (via …
I guess what I was asking is: if one is parsing a string (of presumably unknown contents; otherwise why parse?), how does one know what to adjust … In what circumstances do you need sticky when you already know the indexes your matches are at? You can just set the index and use … Sorry for being dense here. It's obvious my own misunderstandings are driving all the confusion in this thread.
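One use case that fits these semantics is a lexer, where tokens are contiguous and each token must begin exactly where the previous one ended, so the next index is always known. The following is a hypothetical sketch (the token rules and names are made up): each rule is tried only at the current position, so no rule can silently skip over unrecognized input.

```javascript
// Hypothetical token rules; `y` pins every attempt to the current position.
var rules = [
  { type: "number", re: /\d+/y },
  { type: "ident", re: /[A-Za-z_]\w*/y },
  { type: "op", re: /[+\-*\/=]/y },
  { type: "space", re: /\s+/y }
];

function tokenize(input) {
  var tokens = [];
  var pos = 0;
  while (pos < input.length) {
    var matched = false;
    for (var i = 0; i < rules.length; i++) {
      var rule = rules[i];
      rule.re.lastIndex = pos; // anchor this attempt at pos
      var m = rule.re.exec(input);
      if (m) {
        if (rule.type !== "space") tokens.push({ type: rule.type, text: m[0] });
        pos = rule.re.lastIndex; // continue right after the token
        matched = true;
        break;
      }
    }
    // With `g` instead of `y`, a rule could jump past this spot; here we stop.
    if (!matched) throw new SyntaxError("Unexpected character at " + pos);
  }
  return tokens;
}
```

For instance, `tokenize("x = 42 + y")` yields the identifier `x`, the operator `=`, the number `42`, the operator `+`, and the identifier `y`, while `tokenize("x $")` throws.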
@allenwb Thank you for making everything clear.
This note was the easiest to reference, and it was compatible with the understanding I got from Pattern and the notation of Continuation and Matcher (the last two are not even numbered and, as a result, are difficult to reference). The clarification it provided was really useful when I started digging through it, so its purpose was fulfilled, I think :-) @getify I have found this:
Also, what then is ES6's expected handling of:
var re = /^o./y, // <---- notice the ^ here
str = "oops nope";
re.lastIndex; // 0
str.match( re ); // ["oo"]
re.lastIndex; // 2
str.match( re ); // null
re.lastIndex; // 0
re.lastIndex = 6;
str.match( re ); // ????
@getify I think the last result should be null, because the input does not start at position 6. Also take a look at this example, which I believe is correct:
This does not work with /g and repeated exec.
It seems to work for me (Chrome) with … I don't see how …
@getify My bad. I missed /m when I was testing this. I don't know; maybe efficiency in some cases is the only advantage.
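A sketch of how `^` combines with `y`, assuming an engine that follows the NOTE at the end of 21.2.2.6: sticky only fixes where the match attempt begins, while `^` still requires the start of the input, or the start of a line when `m` is also set.

```javascript
var str = "oops nope";
var re = /^o./y;
re.lastIndex = 6;
var noMultiline = re.exec(str);     // null: index 6 is not the start of input

var text = "oops\nnope";
var reM = /^n./my;
reM.lastIndex = 5;                  // first character after "\n"
var withMultiline = reM.exec(text); // ["no"]: with `m`, ^ matches at a line start
```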
See the NOTE at the end of 21.2.2.6.
Probably. I wasn't the feature champion; I just had to figure out how to integrate it into the spec.
Yikes, I hope not. Let me go back to some of my earliest assertions about the benefits of
I think those are faithful explanations of the difference/benefit of …
Yep, that's exactly the kind of clarity I needed on the …
FWIW, I've completely rewritten (and expanded) my book section on "sticky mode" after all these explorations. If it helps at all, here's how I went about it: https://github.com/getify/You-Dont-Know-JS/blob/master/es6%20&%20beyond/ch2.md#sticky-flag
Closing this now since I've basically solved this on the Babel side, and it's now up to a polyfill to patch the …
I took a quick glance at open and previously closed issues and didn't find this mentioned (sorry if I missed it)... it would be great to know the plans for support of:
AFAICT, the sticky flag `y` makes the regular expression remember the last match's end and start from there on subsequent matches (similar to the `g` flag with `exec(..)`).
FYI: I find the MDN page for this to be particularly unhelpful in explaining it, as the example cited there is basically the same as a `g` + `exec(..)` example.