Description
Describe the bug
regex_search() fails to reset the capturing group state correctly between match attempts. Because of this, it might claim that some capturing groups are matched even though they shouldn't be.
Test case
#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main() {
    smatch captures;
    regex re("a|(b)c");
    string input("ba");

    auto result = regex_search(input, captures, re);
    cout << "search succeeded: " << result << '\n';
    if (result) {
        cout << "matched character sequence: " << captures[0].str() << '\n';
        cout << "capturing group 1 matched: " << captures[1].matched << '\n';
        cout << "contents of capturing group 1: " << captures[1].str();
    }
    return 0;
}
Godbolt link: https://godbolt.org/z/Wsb6aEjTf
This program produces the following output:
search succeeded: 1
matched character sequence: a
capturing group 1 matched: 1
contents of capturing group 1: b
Expected behavior
The program should produce the following output:
search succeeded: 1
matched character sequence: a
capturing group 1 matched: 0
contents of capturing group 1:
STL version
This bug appears to have been introduced in MSVC Build Tools 19.50 and still reproduces on current head.
Additional context
The setup code for the matcher in _Matcher(x)::_Match() has never contained explicit code to reset the capturing group state. But until recently, the matcher tried to reset captures when it encountered an _N_capture node:
Lines 3605 to 3608 in 713dd95
    // CodeQL [SM02323] Comparing unchanging unsigned int _Node->_Idx to decreasing size_t _Idx is safe.
    for (size_t _Idx = _Tgt_state._Grp_valid.size(); _Node->_Idx < _Idx;) {
        _Tgt_state._Grp_valid[--_Idx] = false;
    }
However, this loop was an inadequate attempt to implement ECMAScript's capturing group reset rule and a major source of bugs in this area. #5456 changed the matcher to reset capturing groups according to the ECMAScript standard and, as part of that change, removed this loop. A subtle consequence is that capturing groups are no longer reset when the _N_capture node for group 0 is encountered, so groups matched during an earlier, failed match attempt can now spill over into later attempts. In the test case, the attempt starting at "b" matches group 1 before failing on the missing "c"; the next attempt then succeeds on "a" while group 1 is still marked as matched.
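The spill-over can be illustrated with a toy model. This is only a sketch under stated assumptions, not the STL's matcher: `ToyState`, `toy_match_at` and `toy_search` are hypothetical names, the pattern a|(b)c is hard-coded, and the toy deliberately does not undo capture state when an attempt fails.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy state: one "matched" flag per capturing group.
// Index 0 is unused here; index 1 stands for the group (b).
struct ToyState {
    std::vector<bool> grp_valid;
};

// Tries to match the hard-coded pattern a|(b)c at exactly position pos.
// Group 1 is recorded as soon as 'b' is consumed, before it is known
// whether 'c' follows, and it is not cleared again on failure.
bool toy_match_at(const std::string& s, std::size_t pos, ToyState& st) {
    assert(st.grp_valid.size() >= 2);
    if (pos < s.size() && s[pos] == 'a') {
        return true; // first alternative, no capturing group involved
    }
    if (pos < s.size() && s[pos] == 'b') {
        st.grp_valid[1] = true; // group 1 tentatively matched
        if (pos + 1 < s.size() && s[pos + 1] == 'c') {
            return true; // second alternative matched completely
        }
    }
    return false; // on failure, grp_valid[1] may be left set
}

// Search loop: one match attempt per start position. When
// reset_each_attempt is false, state from a failed attempt leaks
// into the next one, which models the behavior described above.
bool toy_search(const std::string& s, ToyState& st, bool reset_each_attempt) {
    st.grp_valid.assign(2, false);
    for (std::size_t pos = 0; pos <= s.size(); ++pos) {
        if (reset_each_attempt) {
            st.grp_valid.assign(2, false);
        }
        if (toy_match_at(s, pos, st)) {
            return true;
        }
    }
    return false;
}
```

Calling `toy_search("ba", st, false)` succeeds with `st.grp_valid[1]` still set from the failed attempt at position 0, whereas passing `true` for `reset_each_attempt` leaves it cleared. Resetting once per match attempt is only one possible remedy in this toy; the sketch makes no claim about how the actual fix should be structured.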