Optimize Regex match check #3779
@raskad This is a pretty big optimization because it removes the O(n^2) complexity from our match implementation, but we are not sure how to adapt it to work with Unicode matching. Can you give this a look?
Test262 conformance changes
Reduced the failing tests from 48 to 4, all of them Unicode ones.
After some more debugging, I think it's an issue with regress:
/// 262 test/built-ins/RegExp/prototype/exec/u-lastindex-adv.js
///
/// Test case:
///
/// ```JavaScript
/// assert.sameValue(/\udf06/u.exec('\ud834\udf06'), null);
/// ```
#[test]
fn utf16_correct_unicode_scan() {
    // '𝌆' This is "Tetragram For Centre"
    // See: https://www.compart.com/en/unicode/U+1D306
    const INPUT: &[u16] = &[0xd834, 0xdf06];
    const MATCHER: &[u16] = &[0xdf06];
    let regex = Regex::from_unicode(MATCHER.iter().copied().map(u32::from), Flags::from("u"))
        .expect("valid regex");
    // In Unicode mode the lone trail surrogate must not match inside the surrogate pair.
    let m = regex.find_from_utf16(INPUT, 0).next();
    println!("{m:#?}");
    assert!(m.is_none());
}
There is a match:
The tests seem fine with the new regress version, so I think the only thing missing is the TODO comment.
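For anyone following along, here is a minimal sketch of why the `u-lastindex-adv.js` case expects `null`. It does not use regress or Boa at all: `contains_code_point` is a hypothetical helper written only for illustration, built on the standard library's `char::decode_utf16`. Under the `u` flag the input is read as code points, so the pair `0xd834 0xdf06` is the single code point U+1D306 and a lone `0xdf06` pattern has nothing to match, even though a plain code-unit scan does find `0xdf06`.

```rust
/// Hypothetical helper (not part of regress or Boa): reports whether `needle`
/// appears in `haystack` as a whole code point when `haystack` is decoded as
/// UTF-16. Unpaired surrogates are compared by their raw code unit value.
fn contains_code_point(haystack: &[u16], needle: u32) -> bool {
    std::char::decode_utf16(haystack.iter().copied()).any(|unit| match unit {
        Ok(c) => u32::from(c) == needle,
        Err(e) => u32::from(e.unpaired_surrogate()) == needle,
    })
}

fn main() {
    // Same input as the test case: the surrogate pair for U+1D306.
    const INPUT: &[u16] = &[0xd834, 0xdf06];

    // Code-unit view (non-Unicode matching): the trail surrogate is found.
    assert!(INPUT.contains(&0xdf06));

    // Code-point view (Unicode matching): the pair is one code point,
    // so the lone trail surrogate is not present.
    assert!(!contains_code_point(INPUT, 0xdf06));
    assert!(contains_code_point(INPUT, 0x1d306));

    // A genuinely unpaired trail surrogate is still found.
    assert!(contains_code_point(&[0x0061, 0xdf06], 0xdf06));
}
```

This only illustrates the expected semantics; how the optimized scan in the PR should skip over surrogate pairs is a separate question.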
I added the comments, so this is ready for review/merge :)
Great find and optimization!