-
-
Notifications
You must be signed in to change notification settings - Fork 484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(coverage): enable regexp in test262 #4242
Conversation
let span = self.start_span(); | ||
|
||
// split out pattern | ||
let (pattern_end, flags) = self.read_regex(); | ||
let pattern_start = self.cur_token().start + 1; // +1 to exclude `/` | ||
let pattern = &self.source_text[pattern_start as usize..pattern_end as usize]; | ||
if let Err(diagnostic) = PatternParser::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reparsing each regex
eagerly in place sounds not reasonable, can we introduce another visit pass (enabled by a option) and reparse each regex then emit diagnostic at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most downstream users of parser
may not care about whether the regex semantic is correct or not like formater, bundler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But actually, I'm also concerned about how to finish this PR.
With current code,
oxc_parser
parsesRegExp
and reports errors, but does not seem to hold the parsed results- RegExp literals(
/abc/
) are checked, but RegExp object calls(new RegExp("abc")
) are not checked
@Boshen What were your original plans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about it a little, and to organize my thoughts..., I'll answer my own questions.
RegExp literals(/abc/) are checked, but RegExp object calls(new RegExp("abc")) are not checked
It's OK.
In the case of new RegExp("string")
, the code is just parsed, and if there’s an error in RegExp, it will occur at runtime.
On the other hand, in the case of /string/
, the syntax must satisfy the requirements of a literal, so it should produce an error during the parsing stage, before runtime.
console.log("START");
const a = new RegExp("a{", "u"); // <- Invalid
console.log("END");
This will log "START".
console.log("START");
const a = /a{/u; // <- Invalid
console.log("END");
This, however, will not log anything.
One thing that concerns me, though, is that if invalid literal is treated as error at the parser stage, it would make it impossible to implement rules like eslint/no-invalid-regexp
?
How to reuse parsed result
Perhaps parse it again from Semantic, just like with JSDoc...?
To make the #4242 tests pass. (My `RegExp` parser tells me `/as)df/` is invalid syntax. 😂)
FYI:
Benchmark results have been updated now. I don't think current approach is not the best solution, but the CI is green. 😅 |
Thanks for your explanation |
a6325ee
to
368364d
Compare
Part of #1164 ## Progress updates 🗞️ Waiting for the review and advice, while thinking how to handle escaped string when `new RegExp(pat)`. ## TODOs - [x] `RegExp(Literal = Body + Flags)#parse()` structure - [x] Base `Reader` impl to handle both unicode(u32) and utf-16(u16) units - [x] Global `Span` and local offset conversion - [x] Design AST shapes - [x] Keep `enum` size small by `Box<'a, T>` - [x] Rework AST shapes - [x] Split body and flags w/ validating literal - [x] Parse `RegExpFlags` - [x] Parse `RegExpBody` = `Pattern` - [x] Parse `Pattern` > `Disjunction` - [x] Parse `Disjunction` > `Alternative` - [x] Parse `Alternative` > `Term` - [x] Parse `Term` > `Assertion` - [x] Parse `BoundaryAssertion` - [x] Parse `LookaroundAssertion` - [x] Parse `Term` > `Quantifier` - [x] Parse `Term` > `Atom` - [x] Parse `Atom` > `PatternCharacter` - [x] Parse `Atom` > `.` - [x] Parse `Atom` > `\AtomEscape` - [x] Parse `\AtomEscape` > `DecimalEscape` - [x] Parse `\AtomEscape` > `CharacterClassEscape` - [x] Parse `CharacterClassEscape` > `\d, \D, \s, \S, \w, \W` - [x] Parse `CharacterClassEscape` > `\p{UnicodePropertyValueExpression}, \P{UnicodePropertyValueExpression}` - [x] Parse `\AtomEscape` > `CharacterEscape` - [x] Parse `CharacterEscape` > `ControlEscape` - [x] Parse `CharacterEscape` > `c AsciiLetter` - [x] Parse `CharacterEscape` > `0` - [x] Parse `CharacterEscape` > `HexEscapeSequence` - [x] Parse `CharacterEscape` > `RegExpUnicodeEscapeSequence` - [x] Parse `CharacterEscape` > `IdentityEscape` - [x] Parse `\AtomEscape` > `kGroupName` - [x] Parse `Atom` > `[CharacterClass]` - [x] Parse `[CharacterClass]` > `ClassContents` > `[~UnicodeSetsMode] NonemptyClassRanges` - [x] Parse `[CharacterClass]` > `ClassContents` > `[+UnicodeSetsMode] ClassSetExpression` - [x] Parse `ClassSetExpression` > `ClassUnion` - [x] Parse `ClassSetExpression` > `ClassIntersection` - [x] Parse `ClassSetExpression` > `ClassSubtraction` - [x] Parse `ClassSetExpression` > `ClassSetOperand` - [x] Parse `ClassSetExpression` > `ClassSetRange` - [x] Parse `ClassSetExpression` > `ClassSetCharacter` - [x] Parse `Atom` > `(GroupSpecifier)` - [x] Parse `Atom` > `(?:Disjunction)` - [x] Annex B - [x] Parse `QuantifiableAssertion` - [x] Parse `ExtendedAtom` - [x] Parse `ExtendedAtom` > `\ [lookahead = c]` - [x] Parse `ExtendedAtom` > `InvalidBracedQuantifier` - [x] Parse `ExtendedAtom` > `ExtendedPatternCharacter` - [x] Parse `ExtendedAtom` > `\AtomEscape` > `CharacterEscape` > `LegacyOctalEscapeSequence` - [x] Early errors - [x] Pattern :: Disjunction(1/2) - [x] Pattern :: Disjunction(2/2) - [x] QuantifierPrefix :: { DecimalDigits , DecimalDigits } - [x] ExtendedAtom :: InvalidBracedQuantifier (Annex B) - [x] AtomEscape :: k GroupName - [x] AtomEscape :: DecimalEscape - [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(1/2) - [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(2/2) - [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(Annex B) - [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(1/2) - [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(2/2) - [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(Annex B) - [x] RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence - [x] RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate - [x] RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence - [x] RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate - [x] UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue(1/2) - [x] UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue(2/2) - [x] UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue(1/2) - [x] UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue(2/2) - [x] CharacterClassEscape :: P{ UnicodePropertyValueExpression } - [x] CharacterClass :: [^ ClassContents ] - [x] NestedClass :: [^ ClassContents ] - [x] ClassSetRange :: ClassSetCharacter - ClassSetCharacter - [x] Add `Span` to `Err(OxcDiagnostic::error())` calls - [x] Perf improvement - [x] `Reader#peek()` should avoid `iter.next()` equivalent - [x] ~~Use `char` everywhere and split and push 2 surrogates(pair) for `Character`?~~ - [x] ~~Try 1(+1) loop parsing for capturing groups?~~ ## Follow up - [x] @Boshen Test suite > #4242 - [x] Investigate CI errors... - Next... - Support ES2025 Duplicate named capturing groups? - Support ES20XX Stage3 Modifiers?
Continue in #4998 |
@leaysgur This enables all test262 regexp tests, feel free to decide when's the right time to integrate.
It seems like we need to add some pointing spans on the diagnostics.
It's somewhat slow to run
just c
, so I always usejust example parser
for local development.