Type Inference for Regular Expressions #60249

Open: wants to merge 31 commits into base main
Commits:
c44a057
Type Inference for Regular Expressions
graphemecluster Oct 17, 2024
c435526
Fix Incorrect Disjunction Alternative Visibility
graphemecluster Nov 4, 2024
24c16f9
Add Test Cases for Duplicate Capturing Group Name
graphemecluster Nov 4, 2024
13cfe15
Performance optimization: Reduce type creation
graphemecluster Nov 6, 2024
6aeac05
Fix incorrect type for `/{0?/` etc. in non-Unicode mode
graphemecluster Nov 7, 2024
b828249
Fix: `i` modifier is not removed if no flags are added
graphemecluster Nov 7, 2024
bd74623
Remove more redundant literals esp. `"" | string`
graphemecluster Nov 7, 2024
a6e0b98
Fix: Incorrect characters included for `/\c/` & `/[\c]/` in Annex B
graphemecluster Nov 8, 2024
a31ddcb
Refactor: Rename variables and modify type of reduced union
graphemecluster Nov 8, 2024
42a6568
Refactor: Fast path character classes
graphemecluster Nov 9, 2024
9ce54aa
Refine types for cases where the cross product size is too large
graphemecluster Nov 9, 2024
bb0e808
Refactor: Type `RegularExpressionAnyString` as a unique symbol for be…
graphemecluster Nov 9, 2024
524f291
Expand test cases in `regularExpressionLiteralTypes.ts`
graphemecluster Nov 9, 2024
48f86ff
Fix: missing `"-"` in `/[+-]/`
graphemecluster Nov 9, 2024
3ddc5da
Mark `RegExpDigits` as character class (for fast path)
graphemecluster Nov 9, 2024
9dc953c
Correct lib types & Refine type checking test case
graphemecluster Nov 10, 2024
0cf763d
Separate type checking test case into 2 files which tests for differe…
graphemecluster Nov 10, 2024
53425e5
Add test case for `String#matchAll`
graphemecluster Nov 10, 2024
12e875b
Collapse consecutive string types in template literals
graphemecluster Nov 10, 2024
5dfdb56
Merge remote-tracking branch 'upstream/main' into regex-type-infer
graphemecluster Nov 10, 2024
09ad9c3
Fix up all baselines
graphemecluster Nov 10, 2024
61fc428
Fix up all self check errors
graphemecluster Nov 10, 2024
402760c
Separate the type checking test case further
graphemecluster Nov 10, 2024
99355c0
Temporarily exclude `RegExp` & `RegExpExecArray` from `checkNoTypeArg…
graphemecluster Nov 10, 2024
80dd5d0
Revert "Temporarily exclude `RegExp` & `RegExpExecArray` from `checkN…
graphemecluster Nov 10, 2024
b251992
Fix self check in a more ugly way due to build issue
graphemecluster Nov 10, 2024
5bd7951
Temporarily silence a few `no-unnecessary-type-assertion` lint lines …
graphemecluster Nov 10, 2024
037ef8e
Fix minor incorrect fix in self check
graphemecluster Nov 10, 2024
e51e144
Temporarily shrink type checking test case due to timeout
graphemecluster Nov 10, 2024
2989429
Inline `RegularExpressionAnyString` to avoid `export` keywords in `ty…
graphemecluster Nov 12, 2024
8cc6840
Keep `RegularExpressionAnyString` but export as internal
graphemecluster Nov 12, 2024
251 changes: 226 additions & 25 deletions src/compiler/checker.ts

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions src/compiler/core.ts
@@ -1123,6 +1123,12 @@ export function last<T>(array: readonly T[]): T {
return array[array.length - 1];
}

+ /** @internal */
+ export function setLast<T>(array: T[], value: T): T {
+ Debug.assert(array.length !== 0);
+ return array[array.length - 1] = value;
+ }
+
/**
* Returns the only element of an array if it contains only one element, `undefined` otherwise.
*
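`setLast` overwrites the final element of a non-empty array and returns the value written, since an assignment expression evaluates to the assigned value. Here is a standalone sketch. It assumes a plain thrown `Error` in place of TypeScript's internal `Debug.assert`.

```typescript
// Standalone version of core.ts's setLast; throwing an Error stands in for Debug.assert.
function setLast<T>(array: T[], value: T): T {
  if (array.length === 0) throw new Error("setLast: array must be non-empty");
  // The assignment expression evaluates to `value`, which is then returned.
  return array[array.length - 1] = value;
}

const stack = [1, 2, 3];
const written = setLast(stack, 9);
// written is 9 and stack is now [1, 2, 9]
```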

8 changes: 4 additions & 4 deletions src/compiler/diagnosticMessages.json
@@ -1705,10 +1705,6 @@
"category": "Error",
"code": 1512
},
- "Undetermined character escape.": {
- "category": "Error",
- "code": 1513
- },
"Expected a capturing group name.": {
"category": "Error",
"code": 1514
@@ -1834,6 +1830,10 @@
"category": "Error",
"code": 1544
},
+ "'\\k' is only available outside character class.": {
+ "category": "Error",
+ "code": 1545
+ },
Comment on lines +1833 to +1836 (Contributor, Author):
Just to confirm: a new message can’t take the place of an old, unused one, right?
(TS1513 never actually appears, because TS1161 is already emitted at the same position whenever a RegExp is unterminated, so replacing the message shouldn’t cause any problems.)

"The types of '{0}' are incompatible between these types.": {
"category": "Error",

12 changes: 8 additions & 4 deletions src/compiler/parser.ts
@@ -10689,9 +10689,11 @@ function extractPragmas(pragmas: PragmaPseudoMapEntry[], range: CommentRange, te
return; // Missing required argument, don't parse
}
else if (matchResult) {
- const value = matchResult[2] || matchResult[3];
+ // eslint-disable-next-line @typescript-eslint/no-unnecessary-type-assertion -- TODO: Remove this line after #60249
+ const value = (matchResult[2] || matchResult[3])!;
if (arg.captureSpan) {
- const startPos = range.pos + matchResult.index + matchResult[1].length + 1;
+ // eslint-disable-next-line @typescript-eslint/no-unnecessary-type-assertion -- TODO: Remove this line after #60249
+ const startPos = range.pos + matchResult.index + matchResult[1]!.length + 1;
argument[arg.name] = {
value,
pos: startPos,
@@ -10720,14 +10722,16 @@ function extractPragmas(pragmas: PragmaPseudoMapEntry[], range: CommentRange, te
const multiLinePragmaRegEx = /@(\S+)(\s+(?:\S.*)?)?$/gm; // Defined inline since it uses the "g" flag, which keeps a persistent index (for iterating)
let multiLineMatch: RegExpExecArray | null; // eslint-disable-line no-restricted-syntax
while (multiLineMatch = multiLinePragmaRegEx.exec(text)) {
- addPragmaForMatch(pragmas, range, PragmaKindFlags.MultiLine, multiLineMatch);
+ // eslint-disable-next-line @typescript-eslint/no-unnecessary-type-assertion -- TODO: Remove this line after #60249
+ addPragmaForMatch(pragmas, range, PragmaKindFlags.MultiLine, multiLineMatch!);
}
}
}

function addPragmaForMatch(pragmas: PragmaPseudoMapEntry[], range: CommentRange, kind: PragmaKindFlags, match: RegExpExecArray) {
if (!match) return;
- const name = match[1].toLowerCase() as keyof PragmaPseudoMap; // Technically unsafe cast, but we do it so they below check to make it safe typechecks
+ // eslint-disable-next-line @typescript-eslint/no-unnecessary-type-assertion -- TODO: Remove this line after #60249
+ const name = match[1]!.toLowerCase() as keyof PragmaPseudoMap; // Technically unsafe cast, but we do it so they below check to make it safe typechecks
const pragma = commentPragmas[name] as PragmaDefinition;
if (!pragma || !(pragma.kind! & kind)) {
return;
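In `multiLinePragmaRegEx` (`/@(\S+)(\s+(?:\S.*)?)?$/gm`), group 1 always takes part in a successful match. Group 2 sits inside an optional `(...)?` and is `undefined` at runtime when a pragma has no argument. The eslint comments on the added `!` assertions suggest they only become necessary once the PR's inferred capture types apply. The runtime sketch below shows the difference between the two groups. `collectPragmas` is a hypothetical helper and is not part of parser.ts.

```typescript
// Regex copied from parser.ts: group 1 is mandatory, group 2 is optional.
const multiLinePragmaRegEx = /@(\S+)(\s+(?:\S.*)?)?$/gm;

// Hypothetical helper: collects [pragma name, trimmed argument or undefined] pairs.
function collectPragmas(text: string): Array<[string, string | undefined]> {
  const results: Array<[string, string | undefined]> = [];
  multiLinePragmaRegEx.lastIndex = 0; // the "g" flag keeps lastIndex between exec() calls
  let match: RegExpExecArray | null;
  while ((match = multiLinePragmaRegEx.exec(text)) !== null) {
    const name = match[1]!; // group 1 always participates when the regex matches
    const argument = match[2]?.trim(); // group 2 is undefined when no argument follows
    results.push([name, argument]);
  }
  return results;
}

const pragmas = collectPragmas("@jsx h\n@internal");
// pragmas: [["jsx", "h"], ["internal", undefined]]
```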

683 changes: 497 additions & 186 deletions src/compiler/scanner.ts

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion src/compiler/semver.ts
@@ -331,7 +331,7 @@ function parseHyphen(left: string, right: string, comparators: Comparator[]) {
return true;
}

- function parseComparator(operator: string, text: string, comparators: Comparator[]) {
+ function parseComparator(operator: string | undefined, text: string, comparators: Comparator[]) {
const result = parsePartial(text);
if (!result) return false;


3 changes: 2 additions & 1 deletion src/compiler/sourcemap.ts
@@ -391,7 +391,8 @@ export function tryGetSourceMappingURL(lineInfo: LineInfo): string | undefined {
const line = lineInfo.getLineText(index);
const comment = sourceMapCommentRegExp.exec(line);
if (comment) {
- return comment[1].trimEnd();
+ // eslint-disable-next-line @typescript-eslint/no-unnecessary-type-assertion -- TODO: Remove this line after #60249
+ return comment[1]!.trimEnd();
}
// If we see a non-whitespace/map comment-like line, break, to avoid scanning up the entire file
else if (!line.match(whitespaceOrMapCommentRegExp)) {

4 changes: 2 additions & 2 deletions src/compiler/transformers/jsx.ts
@@ -619,15 +619,15 @@ export function transformJsx(context: TransformationContext): (x: SourceFile | B
* See https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
*/
function decodeEntities(text: string): string {
- return text.replace(/&((#((\d+)|x([\da-fA-F]+)))|(\w+));/g, (match, _all, _number, _digits, decimal, hex, word) => {
+ return text.replace(/&(?:#(?:(\d+)|x([\da-fA-F]+))|(\w+));/g, (match, decimal, hex, word) => {
if (decimal) {
return utf16EncodeAsString(parseInt(decimal, 10));
}
else if (hex) {
return utf16EncodeAsString(parseInt(hex, 16));
}
else {
- const ch = entities.get(word);
+ const ch = entities.get(word || "");
// If this is not a valid entity, then just use `match` (replace it with itself, i.e. don't replace)
return ch ? utf16EncodeAsString(ch) : match;
}
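The jsx.ts change cuts the pattern from six capturing groups to three. It turns the unused groups into non-capturing `(?:...)` groups, so the callback parameters now line up directly with `decimal`, `hex`, and `word`. Only one of those three participates in any given match, and the others arrive as `undefined`; that is presumably why the new code writes `word || ""`. The sketch below checks that both patterns decode the same input. It assumes a three-entry entity table in place of the real `entities` map and `String.fromCodePoint` in place of `utf16EncodeAsString`.

```typescript
// Simplified stand-ins (assumptions): a tiny entity table and String.fromCodePoint.
const entities = new Map<string, number>([["amp", 0x26], ["lt", 0x3c], ["gt", 0x3e]]);

// Old pattern: groups 1-3 are wrappers the callback has to skip over.
const oldRe = /&((#((\d+)|x([\da-fA-F]+)))|(\w+));/g;
// New pattern: only decimal, hex and word are capturing groups.
const newRe = /&(?:#(?:(\d+)|x([\da-fA-F]+))|(\w+));/g;

function decodeOld(text: string): string {
  return text.replace(oldRe, (match, _all, _number, _digits, decimal, hex, word) => {
    if (decimal) return String.fromCodePoint(parseInt(decimal, 10));
    if (hex) return String.fromCodePoint(parseInt(hex, 16));
    const ch = entities.get(word);
    return ch ? String.fromCodePoint(ch) : match; // unknown entity: leave it as-is
  });
}

function decodeNew(text: string): string {
  return text.replace(newRe, (match, decimal, hex, word) => {
    if (decimal) return String.fromCodePoint(parseInt(decimal, 10));
    if (hex) return String.fromCodePoint(parseInt(hex, 16));
    const ch = entities.get(word || ""); // word is undefined for numeric entities
    return ch ? String.fromCodePoint(ch) : match;
  });
}

const sample = "a &amp; b &#65; &#x42; &nope;";
// decodeOld(sample) and decodeNew(sample) both give "a & b A B &nope;"
```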

47 changes: 47 additions & 0 deletions src/compiler/types.ts
@@ -13,6 +13,7 @@ import {
PackageJsonInfo,
PackageJsonInfoCache,
Pattern,
+ RegExpAnyString,
SymlinkCache,
ThisContainer,
} from "./_namespaces/ts.js";
@@ -2801,6 +2802,52 @@ export const enum RegularExpressionFlags {
Modifiers = IgnoreCase | Multiline | DotAll,
}

+ /** @internal */
+ export interface RegularExpressionDisjunction {
+ patternUnion?: RegularExpressionPatternUnion;
+ groupNumber?: number;
+ groupName?: string;
+ isInNegativeAssertion?: boolean;
+ }
+
+ /** @internal */
+ export interface RegularExpressionDisjunctionsScope extends Array<RegularExpressionDisjunction> {
+ /** All disjunctions after this index are the ones need to be considered */
+ currentAlternativeIndex: number;
+ }
+
+ /** @internal */
+ export type RegularExpressionAnyString = typeof RegExpAnyString;
+
+ /** @internal */
+ export type RegularExpressionPatternContent = string | RegularExpressionAnyString | RegularExpressionPatternUnion;
+
+ /** @internal */
+ export interface RegularExpressionPattern extends Array<RegularExpressionPatternContent> {
+ _regularExpressionPatternBrand: any;
+ }
+
+ /** @internal */
+ export interface RegularExpressionPatternUnion extends Set<string | RegularExpressionPattern> {
+ _regularExpressionPatternUnionBrand: any;
+ isPossiblyUndefined?: boolean;
+ isCharacterClass?: boolean; // For fast path in the checker
+ isCharacterEquivalents?: boolean;
+ }
+
+ /** @internal */
+ export type RegularExpressionReducedContent = string | RegularExpressionAnyString | RegularExpressionReducedUnion | RegularExpressionReducedPattern;
+
+ /** @internal */
+ export interface RegularExpressionReducedUnion extends Set<string | RegularExpressionReducedPattern> {
+ _regularExpressionReducedUnionBrand: any;
+ }
+
+ /** @internal */
+ export interface RegularExpressionReducedPattern extends Array<string | RegularExpressionAnyString> {
+ _regularExpressionReducedPatternBrand: any;
+ }
+
export interface NoSubstitutionTemplateLiteral extends LiteralExpression, TemplateLiteralLikeNode, Declaration {
readonly kind: SyntaxKind.NoSubstitutionTemplateLiteral;
/** @internal */
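The new internal types describe a regex body as sequences (`RegularExpressionPattern`, an array) and alternations (`RegularExpressionPatternUnion`, a set), nested inside each other. One way to picture how such a structure maps to a set of strings is to expand it as a cross product. The commit list mentions a "cross product size", but the checker's actual algorithm is not rendered on this page. The sketch below is a hypothetical, simplified model with no brands, no `RegularExpressionAnyString` marker, and no flags.

```typescript
// Hypothetical simplified model, loosely mirroring RegularExpressionPattern (a sequence)
// and RegularExpressionPatternUnion (a set of alternatives).
interface Pattern extends Array<string | Union> {}
interface Union extends Set<string | Pattern> {}

// Every string a pattern can produce: the cross product of its parts, in order.
function expandPattern(pattern: Pattern): Set<string> {
  let results = new Set<string>([""]); // the empty sequence produces ""
  pattern.forEach(part => {
    const options = typeof part === "string" ? new Set<string>([part]) : expandUnion(part);
    const next = new Set<string>();
    results.forEach(prefix => options.forEach(suffix => next.add(prefix + suffix)));
    results = next;
  });
  return results;
}

// Every string a union can produce: all strings of all its alternatives.
function expandUnion(union: Union): Set<string> {
  const out = new Set<string>();
  union.forEach(alternative => {
    if (typeof alternative === "string") out.add(alternative);
    else expandPattern(alternative).forEach(s => out.add(s));
  });
  return out;
}

// Roughly /a(b|cd)[xy]/: the literal "a", then (b|cd), then [xy].
const bOrCd: Union = new Set<string | Pattern>(["b", ["c", "d"]]);
const xOrY: Union = new Set<string | Pattern>(["x", "y"]);
const strings = expandPattern(["a", bOrCd, xOrY]);
// strings: {"abx", "aby", "acdx", "acdy"}
```

Each part multiplies the number of results, so the set grows quickly. Any real implementation would need a fallback once the product gets too large.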

14 changes: 7 additions & 7 deletions src/compiler/utilities.ts
@@ -6063,10 +6063,10 @@ export function hasInvalidEscape(template: TemplateLiteral): boolean {
// the language service. These characters should be escaped when printing, and if any characters are added,
// the map below must be updated. Note that this regexp *does not* include the 'delete' character.
// There is no reason for this other than that JSON.stringify does not handle it either.
- const doubleQuoteEscapedCharsRegExp = /[\\"\u0000-\u001f\u2028\u2029\u0085]/g;
- const singleQuoteEscapedCharsRegExp = /[\\'\u0000-\u001f\u2028\u2029\u0085]/g;
+ const doubleQuoteEscapedCharsRegExp: RegExp = /[\\"\u0000-\u001f\u2028\u2029\u0085]/g;
+ const singleQuoteEscapedCharsRegExp: RegExp = /[\\'\u0000-\u001f\u2028\u2029\u0085]/g;
// Template strings preserve simple LF newlines, still encode CRLF (or CR)
- const backtickQuoteEscapedCharsRegExp = /\r\n|[\\`\u0000-\u0009\u000b-\u001f\u2028\u2029\u0085]/g;
+ const backtickQuoteEscapedCharsRegExp: RegExp = /\r\n|[\\`\u0000-\u0009\u000b-\u001f\u2028\u2029\u0085]/g;
const escapedCharsMap = new Map(Object.entries({
"\t": "\\t",
"\v": "\\v",
@@ -6114,7 +6114,7 @@ export function escapeString(s: string, quoteChar?: CharacterCodes.doubleQuote |
const escapedCharsRegExp = quoteChar === CharacterCodes.backtick ? backtickQuoteEscapedCharsRegExp :
quoteChar === CharacterCodes.singleQuote ? singleQuoteEscapedCharsRegExp :
doubleQuoteEscapedCharsRegExp;
- return s.replace(escapedCharsRegExp, getReplacement);
+ return s.replace(escapedCharsRegExp, getReplacement as (...args: any[]) => string);
}

const nonAsciiCharacters = /[^\u0000-\u007F]/g;
@@ -6132,8 +6132,8 @@ export function escapeNonAsciiString(s: string, quoteChar?: CharacterCodes.doubl
// paragraphSeparator, and nextLine. The latter three are just desirable to suppress new lines in
// the language service. These characters should be escaped when printing, and if any characters are added,
// the map below must be updated.
- const jsxDoubleQuoteEscapedCharsRegExp = /["\u0000-\u001f\u2028\u2029\u0085]/g;
- const jsxSingleQuoteEscapedCharsRegExp = /['\u0000-\u001f\u2028\u2029\u0085]/g;
+ const jsxDoubleQuoteEscapedCharsRegExp: RegExp = /["\u0000-\u001f\u2028\u2029\u0085]/g;
+ const jsxSingleQuoteEscapedCharsRegExp: RegExp = /['\u0000-\u001f\u2028\u2029\u0085]/g;
const jsxEscapedCharsMap = new Map(Object.entries({
'"': "&quot;",
"'": "&apos;",
@@ -6155,7 +6155,7 @@ function getJsxAttributeStringReplacement(c: string) {
export function escapeJsxAttributeString(s: string, quoteChar?: CharacterCodes.doubleQuote | CharacterCodes.singleQuote): string {
const escapedCharsRegExp = quoteChar === CharacterCodes.singleQuote ? jsxSingleQuoteEscapedCharsRegExp :
jsxDoubleQuoteEscapedCharsRegExp;
- return s.replace(escapedCharsRegExp, getJsxAttributeStringReplacement);
+ return s.replace(escapedCharsRegExp, getJsxAttributeStringReplacement as (...args: any[]) => string);
}

/**
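utilities.ts passes named replacer functions to `String.prototype.replace`, and the PR casts them to `(...args: any[]) => string`. When the pattern has no capturing groups, as with the escape character classes above, `replace` calls the callback as `(match, offset, input)`. The sketch below shows a replacer that uses all three arguments. It assumes a trimmed-down escape table. The `escapes` map and the `replacement` and `escapeDoubleQuoted` functions are hypothetical illustrations, not the TypeScript implementation.

```typescript
// Same character class as doubleQuoteEscapedCharsRegExp; it has no capturing groups,
// so replace() invokes the callback as (match, offset, input).
const escapedCharsRegExp: RegExp = /[\\"\u0000-\u001f\u2028\u2029\u0085]/g;

// Hypothetical, trimmed-down escape table.
const escapes = new Map<string, string>([
  ["\\", "\\\\"],
  ['"', '\\"'],
  ["\n", "\\n"],
  ["\t", "\\t"],
]);

// Hypothetical replacer that uses the offset and input arguments as well.
function replacement(c: string, offset: number, input: string): string {
  if (c === "\0") {
    // "\0" followed by a digit would read as a legacy octal escape, so emit "\x00".
    const next = input.charCodeAt(offset + 1);
    return next >= 48 && next <= 57 ? "\\x00" : "\\0";
  }
  const mapped = escapes.get(c);
  if (mapped !== undefined) return mapped;
  // Fallback: a four-digit \uXXXX escape of the UTF-16 code unit.
  return "\\u" + ("0000" + c.charCodeAt(0).toString(16)).slice(-4);
}

function escapeDoubleQuoted(s: string): string {
  return s.replace(escapedCharsRegExp, replacement);
}
```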

13 changes: 11 additions & 2 deletions src/harness/fourslashInterfaceImpl.ts
@@ -1224,6 +1224,10 @@ export namespace Completion {
interfaceEntry("CallableFunction"),
interfaceEntry("NewableFunction"),
interfaceEntry("IArguments"),
+ interfaceEntry("StringReplaceCallbackOptions"),
+ interfaceEntry("StringReplaceCallbackIncludeNamedCapturingGroups"),
+ typeEntry("StringReplaceCallbackSignature"),
+ typeEntry("RegExpMatchArray"),
varEntry("String"),
interfaceEntry("StringConstructor"),
varEntry("Boolean"),
@@ -1238,9 +1242,14 @@ export namespace Completion {
varEntry("Math"),
varEntry("Date"),
interfaceEntry("DateConstructor"),
- interfaceEntry("RegExpMatchArray"),
- interfaceEntry("RegExpExecArray"),
+ typeEntry("CapturingGroupsArray"),
+ typeEntry("NamedCapturingGroupsObject"),
+ typeEntry("RegExpExecArray"),
+ interfaceEntry("_RegExpExecArray"), // XXX This shouldn't be included
+ interfaceEntry("RegExpIndices"),
varEntry("RegExp"),
+ interfaceEntry("_RegExp"), // XXX This shouldn't be included
+ interfaceEntry("RegExpFlags"),
interfaceEntry("RegExpConstructor"),
varEntry("Error"),
interfaceEntry("ErrorConstructor"),

13 changes: 8 additions & 5 deletions src/harness/harnessIO.ts
@@ -1290,7 +1290,8 @@ export namespace TestCaseParser {

let match: RegExpExecArray | null; // eslint-disable-line no-restricted-syntax
while ((match = optionRegex.exec(content)) !== null) { // eslint-disable-line no-restricted-syntax
- opts[match[1]] = match[2].trim();
+ // eslint-disable-next-line @typescript-eslint/no-unnecessary-type-assertion -- TODO: Remove this line after #60249
+ opts[match[1]!] = match[2]!.trim();
}

return opts;
@@ -1327,8 +1328,9 @@ export namespace TestCaseParser {
else if (testMetaData = optionRegex.exec(line)) {
// Comment line, check for global/file @options and record them
optionRegex.lastIndex = 0;
- const metaDataName = testMetaData[1].toLowerCase();
- currentFileOptions[testMetaData[1]] = testMetaData[2].trim();
+ /* eslint-disable @typescript-eslint/no-unnecessary-type-assertion -- TODO: Remove this line after #60249 */
+ const metaDataName = testMetaData[1]!.toLowerCase();
+ currentFileOptions[testMetaData[1]!] = testMetaData[2]!.trim();
if (metaDataName !== "filename") {
continue;
}
@@ -1348,12 +1350,13 @@ export namespace TestCaseParser {
// Reset local data
currentFileContent = undefined;
currentFileOptions = {};
- currentFileName = testMetaData[2].trim();
+ currentFileName = testMetaData[2]!.trim();
refs = [];
}
else {
// First metadata marker in the file
- currentFileName = testMetaData[2].trim();
+ currentFileName = testMetaData[2]!.trim();
+ /* eslint-enable @typescript-eslint/no-unnecessary-type-assertion -- TODO: Remove this line after #60249 */
if (currentFileContent && ts.skipTrivia(currentFileContent, 0, /*stopAfterLineBreak*/ false, /*stopAtComments*/ false) !== currentFileContent.length) {
throw new Error("Non-comment test content appears before the first '// @Filename' directive");
}

2 changes: 1 addition & 1 deletion src/harness/util.ts
@@ -13,7 +13,7 @@ function createDiagnosticMessageReplacer<R extends (messageArgs: string[], ...ar
const messageParts = diagnosticMessage.message.split(/\{\d+\}/);
const regExp = new RegExp(`^(?:${messageParts.map(ts.regExpEscape).join("(.*?)")})$`);
type Args<R> = R extends (messageArgs: string[], ...args: infer A) => string[] ? A : [];
- return (text: string, ...args: Args<R>) => text.replace(regExp, (_, ...fixedArgs) => ts.formatStringFromArgs(diagnosticMessage.message, replacer(fixedArgs, ...args)));
+ return (text: string, ...args: Args<R>) => text.replace(regExp, (_, ...fixedArgs) => ts.formatStringFromArgs(diagnosticMessage.message, replacer(fixedArgs as string[], ...args)));
}

const replaceTypesVersionsMessage = createDiagnosticMessageReplacer(
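`createDiagnosticMessageReplacer` turns a diagnostic message template into a regular expression. It splits the message on `{N}` placeholders, escapes each literal part, and joins the parts with lazy `(.*?)` groups, so a successful match recovers the arguments. None of those groups is optional, so every capture is a string when the pattern matches; the PR adds `fixedArgs as string[]` at this call site. The standalone sketch below assumes a hand-written `regExpEscape` in place of `ts.regExpEscape`, whose exact rules are not shown here. `createArgExtractor` is a hypothetical name.

```typescript
// Hypothetical stand-in for ts.regExpEscape: backslash-escapes RegExp metacharacters.
function regExpEscape(text: string): string {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Builds an argument extractor for one message template, e.g. "Cannot find name '{0}'.".
function createArgExtractor(message: string): (text: string) => string[] | undefined {
  const messageParts = message.split(/\{\d+\}/);
  const regExp = new RegExp(`^(?:${messageParts.map(regExpEscape).join("(.*?)")})$`);
  return text => {
    const match = regExp.exec(text);
    // No group is optional, so on a successful match every capture is a string.
    return match ? match.slice(1) : undefined;
  };
}

const extract = createArgExtractor("Type '{0}' is not assignable to type '{1}'.");
const args = extract("Type 'number' is not assignable to type 'string'.");
// args: ["number", "string"]
```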