
Regex Validated String Types #21044

Closed
wants to merge 2 commits into from


weswigham
Member

This PR implements a subset of the proposals discussed in #6579.

Regular Expression Validated String Literal Types

These types let users of regular expressions automatically track refinements on string types through type guards on strongly typed RegExps. These refinements compose via | and & as one would expect, and allow a user to limit the strings they accept or return to a specific subset of strings without listing every member explicitly (which may even be impossible).
It looks like this:

type SimplePhoneNumber = /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/;
// The type in the comment below is what the regex's type is actually inferred as!
const SimplePhoneNumber /*: RegExp<SimplePhoneNumber>*/ = /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/;
type FirstLastName = /^\w+\W+\w+$/i;
const FirstLastName = /^\w+\W+\w+$/i;
interface Contact {
    phone: SimplePhoneNumber;
    name: FirstLastName;
}

class AddressBook {
    entries: Contact[] = [];
    add(phone: string, name: string) {
        if (!SimplePhoneNumber.test(phone)) {
            throw new Error("Invalid phone number!");
        }
        if (!FirstLastName.test(name)) {
            throw new Error("Invalid name!");
        }
        this.addSafe(phone, name);
    }
    addSafe(phone: SimplePhoneNumber, name: FirstLastName) {
        this.entries.push({ phone, name });
    }
    get(phone: SimplePhoneNumber): Contact {
        // ...
        return null as any;
    }
    // ...
}

const book = new AddressBook();
book.add("555-555-5555", "Some Person");
book.addSafe("555-555-5556", "Some Person2"); // Interpreted as literals
const contact = book.get("555-555-5555");
book.addSafe(contact.name, contact.phone); // Error on `contact.name`! Confused argument order! :D
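Since regex types in type positions are only proposed here, the address book above cannot be compiled today. For comparison, a rough sketch in current TypeScript can approximate it with branded string types and hand-written type guard functions. The brand helper `Validated` and the guards `isSimplePhoneNumber` and `isFirstLastName` are hypothetical names chosen for illustration, not part of this PR.

```typescript
// Hypothetical brand: a string tagged with the name of the check it passed.
type Validated<Tag extends string> = string & { readonly __validated: Tag };
type SimplePhoneNumber = Validated<"SimplePhoneNumber">;
type FirstLastName = Validated<"FirstLastName">;

const phonePattern = /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/;
const namePattern = /^\w+\W+\w+$/i;

// Hand-written guards standing in for the PR's typed RegExp.test.
function isSimplePhoneNumber(s: string): s is SimplePhoneNumber {
  return phonePattern.test(s);
}
function isFirstLastName(s: string): s is FirstLastName {
  return namePattern.test(s);
}

interface Contact {
  phone: SimplePhoneNumber;
  name: FirstLastName;
}

class AddressBook {
  entries: Contact[] = [];
  add(phone: string, name: string): void {
    if (!isSimplePhoneNumber(phone)) {
      throw new Error("Invalid phone number!");
    }
    if (!isFirstLastName(name)) {
      throw new Error("Invalid name!");
    }
    this.addSafe(phone, name); // both arguments are narrowed by the guards above
  }
  addSafe(phone: SimplePhoneNumber, name: FirstLastName): void {
    this.entries.push({ phone, name });
  }
}

const book = new AddressBook();
book.add("555-555-5555", "Some Person");
// Unlike the proposal, bare literals are not accepted without a cast:
book.addSafe("555-555-5556" as SimplePhoneNumber, "Some Person2" as FirstLastName);
// book.addSafe(book.entries[0].name, book.entries[0].phone); // compile error: brands differ
```

The brand exists only at the type level; at runtime these values are plain strings.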

Syntax

A JS regular expression literal in a type position is a regular expression validated string type.
The type of a JS regular expression literal elsewhere is generic over the regular expression type it validates (e.g., var a = /a/ has type RegExp</a/>).
The .test method of RegExp is a type guard (test(s: string): s is T).
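The `test` type guard can be loosely approximated in current TypeScript, where the built-in `RegExp` takes no type parameter. This sketch assumes a hypothetical wrapper class, `TypedRegExp`, whose `test` method returns the predicate `s is Matches<Tag>`; the `Matches` brand and the per-pattern `Tag` name stand in for the regex type `T`, and `check` is a hypothetical caller.

```typescript
// Hypothetical brand standing in for a regex validated string type.
type Matches<Tag extends string> = string & { readonly __matches: Tag };

// Hypothetical wrapper class: the lib's RegExp takes no type parameter.
class TypedRegExp<Tag extends string> {
  readonly tag: Tag;
  private readonly pattern: RegExp;

  constructor(tag: Tag, pattern: RegExp) {
    this.tag = tag;
    this.pattern = pattern;
  }

  // Mirrors the proposed signature `test(s: string): s is T`.
  test(s: string): s is Matches<Tag> {
    this.pattern.lastIndex = 0; // reset in case the pattern has the g or y flag
    return this.pattern.test(s);
  }
}

const isA = new TypedRegExp("a", /a/i); // Tag is inferred as "a"

function check(s: string): string {
  if (isA.test(s)) {
    const mustBeA: Matches<"a"> = s; // OK: s is narrowed inside this branch
    return `matched: ${mustBeA}`;
  }
  return "no match";
}
```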

Assignability

A regular expression validated string type is StringLike. Two regular expression validated
string types are equal only if they are textually equivalent. Any string literal type that matches
the given regexp is assignable to the type. If the regular expression cannot be executed (e.g., it references
any flags other than i or m, or errors on construction), no string literal types are directly assignable to it; they must be cast instead. I've included a quickfix for adding such a cast where it makes sense. The cast might not always be correct (since we're not checking the literal's contents), but the quickfix should make it easy for the user to add the cast when it is needed.
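The literal-assignability rule above can be modeled at runtime. The function `literalIsAssignable` is a hypothetical sketch of that rule, not code from the PR's checker: it declines to run patterns with flags other than `i` or `m`, or patterns that fail to construct (both cases need a cast), and otherwise executes the pattern against the literal's text.

```typescript
// Hypothetical model of the rule: may a string literal be assigned to the regex type?
function literalIsAssignable(source: string, flags: string, literal: string): boolean {
  // Only patterns whose flags are limited to i and m are executed.
  const executableFlags = flags.split("").every((f) => f === "i" || f === "m");
  if (!executableFlags) {
    return false; // not executed: a cast is required
  }
  try {
    return new RegExp(source, flags).test(literal);
  } catch {
    return false; // the pattern errors on construction: a cast is required
  }
}
```

Because the check executes the pattern, its result depends on the regex engine of whatever runtime hosts the compiler, which is the portability concern the PR raises.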

🚲 🏠 Should we refuse to execute regexes that use other features we don't like (due to portability concerns or likelihood of misuse), such as lookbehind, similarly to how flags other than `i` or `m` currently cause the regex not to be executed? Doing so requires more work to explicitly prohibit, or allow only, certain classes of regexp to be executable (namely, actually parsing the regex, something we currently refrain from doing), for only a small gain in perceived consistency. The current reliance on the runtime's regex engine will only be noticeable when a user moves their project between seriously different compiler runtimes and has non-portable regex syntax.

Examples

const isA = /a/i;
let mustBeA: /a/i;
declare var s: string;
if (isA.test(s)) {
    mustBeA = s;
}

const isB = /b/i;
let mustBeB: /b/i;
if (isB.test(s)) {
    mustBeB = s;
}

let mustBeBOrA: /b/i | /a/i;
if (isB.test(s) || isA.test(s)) {
    mustBeBOrA = s;
}

let mustBeBAndA: /b/i & /a/i;
if (isB.test(s) && isA.test(s)) {
    mustBeBAndA = s;
}

mustBeB = "b";
mustBeB = "B";
mustBeA = "a";
mustBeA = "A";

mustBeBOrA = "a";
mustBeBOrA = "b";
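In current TypeScript, the same `||` / `&&` composition can be observed with ordinary user-defined type guards. This sketch assumes a hypothetical keyed brand, `Matches`, shaped so that intersecting two brands keeps both keys rather than reducing to `never`; unlike the proposal, assigning a bare literal such as `"a"` to these types would still need a cast.

```typescript
// Hypothetical keyed brand: Matches<"a"> & Matches<"b"> keeps both keys.
type Matches<Tag extends string> = string & {
  readonly __matches: { readonly [K in Tag]: true };
};

const isA = (s: string): s is Matches<"a"> => /a/i.test(s);
const isB = (s: string): s is Matches<"b"> => /b/i.test(s);

// `||` narrows s to the union of the two brands.
function mustBeBOrA(s: string): Matches<"b"> | Matches<"a"> | undefined {
  return isB(s) || isA(s) ? s : undefined;
}

// `&&` narrows s to the intersection of the two brands.
function mustBeBAndA(s: string): (Matches<"b"> & Matches<"a">) | undefined {
  return isB(s) && isA(s) ? s : undefined;
}
```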

Future Work

Support index signatures keyed by arbitrary string-like types, to allow index signatures keyed by regex validated string types, e.g.:

type IndexableProps = {
  [prop: /^data\-*/]: string;
  [prop: /^aria\-*/]: string;
  [prop: string]: any;
}
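TypeScript 4.4 and later support a related but narrower form of this: index signatures keyed by template string pattern types. The sketch below assumes that feature. Note that `` `data-${string}` `` matches names beginning with `data-`, which is close to, but not the same language as, `/^data\-*/` (that regex also matches names like `database`).

```typescript
// Requires TypeScript 4.4+ (template string pattern index signatures).
type IndexableProps = {
  [prop: `data-${string}`]: string;
  [prop: `aria-${string}`]: string;
  [prop: string]: any;
};

const props: IndexableProps = {
  "data-id": "42",
  "aria-label": "Close",
  tabIndex: 0, // falls through to the plain string index signature
};
```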

Complementary Proposals

  • unique keyword on any type
    • Would allow for truly nominal regexes. To emulate that now, you can tag your regexes, i.e. type Zip = /^[0-9]{5}$(Zip)?/; type Digits = /^[0-9]{5}$(Digits)?/. These match the same language but are textually different, so they become distinct types.
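The tagging trick works because, without the `m` flag, `$` only matches at the very end of the input, so the optional group after it can only ever match the empty string. A quick runtime check (the sample strings are arbitrary) shows the two patterns accept the same inputs while their source text differs:

```typescript
const Zip = /^[0-9]{5}$(Zip)?/;
const Digits = /^[0-9]{5}$(Digits)?/;

// Arbitrary sample inputs, including ones that spell out the tags.
const samples = ["12345", "1234", "123456", "12345Zip", "12345Digits", "abcde"];

// The two patterns agree on every sample...
const sameResults = samples.every((s) => Zip.test(s) === Digits.test(s));
// ...but are textually different, so under the proposal they would be distinct types.
const textuallyDifferent = Zip.source !== Digits.source;
```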

Fixes #6579

@bterlson
Member

bterlson commented Jan 8, 2018

This is really neat, and a feature I will likely make heavy use of. But one aspect concerns me a bit: I have to type my regexps twice. As regexps are essentially grawlix, this isn't necessarily trivial. Also, keeping the two copies in sync when making changes is error-prone.

If there were some way to create a RegExp only once, say as its value in normal JS, and get the corresponding regexp validated string type from it, I would have a single source of truth for my regexps, and it might also make it possible to use this feature with RegExps from validation libraries and the like.

@weswigham
Member Author

Right now the generic parameter for the RegExp interface is only used by the test function, but you can easily patch it with a dummy member that lets you extract the generic parameter, with something like this:

declare global {
  interface RegExp<T extends string = string> {
    readonly " __dummyValidatedType": T;
  }
}

and then reference the type on any instance, like so:

const SimplePhoneNumber = /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/;
type SimplePhoneNumber = (typeof SimplePhoneNumber)[" __dummyValidatedType"];

AFAIK, there's no real instance field on a RegExp at runtime that you could attach the validating generic type to (only the results of calling methods, whose types are not easily extracted), which is why I don't think we'd incorporate a "hack" field like this into the normal lib, even if it is convenient. However, as long as we use generics to implement this, such a structure will work.
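The augmentation above relies on the generic `RegExp<T>` introduced by this PR; in stock TypeScript, where `RegExp` takes no type parameters, that declaration merge is rejected. The same single-source-of-truth pattern can be sketched today with a hypothetical `TypedRegExp<T>` interface carrying a type-only field, plus a hypothetical `typed` helper that attaches it through a type assertion. A conditional type with `infer` is shown as an equivalent way to read the parameter back.

```typescript
// Hypothetical brand and wrapper; neither exists in TypeScript's lib.
type Validated<Tag extends string> = string & { readonly __validated: Tag };

interface TypedRegExp<T extends string> extends RegExp {
  // Type-only field: no such property exists on the object at runtime.
  readonly " __dummyValidatedType": T;
}

function typed<Tag extends string>(pattern: RegExp): TypedRegExp<Validated<Tag>> {
  return pattern as TypedRegExp<Validated<Tag>>; // an assertion, not a runtime change
}

// The pattern is written once; the type is derived from the value.
const SimplePhoneNumber = typed<"SimplePhoneNumber">(/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/);
type SimplePhoneNumber = (typeof SimplePhoneNumber)[" __dummyValidatedType"];

// Equivalent extraction with a conditional type.
type RegExpType<R> = R extends TypedRegExp<infer U> ? U : never;
type SimplePhoneNumber2 = RegExpType<typeof SimplePhoneNumber>;

const candidate = "555-555-5555";
const phone: SimplePhoneNumber2 | undefined = SimplePhoneNumber.test(candidate)
  ? (candidate as SimplePhoneNumber)
  : undefined;
```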

@gcnew
Contributor

gcnew commented Jan 21, 2018

This feature will be very useful for type-level programming. Combined with Conditional types, it will enable filtering of object properties by pattern.

@philkunz

What's the status on this one?

@miccarr

miccarr commented May 2, 2018

What a great feature! It would prevent a lot of errors involving UUIDs, hex strings, etc.
Nothing new since the PR?

@Mouvedia

https://stackoverflow.com/q/43677527/248058

@mpawelski

This is such a promising feature! With conditional types it looks even more powerful.

//helper type to extract "regex type" from RegExp<T>
type RegExpType<T> = T extends RegExp<infer U> ? U : never;
const SimplePhoneNumber = /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/;
type SimplePhoneNumber = RegExpType<typeof SimplePhoneNumber>;

//some type we want to filter properties for
interface SomeType {
  a: number;
  b: string;
  _privateProp: string;
  _otherPrivateProp: { foo: string };
}
type FilteredSomeType1 = Pick<SomeType, Exclude<keyof SomeType, "_privateProp" | "_otherPrivateProp">>; //already works in TS 2.8
type FilteredSomeType2 = Pick<SomeType, Exclude<keyof SomeType, /_.*/>>; //Hope it will work in future 😍
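For the prefix case in this example, later TypeScript versions (4.1+) offer an approximation without regex types: a template literal type such as `` `_${string}` `` can be passed to `Exclude`. This is a sketch under that assumption; it removes keys that start with `_`, whereas the unanchored `/_.*/` would also match keys containing an underscore anywhere.

```typescript
interface SomeType {
  a: number;
  b: string;
  _privateProp: string;
  _otherPrivateProp: { foo: string };
}

// `_${string}` matches every string key beginning with an underscore (TypeScript 4.1+).
type PublicKeys = Exclude<keyof SomeType, `_${string}`>; // "a" | "b"
type FilteredSomeType2 = Pick<SomeType, PublicKeys>; // { a: number; b: string }

// Compiles only because the underscore-prefixed keys were filtered out of the type.
const publicOnly: FilteredSomeType2 = { a: 1, b: "two" };
```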

@RyanCavanaugh
Member

Closing since #6579 isn't looking too likely in the near-term

@weswigham
Member Author

I have a branch up that merges this with #26797, which allows us to explore regex index signatures (which, when we discussed this, we thought would be one of the most useful applications of these types).


@ghost

ghost commented Jun 9, 2021

Per the conversation following #41160 (comment), could the syntax be improved with an explicit type cast?
E.g. matchof /^(aria|data)(\-[\w\d]+)+/ is more intuitively read as a type by a JS-only user.

Development

Successfully merging this pull request may close these issues.

Suggestion: Regex-validated string type
8 participants