feat: characterClass & characterRange #29

Merged
merged 3 commits into from
Dec 22, 2023
26 changes: 15 additions & 11 deletions README.md
@@ -1,8 +1,8 @@
# TS Regex Builder

User-friendly egular Expression builder for TypeScript and JavaScript.
User-friendly Regular Expression builder for TypeScript and JavaScript.

## The problem & solution
## Goal

Regular expressions are a powerful tool for matching complex text patterns, yet they are notorious for their hard-to-understand syntax.

@@ -13,15 +13,19 @@ Inspired by Swift's Regex Builder, this library allows users to write easily and
const hexColor = /^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/;

// After
const hexDigit = characterClass(['a', 'f'], ['A', 'F'], ['0', '9']);
const hexDigit = characterClass(
characterRange('a', 'f'),
characterRange('A', 'F'),
characterRange('0', '9')
);

const hexColor = buildRegex(
startOfString,
'#',
choiceOf(
repeat({ count: 6 }, hexDigit),
repeat({ count: 3 }, hexDigit),
optionally('#'),
capture(
choiceOf(repeat({ count: 6 }, hexDigit), repeat({ count: 3 }, hexDigit))
),
endOfString,
endOfString
);
```
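
As a quick sanity check of what the builder call above is meant to produce, the "Before" literal can be exercised directly with plain `RegExp`. This is only a sketch using the native pattern; it does not call the library, and the sample inputs are made up for illustration.

```typescript
// Native pattern from the "Before" example: optional '#', then 6 or 3 hex digits.
const hexColorPattern = /^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/;

// Hypothetical sample inputs, chosen only to exercise each branch.
const sixDigitMatch = hexColorPattern.exec('#1a2B3c');
const threeDigitMatch = hexColorPattern.exec('fff');
const invalidMatch = hexColorPattern.exec('#12345');

// The capture group holds the digits without the leading '#'.
const sixDigits = sixDigitMatch ? sixDigitMatch[1] : undefined;
const threeDigits = threeDigitMatch ? threeDigitMatch[1] : undefined;
```

The `capture(...)` wrapper added in the new README version corresponds to the parenthesized group here, and `optionally('#')` to `#?`.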

@@ -34,10 +38,10 @@ npm install ts-regex-builder
## Usage

```js
import { buildRegex, oneOrMore } from 'ts-regex-builder';
import { buildRegex, capture, oneOrMore, word } from 'ts-regex-builder';

// /(Hello)+ World/
const regex = buildRegex(oneOrMore('Hello'), ' World');
// /Hello (\w+)/
const regex = buildRegex('Hello ', capture(oneOrMore(word)));
```
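
For comparison, the native pattern named in the usage comment behaves as follows. This sketch uses only `RegExp`, and the input string is a made-up example.

```typescript
// Native equivalent of the pattern in the usage comment: /Hello (\w+)/
const helloPattern = /Hello (\w+)/;

// Hypothetical input; group 1 holds the run of word characters after "Hello ".
const helloMatch = helloPattern.exec('Hello World');
const capturedWord = helloMatch ? helloMatch[1] : undefined;
```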

## Contributing
46 changes: 44 additions & 2 deletions src/components/__tests__/character-class.test.ts
@@ -2,6 +2,8 @@ import { oneOrMore, optionally, zeroOrMore } from '../quantifiers';
import {
any,
anyOf,
characterClass,
characterRange,
digit,
encodeCharacterClass,
inverted,
@@ -33,6 +35,45 @@ test('`whitespace` character class', () => {
expect(['x', whitespace, 'x']).toHavePattern('x\\sx');
});

test('`characterClass` base cases', () => {
expect(characterClass(characterRange('a', 'z'))).toHavePattern('[a-z]');
expect(
characterClass(characterRange('a', 'z'), characterRange('A', 'Z'))
).toHavePattern('[a-zA-Z]');
expect(characterClass(characterRange('a', 'z'), anyOf('05'))).toHavePattern(
'[a-z05]'
);
expect(
characterClass(characterRange('a', 'z'), whitespace, anyOf('05'))
).toHavePattern('[a-z\\s05]');
});

test('`characterClass` throws on inverted arguments', () => {
expect(() =>
characterClass(inverted(whitespace))
).toThrowErrorMatchingInlineSnapshot(
`"\`characterClass\` should receive only non-inverted character classes"`
);
});

test('`characterRange` base cases', () => {
expect(characterRange('a', 'z')).toHavePattern('[a-z]');
expect(['x', characterRange('0', '9')]).toHavePattern('x[0-9]');
expect([characterRange('A', 'F'), 'x']).toHavePattern('[A-F]x');
});

test('`characterRange` throws on incorrect arguments', () => {
expect(() => characterRange('z', 'a')).toThrowErrorMatchingInlineSnapshot(
`"\`start\` should be less or equal to \`end\`"`
);
expect(() => characterRange('aa', 'z')).toThrowErrorMatchingInlineSnapshot(
`"\`characterRange\` should receive only single character \`start\` string"`
);
expect(() => characterRange('a', 'zz')).toThrowErrorMatchingInlineSnapshot(
`"\`characterRange\` should receive only single character \`end\` string"`
);
});

test('`anyOf` base cases', () => {
expect(anyOf('a')).toHavePattern('a');
expect(['x', anyOf('a'), 'x']).toHavePattern('xax');
@@ -81,9 +122,10 @@ test('`encodeCharacterClass` throws on empty text', () => {
encodeCharacterClass({
type: 'characterClass',
characters: [],
inverted: false,
ranges: [],
isInverted: false,
})
).toThrowErrorMatchingInlineSnapshot(
`"Character class should contain at least one character"`
`"Character class should contain at least one character or character range"`
);
});
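
The `characterRange` error cases tested above can be reproduced in isolation. The sketch below copies the three checks and their messages from the diff into a hypothetical helper, `checkRangeBounds`, with a second hypothetical helper, `rangeCheckMessage`, to report the result. It also shows one consequence of comparing with `>`: JavaScript compares strings by UTF-16 code unit, so an uppercase start with a lowercase end passes the check.

```typescript
// Hypothetical stand-alone copy of the argument checks in `characterRange`.
function checkRangeBounds(start: string, end: string): void {
  if (start.length !== 1) {
    throw new Error(
      '`characterRange` should receive only single character `start` string'
    );
  }
  if (end.length !== 1) {
    throw new Error(
      '`characterRange` should receive only single character `end` string'
    );
  }
  // String comparison is by UTF-16 code unit: 'Z' (90) sorts before 'a' (97).
  if (start > end) {
    throw new Error('`start` should be less or equal to `end`');
  }
}

// Returns 'ok' or the error message, so results can be compared as strings.
function rangeCheckMessage(start: string, end: string): string {
  try {
    checkRangeBounds(start, end);
    return 'ok';
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}

const reversedResult = rangeCheckMessage('z', 'a');
const tooLongResult = rangeCheckMessage('aa', 'z');
const mixedCaseResult = rangeCheckMessage('Z', 'a'); // passes the order check
const sameCharResult = rangeCheckMessage('a', 'a'); // start === end is allowed
```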
116 changes: 88 additions & 28 deletions src/components/character-class.ts
@@ -5,27 +5,78 @@ import type { CharacterClass } from './types';
export const any: CharacterClass = {
type: 'characterClass',
characters: ['.'],
inverted: false,
ranges: [],
isInverted: false,
};

export const digit: CharacterClass = {
type: 'characterClass',
characters: ['\\d'],
inverted: false,
ranges: [],
isInverted: false,
};

export const word: CharacterClass = {
type: 'characterClass',
characters: ['\\w'],
inverted: false,
ranges: [],
isInverted: false,
};

export const whitespace: CharacterClass = {
type: 'characterClass',
characters: ['\\s'],
inverted: false,
ranges: [],
isInverted: false,
};

export function characterClass(...elements: CharacterClass[]): CharacterClass {
elements.forEach((element) => {
if (element.isInverted) {
throw new Error(
'`characterClass` should receive only non-inverted character classes'
);
}
});

return {
type: 'characterClass',
characters: elements.map((c) => c.characters).flat(),
ranges: elements.map((c) => c.ranges).flat(),
isInverted: false,
};
}

export function characterRange(start: string, end: string): CharacterClass {
if (start.length !== 1) {
throw new Error(
'`characterRange` should receive only single character `start` string'
);
}

if (end.length !== 1) {
throw new Error(
'`characterRange` should receive only single character `end` string'
);
}

if (start > end) {
throw new Error('`start` should be less or equal to `end`');
}

const range = {
start: escapeText(start),
end: escapeText(end),
};

return {
type: 'characterClass',
characters: [],
ranges: [range],
isInverted: false,
};
}

export function anyOf(characters: string): CharacterClass {
const charactersArray = characters.split('').map(escapeText);
if (charactersArray.length === 0) {
@@ -35,46 +86,55 @@ export function anyOf(characters: string): CharacterClass {
return {
type: 'characterClass',
characters: charactersArray,
inverted: false,
ranges: [],
isInverted: false,
};
}

export function inverted(characterClass: CharacterClass): CharacterClass {
export function inverted({
characters,
ranges,
isInverted,
}: CharacterClass): CharacterClass {
return {
type: 'characterClass',
characters: characterClass.characters,
inverted: !characterClass.inverted,
characters: characters,
ranges: ranges,
isInverted: !isInverted,
};
}

export function encodeCharacterClass(
characterClass: CharacterClass
): EncoderNode {
if (characterClass.characters.length === 0) {
throw new Error('Character class should contain at least one character');
export function encodeCharacterClass({
characters,
ranges,
isInverted,
}: CharacterClass): EncoderNode {
if (characters.length === 0 && ranges.length === 0) {
throw new Error(
'Character class should contain at least one character or character range'
);
}

if (characterClass.characters.length === 1 && !characterClass.inverted) {
// Direct rendering for single-character class
if (characters.length === 1 && ranges.length === 0 && !isInverted) {
return {
precedence: EncoderPrecedence.Atom,
pattern: characterClass.characters[0]!,
pattern: characters[0]!,
};
}

const characterString = reorderHyphen(characterClass.characters).join('');
// If the passed characters include a hyphen (`-`), it needs to be moved to the
// first (or last) position so that it is treated as a literal hyphen and not as a range.
// See: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes#types
const hyphenString = characters.includes('-') ? '-' : '';
const charactersString = characters.filter((c) => c !== '-').join('');
const rangesString = ranges
.map(({ start, end }) => `${start}-${end}`)
.join('');
const invertedString = isInverted ? '^' : '';

return {
precedence: EncoderPrecedence.Atom,
pattern: `[${characterClass.inverted ? '^' : ''}${characterString}]`,
pattern: `[${invertedString}${hyphenString}${rangesString}${charactersString}]`,
};
}

// If passed characters includes hyphen (`-`) it need to be moved to
// first (or last) place in order to treat it as hyphen character and not a range.
// See: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes#types
function reorderHyphen(characters: string[]) {
if (characters.includes('-')) {
return ['-', ...characters.filter((c) => c !== '-')];
}

return characters;
}
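
The ordering inside the emitted brackets (inversion marker, then a literal hyphen, then ranges, then the remaining characters) is easier to see in a self-contained sketch. The version below mirrors the new `encodeCharacterClass` logic but returns a plain string instead of an `EncoderNode`. It assumes its inputs are already escaped, since `escapeText` is not part of this diff. `SketchRange`, `SketchClass`, and `encodeSketch` are hypothetical names used only here.

```typescript
// Hypothetical local types mirroring the shape used in the diff.
type SketchRange = { start: string; end: string };
type SketchClass = {
  characters: string[];
  ranges: SketchRange[];
  isInverted: boolean;
};

// Simplified encoder: assumes characters and range bounds are already escaped.
function encodeSketch({ characters, ranges, isInverted }: SketchClass): string {
  if (characters.length === 0 && ranges.length === 0) {
    throw new Error(
      'Character class should contain at least one character or character range'
    );
  }

  // A lone, non-inverted character is emitted without brackets.
  if (characters.length === 1 && ranges.length === 0 && !isInverted) {
    return characters[0] as string;
  }

  // A literal hyphen goes first so it cannot be read as a range operator.
  const hyphen = characters.indexOf('-') !== -1 ? '-' : '';
  const rest = characters.filter((c) => c !== '-').join('');
  const rangePart = ranges.map(({ start, end }) => `${start}-${end}`).join('');

  return `[${isInverted ? '^' : ''}${hyphen}${rangePart}${rest}]`;
}

const rangeAndChars = encodeSketch({
  characters: ['0', '5'],
  ranges: [{ start: 'a', end: 'z' }],
  isInverted: false,
});
const hyphenFirst = encodeSketch({
  characters: ['a', '-', 'z'],
  ranges: [],
  isInverted: false,
});
const invertedSingle = encodeSketch({
  characters: ['\\s'],
  ranges: [],
  isInverted: true,
});
const singleChar = encodeSketch({
  characters: ['a'],
  ranges: [],
  isInverted: false,
});
```

Under these assumptions, ranges are always written before individual characters, which matches the `'[a-z05]'` and `'[a-z\\s05]'` expectations in the test file.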
11 changes: 10 additions & 1 deletion src/components/types.ts
@@ -5,7 +5,16 @@ export type Quantifier = One | OneOrMore | Optionally | ZeroOrMore | Repeat;
export type CharacterClass = {
type: 'characterClass';
characters: string[];
inverted: boolean;
ranges: CharacterRange[];
isInverted: boolean;
};

/**
* Character range from start to end (inclusive).
*/
export type CharacterRange = {
start: string;
end: string;
};

// Components
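
To illustrate how the extended `CharacterClass` shape is used, here is a minimal sketch of the merge that `characterClass(...)` performs: the `characters` and `ranges` arrays of all arguments are concatenated, and inverted inputs are rejected. `RangeSketch`, `ClassSketch`, and `mergeClasses` are hypothetical local names. The merge uses a loop instead of the `.map(...).flat()` calls in the diff, which should be equivalent for these flat arrays.

```typescript
// Hypothetical local copies of the types introduced in this file.
type RangeSketch = { start: string; end: string };
type ClassSketch = {
  type: 'characterClass';
  characters: string[];
  ranges: RangeSketch[];
  isInverted: boolean;
};

// Sketch of the merge done by `characterClass(...)`.
function mergeClasses(...elements: ClassSketch[]): ClassSketch {
  const characters: string[] = [];
  const ranges: RangeSketch[] = [];

  for (const element of elements) {
    if (element.isInverted) {
      throw new Error(
        '`characterClass` should receive only non-inverted character classes'
      );
    }
    characters.push(...element.characters);
    ranges.push(...element.ranges);
  }

  return { type: 'characterClass', characters, ranges, isInverted: false };
}

// Sample inputs shaped like characterRange('a', 'z'), whitespace and anyOf('05').
const lowercaseSketch: ClassSketch = {
  type: 'characterClass',
  characters: [],
  ranges: [{ start: 'a', end: 'z' }],
  isInverted: false,
};
const whitespaceSketch: ClassSketch = {
  type: 'characterClass',
  characters: ['\\s'],
  ranges: [],
  isInverted: false,
};
const zeroFiveSketch: ClassSketch = {
  type: 'characterClass',
  characters: ['0', '5'],
  ranges: [],
  isInverted: false,
};

const mergedSketch = mergeClasses(lowercaseSketch, whitespaceSketch, zeroFiveSketch);
```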