A robust and modular Regular Expression Parser implemented from the ground up in C. This engine supports a comprehensive suite of meta-characters, including complex quantifiers, character classes, and alternation patterns, enabling high-performance string matching without external libraries.
The parser is built using a modular, function-driven architecture that processes regex patterns through specialized control modules. Unlike heavy library-based solutions, this engine focuses on precision and low-level memory efficiency:
- Pattern Decomposition: The system breaks down complex regex patterns into atomic units (literals, meta-characters, or groups).
- Specialized Control Modules: Implements dedicated matching logic for different regex families:
  - `squareBracketsControl`: Handles character sets and ranges (`a-z`, `0-9`).
  - `starControl` & `plusControl`: Manage greedy quantification through iterative expansion.
  - `dividerControl`: Executes alternation logic (`|`) for branch-based matching.
- Recursive Matching Engine: Orchestrates the results of individual modules to validate whether a candidate string belongs to the language defined by the regex.
- Zero-Dependency Design: Written purely in standard C, ensuring maximum portability across diverse operating systems and embedded environments.
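The greedy "iterative expansion" and recursive matching described above can be illustrated with a small backtracking matcher in the style popularized by Rob Pike. This is a minimal sketch under stated assumptions, not the code in `RegEx_Parser.c`: the names `regex_search`, `match_here`, `match_repeat`, and `match_one` are hypothetical, and only literals, `.`, `*`, `+`, `?`, `^`, and `$` are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper names for illustration; not the repository's API. */
static int match_here(const char *re, const char *text);

/* One pattern character against one text character.
   '.' matches anything except '\n' and the terminating '\0'. */
static int match_one(int p, int ch)
{
    if (ch == '\0')
        return 0;
    return (p == '.') ? (ch != '\n') : (p == ch);
}

/* Greedy repetition of a single character: expand as far as possible,
   then give back one character at a time until the rest matches. */
static int match_repeat(int p, size_t min, const char *re, const char *text)
{
    size_t n = 0;
    while (match_one(p, text[n]))
        n++;                            /* iterative expansion */
    if (n < min)
        return 0;                       /* '+' needs at least one copy */
    for (;;) {
        if (match_here(re, text + n))
            return 1;
        if (n == min)
            return 0;
        n--;                            /* backtrack by one character */
    }
}

/* Match the pattern at exactly this position in the text. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';
    if (re[1] == '*')
        return match_repeat(re[0], 0, re + 2, text);
    if (re[1] == '+')
        return match_repeat(re[0], 1, re + 2, text);
    if (re[1] == '?') {
        if (match_one(re[0], *text) && match_here(re + 2, text + 1))
            return 1;
        return match_here(re + 2, text);
    }
    if (match_one(re[0], *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Returns 1 if the pattern matches anywhere in text, 0 otherwise. */
int regex_search(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text);
    do {
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* Usage example: a few sanity checks on the sketch. */
void regex_sketch_self_test(void)
{
    assert(regex_search("ab*c", "xxabbbc") == 1);
    assert(regex_search("^ab+c$", "abbc") == 1);
    assert(regex_search("^ab+c$", "ac") == 0);
    assert(regex_search("colou?r", "color") == 1);
}
```

Recursion on the remainder of the pattern keeps each quantifier's logic local, which is one plausible reading of how per-construct control modules could be combined; the actual engine may organize this differently.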
RegexParser/
├── LICENSE # MIT License
├── README.md # Project documentation
├── .vscode/ # Development environment config
│
├── Implementation/
│ ├── RegEx_Parser.c # Core engine and control modules
│ └── test.c # Comprehensive test suite
│
└── Documentation/
    ├── ProjectDocument.pdf # In-depth technical methodology
    └── Regex-Template.pdf # Syntax reference and use cases
The parser provides a powerful syntax for sophisticated pattern matching across various categories:
- `()` Parentheses: Encapsulates sub-expressions for priority or grouping.
- `|` Vertical Bar: Enables logical "OR" operations (e.g., `gray|grey`).
- `?` Question Mark: Optional character matching (0 or 1).
- `*` Asterisk: Zero or more occurrences.
- `+` Plus Sign: One or more occurrences.
- `{n,m}` Curly Braces: Precise control over repetition ranges.
- `[]` Brackets: Matches any character within the set (supports ranges like `[a-z]`).
- `^` Caret / `$` Dollar: Anchors the match to the start or end of the string.
- `.` Dot: Matches any character except newlines.
- `\` Backslash: Escapes meta-characters for literal matching.
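To make the bracket syntax concrete, here is a minimal sketch of how membership in a set such as `[a-z0-9]` could be tested. The names `bracket_matches` and `class_contains` are hypothetical and are not taken from the repository's `squareBracketsControl`. The sketch assumes plain sets and `lo-hi` ranges only, with no negation or escapes inside the brackets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is ch a member of the set text between '[' and ']'?
   set points just past '[', len is the number of characters before ']'. */
static int class_contains(const char *set, size_t len, char ch)
{
    unsigned char c = (unsigned char)ch;
    size_t i = 0;

    while (i < len) {
        if (i + 2 < len && set[i + 1] == '-') {
            /* A range such as a-z: compare against both endpoints. */
            unsigned char lo = (unsigned char)set[i];
            unsigned char hi = (unsigned char)set[i + 2];
            if (lo <= c && c <= hi)
                return 1;
            i += 3;
        } else {
            /* A single literal member (a trailing '-' lands here too). */
            if (set[i] == ch)
                return 1;
            i++;
        }
    }
    return 0;
}

/* Returns 1 if ch matches the class at the start of pattern,
   0 if it does not, and -1 if the class is malformed. */
int bracket_matches(const char *pattern, char ch)
{
    const char *close;

    if (pattern[0] != '[')
        return -1;
    close = strchr(pattern + 1, ']');
    if (close == NULL)
        return -1;                      /* missing closing bracket */
    return class_contains(pattern + 1, (size_t)(close - (pattern + 1)), ch);
}

/* Usage example: a few sanity checks on the sketch. */
void bracket_sketch_self_test(void)
{
    assert(bracket_matches("[a-z0-9]", 'q') == 1);
    assert(bracket_matches("[a-z0-9]", 'Q') == 0);
    assert(bracket_matches("[0-9A-Fa-f]", 'F') == 1);  /* hex digit */
    assert(bracket_matches("[a-z", 'a') == -1);
}
```

Returning `-1` for a malformed class keeps syntax errors distinguishable from an ordinary non-match; this is a design choice of the sketch, not a documented behavior of the parser.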
| Component | Specification |
|---|---|
| Programming Language | C (Standard C99/C11) |
| Memory Handling | Stateless Buffer-based Processing |
| Supported Meta-chars | `(` `)` `[` `]` `{` `}` `^` `$` `.` `\` `\|` `?` `*` `+` |
| Character Ranges | Alphanumeric, HEX, Special Chars |
| Max Group Complexity | Nested Parentheses Support |
Clone the repository and enter the directory:

    git clone https://github.com/Zer0-Bug/RegexParser.git
    cd RegexParser

Compile the source code using any standard C compiler (GCC recommended). The sources live in the `Implementation/` directory:

    # Compiling the core engine with the test suite
    gcc Implementation/RegEx_Parser.c Implementation/test.c -o regex_parser

Launch the compiled executable to verify the parser against predefined patterns:

    ./regex_parser

Contributions are always appreciated. Open-source projects grow through collaboration, and any improvement, whether a bug fix, new feature, documentation update, or suggestion, is valuable.
To contribute, please follow the steps below:
- Fork the repository.
- Create a new branch for your change:
  `git checkout -b feature/your-feature-name`
- Commit your changes with a clear and descriptive message:
  `git commit -m "Add: brief description of the change"`
- Push your branch to your fork:
  `git push origin feature/your-feature-name`
- Open a Pull Request describing the changes made.
All contributions are reviewed before being merged. Please ensure that your changes follow the existing code style and include relevant documentation or tests where applicable.
- Ken Thompson (1968) - Regular Expression Search Algorithm. Communications of the ACM.
- Jeffrey Friedl (2006) - Mastering Regular Expressions. O'Reilly Media.