Regular Expression Parser: High-Performance C Engine

A modular regular expression parser implemented from scratch in C. The engine supports the common meta-characters, including quantifiers, character classes, anchors, grouping, and alternation, and performs string matching without external libraries.


Technical Architecture

The parser uses a modular, function-driven architecture in which each family of regex constructs is handled by a dedicated control module. Rather than wrapping an existing regex library, the engine implements matching directly and aims to keep memory use low:

  1. Pattern Decomposition: The system breaks down complex regex patterns into atomic units (literals, meta-characters, or groups).
  2. Specialized Control Modules: Implements dedicated matching logic for different regex families:
    • squareBracketsControl: Handles character sets and ranges (a-z, 0-9).
    • starControl & plusControl: Manage greedy quantification through iterative expansion.
    • dividerControl: Executes alternation logic (|) for branch-based matching.
  3. Recursive Matching Engine: Orchestrates the results of individual modules to validate whether a candidate string belongs to the language defined by the regex.
  4. Zero-Dependency Design: Written purely in standard C, ensuring maximum portability across diverse operating systems and embedded environments.
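To make the interplay between greedy quantifier handling and the recursive matching step more concrete, here is a minimal sketch of a backtracking matcher for literals, `.`, `*`, and `+`. It is not taken from RegEx_Parser.c: the names `sketch_full_match`, `match_here`, `match_repeat`, and `atom_matches` are hypothetical, and the real engine's control modules may be organized quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: these function names are assumptions and do not
   come from RegEx_Parser.c. Supported syntax: literals, '.', '*', '+'. */

static int match_here(const char *re, const char *text);

/* One pattern atom against one text character; '.' matches anything
   except a newline, and nothing matches the end of the text. */
static int atom_matches(char atom, char ch)
{
    if (ch == '\0')
        return 0;
    return atom == '.' ? ch != '\n' : ch == atom;
}

/* Greedy repetition: consume as many characters as possible
   (iterative expansion), then give characters back one at a time
   until the rest of the pattern matches. */
static int match_repeat(char atom, size_t min, const char *re, const char *text)
{
    const char *t = text;
    while (atom_matches(atom, *t))
        t++;
    while ((size_t)(t - text) >= min) {
        if (match_here(re, t))
            return 1;
        if (t == text)
            break;
        t--;
    }
    return 0;
}

/* Recursive step: match the pattern from re against all of text. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return *text == '\0';          /* both exhausted: full match */
    if (re[1] == '*')
        return match_repeat(re[0], 0, re + 2, text);
    if (re[1] == '+')
        return match_repeat(re[0], 1, re + 2, text);
    if (atom_matches(re[0], *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Returns 1 if the whole text belongs to the language of re, else 0. */
int sketch_full_match(const char *re, const char *text)
{
    assert(re != NULL && text != NULL);
    return match_here(re, text);
}
```

For example, `sketch_full_match("a*a", "aaa")` succeeds only because `match_repeat` backs off by one character after the greedy expansion has consumed the entire input.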


Project Structure

RegexParser/
├── LICENSE                                   # MIT License
├── README.md                                 # Project documentation
├── .vscode/                                  # Development environment config
│
├── Implementation/
│   ├── RegEx_Parser.c                        # Core engine and control modules
│   └── test.c                                # Comprehensive test suite
│
└── Documentation/
    ├── ProjectDocument.pdf                   # In-depth technical methodology
    └── Regex-Template.pdf                    # Syntax reference and use cases


Regex Logic & Features

The parser supports the following syntax, grouped by category:

1. Grouping & Alternation

  • () Parentheses: Encapsulates sub-expressions for priority or grouping.
  • | Vertical Bar: Enables logical "OR" operations (e.g., gray|grey).
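As a rough illustration of branch-based matching, the sketch below splits a pattern at each `|` and tries the branches from left to right. It is deliberately limited to literal branches with no grouping, and the function name `alt_match` is hypothetical; it is not the repository's `dividerControl`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: alt_match is an assumed name, not the project's API.
   Returns 1 if text equals one of the '|'-separated literal branches
   of pattern, 0 otherwise. Parentheses are not interpreted here. */
int alt_match(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    size_t text_len = strlen(text);
    const char *branch = pattern;

    for (;;) {
        /* The current branch ends at the next '|' or at the end. */
        const char *bar = strchr(branch, '|');
        size_t len = bar ? (size_t)(bar - branch) : strlen(branch);

        if (len == text_len && memcmp(branch, text, len) == 0)
            return 1;                  /* this branch matches */
        if (bar == NULL)
            return 0;                  /* no branches left */
        branch = bar + 1;              /* try the next branch */
    }
}
```

With the pattern from the list above, `alt_match("gray|grey", "grey")` returns 1 and `alt_match("gray|grey", "groy")` returns 0. A full engine would additionally have to track parenthesis depth so that only a `|` at the current nesting level splits the pattern.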

2. Quantifiers

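  • ? Question Mark: Makes the preceding element optional (zero or one occurrence).
  • * Asterisk: Matches zero or more occurrences of the preceding element.
  • + Plus Sign: Matches one or more occurrences of the preceding element.
  • {n,m} Curly Braces: Matches between n and m occurrences of the preceding element.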
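All four quantifiers can be viewed as a (min, max) repetition range, which the following sketch makes explicit. The `Bounds` type and the functions `parse_quantifier` and `run_within_bounds` are hypothetical names chosen for illustration. Accepting the `{n}` and `{n,}` forms is an assumption borrowed from common regex dialects, and overflow on very large counts is not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Illustrative only: these names are assumptions, not the project's API. */
typedef struct {
    int min;
    int max;                           /* INT_MAX means "unbounded" */
} Bounds;

/* Parses the quantifier starting at q. Returns the number of pattern
   characters consumed, or 0 if q does not start a valid quantifier. */
size_t parse_quantifier(const char *q, Bounds *out)
{
    assert(q != NULL && out != NULL);

    switch (q[0]) {
    case '?': out->min = 0; out->max = 1;       return 1;
    case '*': out->min = 0; out->max = INT_MAX; return 1;
    case '+': out->min = 1; out->max = INT_MAX; return 1;
    case '{': break;
    default:  return 0;
    }

    const char *p = q + 1;
    if (!isdigit((unsigned char)*p))
        return 0;
    int lo = 0;
    while (isdigit((unsigned char)*p))
        lo = lo * 10 + (*p++ - '0');

    int hi = lo;                       /* {n}: exactly n */
    if (*p == ',') {
        p++;
        if (*p == '}') {
            hi = INT_MAX;              /* {n,}: at least n */
        } else {
            if (!isdigit((unsigned char)*p))
                return 0;
            hi = 0;
            while (isdigit((unsigned char)*p))
                hi = hi * 10 + (*p++ - '0');
        }
    }
    if (*p != '}' || hi < lo)
        return 0;

    out->min = lo;
    out->max = hi;
    return (size_t)(p - q) + 1;        /* include the closing '}' */
}

/* Returns 1 if text is c repeated between b.min and b.max times. */
int run_within_bounds(char c, Bounds b, const char *text)
{
    assert(text != NULL && c != '\0');
    int n = 0;
    while (text[n] == c)
        n++;
    return text[n] == '\0' && n >= b.min && n <= b.max;
}
```

After parsing `{2,4}`, `run_within_bounds('a', b, "aaa")` returns 1, while `"a"` and `"aaaaa"` are rejected.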

3. Character Classes & Anchors

  • [] Brackets: Matches any character within the set (supports ranges like [a-z]).
  • ^ Caret / $ Dollar: Anchors the match to the start or end of the string.
  • . Dot: Matches any character except newlines.
  • \ Backslash: Escapes meta-characters for literal matching.
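The bracket and backslash rules above can be sketched as a single membership test. This is an illustrative example rather than the repository's `squareBracketsControl`: the name `bracket_match` and its return convention are assumptions. It covers literal members, ranges such as `a-z`, and backslash-escaped members. Negated sets are left out because the feature list does not mention them.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: bracket_match is an assumed name and interface.
   Tests ch against the bracket expression starting at set[0] == '['.
   Returns 1 if ch is in the set, 0 if it is not, and -1 if the closing
   ']' is missing. On 1 or 0, *len receives the length of the bracket
   expression including both brackets. */
int bracket_match(const char *set, char ch, size_t *len)
{
    assert(set != NULL && set[0] == '[' && len != NULL);

    const unsigned char uch = (unsigned char)ch;
    const char *p = set + 1;
    int found = 0;

    while (*p != '\0' && *p != ']') {
        unsigned char lo;
        if (*p == '\\' && p[1] != '\0') {
            lo = (unsigned char)p[1];  /* escaped member, e.g. \] */
            p += 2;
        } else {
            lo = (unsigned char)*p;
            p++;
        }

        if (p[0] == '-' && p[1] != '\0' && p[1] != ']') {
            unsigned char hi = (unsigned char)p[1];
            if (lo <= uch && uch <= hi)
                found = 1;             /* inside the range lo-hi */
            p += 2;
        } else if (uch == lo) {
            found = 1;                 /* single literal member */
        }
    }

    if (*p != ']')
        return -1;                     /* unterminated set */
    *len = (size_t)(p - set) + 1;
    return found;
}
```

A hyphen directly before the closing bracket, as in `[abc-]`, is treated as a literal member rather than the start of a range.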


Technical Specifications

Component               Specification
Programming Language    C (Standard C99/C11)
Memory Handling         Stateless Buffer-based Processing
Supported Meta-chars    ( ) [ ] { } ^ $ . \ | ? * +
Character Ranges        Alphanumeric, HEX, Special Chars
Max Group Complexity    Nested Parentheses Support


Deployment & Installation

1. Repository Acquisition

Clone the repository and enter the directory:

git clone https://github.com/Zer0-Bug/RegexParser.git
cd RegexParser

2. Compilation

Compile the source code using any standard C compiler (GCC recommended):

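# Compile the core engine together with the test suite
# (the sources live in Implementation/, as listed under Project Structure)
gcc Implementation/RegEx_Parser.c Implementation/test.c -o regex_parser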

3. Running the Tests

Launch the compiled executable to verify the parser against predefined patterns:

./regex_parser


Contribution

Contributions are always appreciated. Open-source projects grow through collaboration, and any improvement—whether a bug fix, new feature, documentation update, or suggestion—is valuable.

To contribute, please follow the steps below:

  1. Fork the repository.
  2. Create a new branch for your change:
    git checkout -b feature/your-feature-name
  3. Commit your changes with a clear and descriptive message:
    git commit -m "Add: brief description of the change"
  4. Push your branch to your fork:
    git push origin feature/your-feature-name
  5. Open a Pull Request describing the changes made.

All contributions are reviewed before being merged. Please ensure that your changes follow the existing code style and include relevant documentation or tests where applicable.

