feat: changed all regex functions to proc-macros
Saphereye committed Jul 4, 2024
1 parent 77dcd95 commit 516253f
Showing 18 changed files with 319 additions and 140 deletions.
55 changes: 54 additions & 1 deletion Cargo.lock

10 changes: 9 additions & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "gregex"
version = "0.6.0"
version = "0.7.0"
edition = "2021"
authors = ["Saphereye <adarshdas950@gmail.com>"]
license = "MIT"
@@ -19,4 +19,12 @@ repository = "https://github.com/Saphereye/gregex"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[workspace]
members = [
    "gregex-macros",
    "gregex-logic",
]

[dependencies]
gregex-macros = { path = "gregex-macros" }
gregex-logic = { path = "gregex-logic" }
22 changes: 2 additions & 20 deletions README.md
@@ -1,23 +1,5 @@
# Gregex ![crates.io](https://img.shields.io/crates/v/gregex.svg) ![Build Passing](https://github.com/Saphereye/gregex/actions/workflows/ci.yml/badge.svg)

Gregex is a regular expression solver which utilizes Non-deterministic Finite Automata (NFA) to simulate the input strings.
![](https://github.com/Saphereye/gregex/raw/master/assets/gregex_workflow.excalidraw.svg)

## Usage

```rust
extern crate gregex;
use gregex::*;
fn main() {
    let tree = dot!(star!('a'), 'b', 'c');
    let regex = regex(&tree);
    assert!(regex.run("abc"));
    assert!(!regex.run("a"));
    assert!(regex.run("aaabc"));
}
```

## Theory
The project uses [Glushkov's construction algorithm](https://en.wikipedia.org/wiki/Glushkov%27s_construction_algorithm) for creating the NFA.

The pipeline can be summarised as below
![](https://github.com/Saphereye/gregex/blob/master/assets/gregex_workflow.excalidraw.svg)
Gregex is a regular expression solver which utilizes Non-deterministic Finite Automata (NFA) to simulate the input strings.
9 changes: 9 additions & 0 deletions examples/dot.rs
@@ -0,0 +1,9 @@
extern crate gregex;
use gregex::*;

fn main() {
    let runner = regex!(dot!('a', 'b', 'c'));
    assert_eq!(runner.run("abc"), true);
    assert_eq!(runner.run("ab"), false);
    assert_eq!(runner.run("abcd"), false);
}
9 changes: 9 additions & 0 deletions examples/or.rs
@@ -0,0 +1,9 @@
extern crate gregex;
use gregex::*;

fn main() {
    let runner = regex!(or!('a', 'b', 'c'));
    assert_eq!(runner.run("a"), true);
    assert_eq!(runner.run("b"), true);
    assert_eq!(runner.run("c"), true);
}
9 changes: 9 additions & 0 deletions examples/star.rs
@@ -0,0 +1,9 @@
extern crate gregex;
use gregex::*;

fn main() {
    let runner = regex!(star!('a'));
    assert_eq!(runner.run("a"), true);
    assert_eq!(runner.run("aa"), true);
    assert_eq!(runner.run(""), true);
}
6 changes: 6 additions & 0 deletions gregex-logic/Cargo.toml
@@ -0,0 +1,6 @@
[package]
name = "gregex-logic"
version = "0.1.0"
edition = "2021"

[dependencies]
6 changes: 6 additions & 0 deletions gregex-logic/README.md
@@ -0,0 +1,6 @@
# Gregex Logic
Contains the underlying logic of the Gregex crate. This crate is responsible for converting the Node tree to the NFA. The NFA is then used to match the input string.

The crate uses [Glushkov's construction algorithm](https://en.wikipedia.org/wiki/Glushkov%27s_construction_algorithm) to convert the Node tree to the NFA. Its advantage over Thompson's construction algorithm is that the generated NFA has no epsilon transitions and exactly one state per terminal plus one initial state (n + 1 states for a regex with n terminals). That said, an NFA generated by Thompson's construction can be converted to Glushkov's form by removing its epsilon transitions.

The `translation` module contains the code to convert the Node tree to the NFA. The `nfa` module contains the code to match the input string with the NFA.
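The prefix, suffix and factor sets that the `regex!` macro passes to `NFA::set_to_nfa` correspond to the standard Glushkov sets. The following is a self-contained sketch of the construction under simplifying assumptions: the `Node` enum, the `Pos` alias and the `nullable`/`first`/`last`/`follow`/`matches` functions are hypothetical stand-ins, not the actual `translation` or `nfa` API of this crate. It computes the sets recursively and then simulates the resulting NFA, which has one initial state plus one state per terminal.

```rust
use std::collections::HashSet;

// Hypothetical, simplified regex tree; each terminal is a unique (symbol, position) pair.
enum Node {
    Terminal(char, u32),
    Concat(Box<Node>, Box<Node>),
    Or(Box<Node>, Box<Node>),
    Star(Box<Node>),
}

// A linearised terminal: its symbol plus its unique position.
type Pos = (char, u32);

// True if the sub-expression can match the empty string.
fn nullable(n: &Node) -> bool {
    match n {
        Node::Terminal(..) => false,
        Node::Concat(l, r) => nullable(l) && nullable(r),
        Node::Or(l, r) => nullable(l) || nullable(r),
        Node::Star(_) => true,
    }
}

// Positions that can start a match (the prefix set).
fn first(n: &Node) -> HashSet<Pos> {
    match n {
        Node::Terminal(c, p) => HashSet::from([(*c, *p)]),
        Node::Concat(l, r) if nullable(l) => &first(l) | &first(r),
        Node::Concat(l, _) => first(l),
        Node::Or(l, r) => &first(l) | &first(r),
        Node::Star(e) => first(e),
    }
}

// Positions that can end a match (the suffix set).
fn last(n: &Node) -> HashSet<Pos> {
    match n {
        Node::Terminal(c, p) => HashSet::from([(*c, *p)]),
        Node::Concat(l, r) if nullable(r) => &last(l) | &last(r),
        Node::Concat(_, r) => last(r),
        Node::Or(l, r) => &last(l) | &last(r),
        Node::Star(e) => last(e),
    }
}

// Pairs (p, q) where position q may directly follow position p (the factor set).
fn follow(n: &Node, out: &mut HashSet<(Pos, Pos)>) {
    let (from, to) = match n {
        Node::Terminal(..) => return,
        Node::Or(l, r) => {
            follow(l, out);
            follow(r, out);
            return;
        }
        Node::Concat(l, r) => {
            follow(l, out);
            follow(r, out);
            (last(l), first(r))
        }
        Node::Star(e) => {
            follow(e, out);
            (last(e), first(e))
        }
    };
    for p in &from {
        out.extend(to.iter().map(|&q| (*p, q)));
    }
}

// Simulates the Glushkov NFA built from the three sets.
fn matches(root: &Node, input: &str) -> bool {
    let (start, end) = (first(root), last(root));
    let mut edges = HashSet::new();
    follow(root, &mut edges);

    // `None` is the single initial state; `Some(pos)` is the state for a terminal position.
    let mut current: HashSet<Option<Pos>> = HashSet::from([None]);
    for c in input.chars() {
        let mut next = HashSet::new();
        for state in &current {
            let targets: Vec<Pos> = match state {
                None => start.iter().copied().collect(),
                Some(p) => edges.iter().filter(|(a, _)| a == p).map(|&(_, q)| q).collect(),
            };
            next.extend(targets.into_iter().filter(|&(sym, _)| sym == c).map(Some));
        }
        current = next;
    }
    // The initial state survives only on empty input, which matches iff the regex is nullable.
    current.iter().any(|s| match s {
        None => nullable(root),
        Some(p) => end.contains(p),
    })
}
```

For `a*bc` with positions 0, 1 and 2, the prefix set is {a0, b1}, the suffix set is {c2} and the factor set is {(a0, a0), (a0, b1), (b1, c2)}, giving four states in total, so `matches` accepts `abc`, `bc` and `aaabc` but rejects `a`.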
7 changes: 7 additions & 0 deletions gregex-logic/src/lib.rs
@@ -0,0 +1,7 @@
#![cfg_attr(not(doctest), doc = include_str!("../README.md"))]

pub mod nfa;
pub mod translation;

use std::sync::atomic::AtomicU32;
pub static TERMINAL_COUNT: AtomicU32 = AtomicU32::new(0);
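`TERMINAL_COUNT` gives every terminal a distinct position, which the Glushkov construction relies on. `fetch_add` returns the value held *before* the increment, so no two callers receive the same index even if they race. A minimal std-only sketch of that property; `NEXT_POSITION`, `next_position`, `draw_concurrently` and `all_unique` are hypothetical names, not part of gregex-logic.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Hypothetical stand-in for gregex_logic::TERMINAL_COUNT.
static NEXT_POSITION: AtomicU32 = AtomicU32::new(0);

// Returns a fresh position: fetch_add yields the old value, then increments.
fn next_position() -> u32 {
    NEXT_POSITION.fetch_add(1, Ordering::SeqCst)
}

// Draws `per_thread` positions on each of `threads` threads and collects them all.
fn draw_concurrently(threads: usize, per_thread: usize) -> Vec<u32> {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || (0..per_thread).map(|_| next_position()).collect::<Vec<u32>>()))
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

// True if no position was handed out twice.
fn all_unique(positions: &[u32]) -> bool {
    let set: HashSet<u32> = positions.iter().copied().collect();
    set.len() == positions.len()
}
```

Starting from zero, the first two calls to `next_position()` return 0 and 1, and any later draws, however they interleave across threads, stay distinct.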
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
13 changes: 13 additions & 0 deletions gregex-macros/Cargo.toml
@@ -0,0 +1,13 @@
[package]
name = "gregex-macros"
version = "0.1.0"
edition = "2021"

[dependencies]
gregex-logic = { path = "../gregex-logic" }
syn = { version = "1.0", features = ["full"] }
quote = "1.0"
proc-macro2 = "1.0"

[lib]
proc-macro = true
19 changes: 19 additions & 0 deletions gregex-macros/README.md
@@ -0,0 +1,19 @@
# Gregex Macros
Contains the macro interface for all the gregex functions.

Without these macros, users would have to rely on functions that build the Node tree by hand. An example makes this clearer.

Take the regex `a*`.

Its Node tree would be:
```rust,ignore
Node::Operation(
    Operator::Production,
    Box::new(Node::Terminal('a', 0u32)),
    None,
)
```

Although this could be wrapped in a function or a `macro_rules!` macro, the resulting code is quite bloated. Instead, the hard work can be done during compilation, i.e. converting the regex all the way to the final NFA.

Currently, converting to the NFA at compile time is not possible, but this crate can already convert the regex to the intermediate Node tree form.
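For comparison, the `macro_rules!` route mentioned above might look like this sketch. The `Operator` and `Node` definitions are simplified stand-ins whose shapes are assumed from the example tree, and `star_rt!` and `COUNT` are hypothetical names. The notable difference: here the terminal index is drawn at runtime, every time the expression is evaluated, whereas the gregex proc-macros bake the index into the generated code as a literal.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Simplified stand-ins for gregex_logic's Operator and Node (shapes assumed).
#[derive(Debug, PartialEq)]
enum Operator {
    Production,
}

#[derive(Debug, PartialEq)]
enum Node {
    Terminal(char, u32),
    Operation(Operator, Box<Node>, Option<Box<Node>>),
}

// Hypothetical runtime counter, playing the role of gregex_logic::TERMINAL_COUNT.
static COUNT: AtomicU32 = AtomicU32::new(0);

// Hypothetical declarative macro for `c*`. The terminal index is drawn at runtime,
// each time the expression is evaluated, rather than fixed at compile time.
macro_rules! star_rt {
    ($c:literal) => {
        Node::Operation(
            Operator::Production,
            Box::new(Node::Terminal($c, COUNT.fetch_add(1, Ordering::SeqCst))),
            None,
        )
    };
}
```

Evaluating `star_rt!('a')` twice yields terminals numbered 0 and then 1, because the counter advances at runtime.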
173 changes: 173 additions & 0 deletions gregex-macros/src/lib.rs
@@ -0,0 +1,173 @@
#![cfg_attr(not(doctest), doc = include_str!("../README.md"))]

extern crate proc_macro;

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, Expr, ExprLit, ExprMacro, Lit};

#[proc_macro]
pub fn dot(input: TokenStream) -> TokenStream {
    let inputs = parse_macro_input!(input with syn::punctuated::Punctuated::<Expr, syn::Token![,]>::parse_terminated);

    let nodes = inputs.iter().map(|expr| {
        match expr {
            Expr::Macro(ExprMacro { mac, .. }) => {
                // Nested macro call such as `star!(...)`: emit it unchanged so it expands later
                quote! { #mac }
            }
            Expr::Lit(ExprLit { lit, .. }) => match lit {
                Lit::Char(c) => {
                    let count = gregex_logic::TERMINAL_COUNT
                        .fetch_add(1, core::sync::atomic::Ordering::SeqCst);
                    quote! {
                        gregex_logic::translation::node::Node::Terminal(#c, #count)
                    }
                }
                _ => panic!("Unsupported literal type"),
            },
            _ => panic!("Unsupported input type"),
        }
    });

    // Fold the nodes left to right into nested Concat operations
    let mut iter = nodes.into_iter();
    let first = iter.next().expect("The input is empty");
    let operations = iter.fold(first, |left, right| {
        quote! {
            gregex_logic::translation::node::Node::Operation(
                gregex_logic::translation::operator::Operator::Concat,
                Box::new(#left),
                Some(Box::new(#right))
            )
        }
    });

    // Generate the final token stream
    let gen = quote! {
        #operations
    };

    gen.into()
}

#[proc_macro]
pub fn or(input: TokenStream) -> TokenStream {
    let inputs = parse_macro_input!(input with syn::punctuated::Punctuated::<Expr, syn::Token![,]>::parse_terminated);

    let nodes = inputs.iter().map(|expr| {
        match expr {
            Expr::Macro(ExprMacro { mac, .. }) => {
                // Nested macro call such as `star!(...)`: emit it unchanged so it expands later
                quote! { #mac }
            }
            Expr::Lit(ExprLit { lit, .. }) => match lit {
                Lit::Char(c) => {
                    let count = gregex_logic::TERMINAL_COUNT
                        .fetch_add(1, core::sync::atomic::Ordering::SeqCst);
                    quote! {
                        gregex_logic::translation::node::Node::Terminal(#c, #count)
                    }
                }
                _ => panic!("Unsupported literal type"),
            },
            _ => panic!("Unsupported input type"),
        }
    });

    // Fold the nodes left to right into nested Or operations
    let mut iter = nodes.into_iter();
    let first = iter.next().expect("The input is empty");
    let operations = iter.fold(first, |left, right| {
        quote! {
            gregex_logic::translation::node::Node::Operation(
                gregex_logic::translation::operator::Operator::Or,
                Box::new(#left),
                Some(Box::new(#right))
            )
        }
    });

    // Generate the final token stream
    let gen = quote! {
        #operations
    };

    gen.into()
}

#[proc_macro]
pub fn star(input: TokenStream) -> TokenStream {
    let expr = parse_macro_input!(input as Expr);

    let node = match expr {
        Expr::Macro(ExprMacro { mac, .. }) => {
            // Nested macro call such as `dot!(...)`: emit it unchanged so it expands later
            quote! { #mac }
        }
        Expr::Lit(ExprLit { lit, .. }) => match lit {
            Lit::Char(c) => {
                let count =
                    gregex_logic::TERMINAL_COUNT.fetch_add(1, core::sync::atomic::Ordering::SeqCst);
                quote! {
                    gregex_logic::translation::node::Node::Terminal(#c, #count)
                }
            }
            _ => panic!("Unsupported literal type"),
        },
        _ => panic!("Unsupported input type"),
    };

    // Generate the code for the star operation
    let operation = quote! {
        gregex_logic::translation::node::Node::Operation(
            gregex_logic::translation::operator::Operator::Production,
            Box::new(#node),
            None
        )
    };

    // Generate the final token stream
    let gen = quote! {
        #operation
    };

    gen.into()
}

#[proc_macro]
pub fn regex(input: TokenStream) -> TokenStream {
    let expr = parse_macro_input!(input as Expr);

    // Convert the input expression into a Node structure
    let node = match expr {
        Expr::Macro(ExprMacro { mac, .. }) => {
            // Nested macro call such as `dot!(...)`: emit it unchanged so it expands later
            quote! { #mac }
        }
        Expr::Lit(ExprLit { lit, .. }) => match lit {
            Lit::Char(c) => {
                let count =
                    gregex_logic::TERMINAL_COUNT.fetch_add(1, core::sync::atomic::Ordering::SeqCst);
                quote! {
                    gregex_logic::translation::node::Node::Terminal(#c, #count)
                }
            }
            _ => panic!("Unsupported literal type"),
        },
        _ => panic!("Unsupported input type"),
    };

    // Generate the code that builds the Glushkov sets from the Node tree and converts them into an NFA
    let gen = quote! {
        {
            let regex_tree = #node;
            let prefix_set = gregex_logic::translation::node::prefix_set(&regex_tree);
            let suffix_set = gregex_logic::translation::node::suffix_set(&regex_tree);
            let factors_set = gregex_logic::translation::node::factors_set(&regex_tree);
            gregex_logic::nfa::NFA::set_to_nfa(&prefix_set, &suffix_set, &factors_set)
        }
    };

    gen.into()
}
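`dot!` and `or!` share the same shape: each argument becomes a node, and `fold` combines them from the left, so `dot!('a', 'b', 'c')` expands to `Concat(Concat(a, b), c)`. A runtime sketch of that fold, using simplified stand-in types whose shapes are assumed; `fold_nodes` is a hypothetical helper, not part of gregex.

```rust
// Simplified stand-ins for gregex_logic's Operator and Node (shapes assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Concat,
    Or,
}

#[derive(Debug, PartialEq)]
enum Node {
    Terminal(char, u32),
    Operation(Operator, Box<Node>, Option<Box<Node>>),
}

// Hypothetical helper mirroring `iter.fold(first, ...)` in the dot!/or! proc-macros:
// combines the nodes left to right into a left-nested chain of `op`.
fn fold_nodes(op: Operator, nodes: Vec<Node>) -> Node {
    let mut iter = nodes.into_iter();
    let first = iter.next().expect("The input is empty");
    iter.fold(first, |left, right| {
        Node::Operation(op, Box::new(left), Some(Box::new(right)))
    })
}
```

Because the chain is built by folding from the first element, a single-argument call such as `or!('x')` simply yields the terminal itself.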