Compiler for translating Regular Expressions (REs) into a domain-specific ISA for Cicero.
The simplest way to build the compiler is by using Docker. Docker/Dockerfile is provided to build the image:
# Build docker image which contains build dependencies
docker build -t cicero_build_environment:latest Docker
# Run build and test within docker image
docker run -v "$PWD":/app cicero_build_environment:latest /bin/bash /app/Docker/build_and_test.sh
To build without Docker, install the dependencies first:
# Ubuntu Linux
# Add LLVM apt repository, follow instruction on https://apt.llvm.org/
apt install libmlir-16-dev mlir-16-tools llvm-16-dev antlr4 libantlr4-runtime-dev cmake
# Fedora Linux
dnf install cmake antlr4 antlr4-cpp-runtime-devel mlir-devel llvm-devel
cmake: cross-platform build file generator
antlr4: tool for building parser/lexer from declarative grammar/tokens
antlr4-cpp-runtime-devel: C++ runtime for antlr4
mlir-devel: intermediate representation library
llvm-devel: compiler infrastructure library
Once dependencies are installed, clone this repo and cd into it:
mkdir build
cd build
# Optional, only if you want to build tests
git submodule update --init --recursive
# If you don't want to build tests, add `-DBUILD_TESTING=OFF` to the next command
cmake ..
cmake --build .
Once built, the compiler executable can be found in ./build/ciceroc.
To compile the example RE ab|cd into out.cicero with all optimizations enabled, run:
./build/ciceroc --regex="ab|cd" --emit=compiled -o out.cicero -Oall
The output target is selected by specifying one of the available options: --emit=regexmlir|ciceromlir|ciceromlir.dot|compiled.
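To compare what each output target produces for the same RE, the invocations can be generated in a loop. The following is a minimal sketch, not part of the project: the `emit_all` helper, the `CICEROC` and `DRY_RUN` variables, and the `out.<target>` file names are all hypothetical. Only the `--regex`, `--emit` and `-o` flags come from the usage shown above. By default it only prints the commands; run it with `DRY_RUN=` (empty) to execute them.

```shell
#!/bin/sh
# Sketch: one ciceroc invocation per --emit target.
# Output file names (out.<target>) are illustrative, not a project convention.
CICEROC="${CICEROC:-./build/ciceroc}"
DRY_RUN="${DRY_RUN-1}"   # non-empty: print commands only; DRY_RUN= runs them

emit_all() {
    regex=$1
    for target in regexmlir ciceromlir ciceromlir.dot compiled; do
        # ${DRY_RUN:+echo} expands to "echo" in dry-run mode, to nothing otherwise
        ${DRY_RUN:+echo} "$CICEROC" --regex="$regex" --emit="$target" -o "out.$target"
    done
}

emit_all 'ab|cd'
```

The dry-run default makes it possible to check the generated command lines before the compiler has been built.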
Optimizations can be enabled all together (-Oall), or one by one: -Oregex, -Oregexboundary, -Ojump.
The output binary can be inspected with ./build/objdump binary.cicero
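Compiling and inspecting can be chained into one step. This is a hedged sketch that assumes the build directory layout described above (`./build/ciceroc` and `./build/objdump`); the `compile_and_dump` function and the `BUILD_DIR` variable are hypothetical helpers, not something the repository provides.

```shell
#!/bin/sh
# Sketch: compile a regex with all optimizations, then dump the binary.
# compile_and_dump and BUILD_DIR are illustrative names, not project-provided.
compile_and_dump() {
    regex=$1
    out=${2:-out.cicero}
    build_dir=${BUILD_DIR:-./build}

    # Fail early with a readable message if the compiler has not been built
    if [ ! -x "$build_dir/ciceroc" ]; then
        echo "ciceroc not found in $build_dir; build the compiler first" >&2
        return 1
    fi

    # Only inspect the output if compilation succeeded
    "$build_dir/ciceroc" --regex="$regex" --emit=compiled -o "$out" -Oall &&
        "$build_dir/objdump" "$out"
}

# Usage, from the repository root after a successful build:
#   compile_and_dump 'ab|cd' out.cicero
```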