Code regarding Parravicini et al. 2021 paper can be found here.
Cicero is a domain specific architecture that can be employed to perform exact regular expression (RE) matching using FPGAs. The cool fact about Cicero is that - as other software libraries one among the other RE2 - does not suffer from backtracking problem. This means that when it elaborate a REs that carry some kind of non-determinsm (e.,g. a?a ) it does not take a guess and then backtrack but can explore all the different options in a single pass of the input string.
If you are interested in the topic take a look at Russ Cox article.
From a system perspective, Cicero features two components:
- A compiler: which compiles REs into a domain specific ISA binary
- An architecture on FPGA: which receives a compiled RE and an input string, and output wheter the input is matched by the RE or not.
The compiler's code can be found here. The compiler is implemented using MLIR and ANTLR4. The compilation pipeline can be described as follows:
- Parse textual RE into ANTLR4 AST
- Generate representation of Regex using the proposed
regex
MLIR dialect - (optional) Optimization pass on
regex
dialect - Lowering conversion of
regex
dialect to proposedcicero
dialect - (optional) Optimization pass on
cicero
dialect - Generate Cicero ISA binary code
The Cicero architecture features a sliding window of input characters. Each character in the window is addressed by a CC_ID_BITS
-bits wide pointer, as such the window contains 2^CC_ID_BITS
characters.
The Cicero architecture is composed of multiple engines, which can be combined together in ring or torus topologies. Execution threads are distributed among engines by a load balancing infrastructure. However, during our studies we found out that an architecture configuration with a single engine is more efficient.
Each engine packs as many FIFOs and CICERO-cores as number of characters in the input window.
bitstream
: pre-compiled bitstreams for Ultra96 v2 board, and their static metrics (board usage percentages and total on-chip power)cicero_compiler
: older compiler implementationcicero_compiler_cpp
: new compiler implementation, using MLIRhdl_src
: System Verilog implementation of the architectureproj
: Vivado project files for the architecture developmentscripts
: Various helper scripts for development, verification and benchmarking
See development.md
For latest Artifact Evaluation instuctions, see AE.md
This work has financial support from ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing, funded by European Union – NextGenerationEU. The authors are grateful to the CGO 2025's anonymous reviewer feedback, the AMD University Program support, and Valentina Sona for working on the original Cicero architecture simulator.
If you find this repository useful, please use the following citations:
@inproceedings{somaini2025cicero,
title = {Combining MLIR Dialects with Domain-Specific Architecture for Efficient Regular Expression Matching},
author = {Andrea Somaini and Filippo Carloni and Giovanni Agosta and Marco D. Santambrogio and Davide Conficconi},
year = 2025,
month = {mar},
booktitle={2025 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)}
}
@article{parravicini2021cicero,
title = {{CICERO}: A Domain-Specific Architecture for Efficient Regular Expression Matching},
author = {Daniele Parravicini and Davide Conficconi and Emanuele Del Sozzo and Christian Pilato and Marco D. Santambrogio},
journal = {{ACM} Transactions on Embedded Computing Systems},
year = 2021,
month = {oct},
publisher = {Association for Computing Machinery ({ACM})},
volume = {20},
number = {5s},
pages = {1--24},
doi = {10.1145/3476982},
url = {https://doi.org/10.1145%2F3476982},
}