This is a compiler that can:
- compile regular expressions into fast, streaming parsers (see [http://www.diku.dk/kmc/documents/GHR14-0-paper.pdf]), and
- compile programs written in the regular expression-based language Kleenex into fast, streaming string transformations ([http://www.diku.dk/kmc/documents/ghrst2016-0-paper.pdf]).
If you want to quickly get a sense of what Kleenex is, you can download a VirtualBox image that is ready to play around with from http://kleenexlang.org.
To clone, run:
git clone --recursive https://github.com/diku-kmc/kleenexlang.git
Because some dependencies are not on Hackage, it is easiest to build in a sandbox. After cloning, cd into the project directory and run:
cabal sandbox init && cabal sandbox add-source regexps-syntax
Then pull in the dependencies:
cabal install --dependencies-only
To build, run:
cabal configure && cabal build
This will place a binary in dist/build/kexc/kexc.
First write a Kleenex program:
> cat add-commas.kex
main := (num /[^0-9]/ | other)*
num := digit{1,3} ("," digit{3})*
digit := /[0-9]/
other := /./
Next, compile a transducer using the kexc executable:
> kexc compile add-commas.kex --out add-commas
Finally, pipe input to the transducer:
> echo "2016" | ./add-commas
2,016
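To make the intended transformation concrete, here is a minimal Python sketch of roughly what add-commas.kex computes. It is not part of the repository and not how kexc works internally; the function name add_commas and the regular expression are illustrative assumptions. Like the num rule, which must be followed by /[^0-9]/, the sketch only rewrites a run of digits that is followed by a non-digit character (the newline that echo appends counts).

```python
import re

# A comma goes at every position that is preceded by a digit and followed by
# one or more complete groups of three digits and then a non-digit character.
# This mirrors: num := digit{1,3} ("," digit{3})*  followed by /[^0-9]/.
_COMMA_POINT = re.compile(r"(?<=[0-9])(?=(?:[0-9]{3})+[^0-9])")


def add_commas(text: str) -> str:
    """Insert thousands separators into digit runs followed by a non-digit."""
    return _COMMA_POINT.sub(",", text)
```

Calling add_commas("2016\n") returns "2,016\n", matching the session above. A digit run at the very end of the input, with no trailing character, is left unchanged by this sketch.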
A number of test suites are included.
- To run the unit tests:
cabal test
- To test the C runtime (note that this uses the Valgrind tool):
cd crt_test && make
- To run the end-to-end blackbox tests:
cd test/test_compiled && make
The repository includes a benchmark suite that compares the performance of string transformation programs written in Kleenex with equivalent programs written using other regular expression-based libraries and tools.
To run the benchmarks and generate the plots, first cd into bench and then:
- generate the test data:
make generate-test-data
- install the external benchmark dependencies (libraries, etc.):
make install-benchmark-dependencies
- build the benchmark programs (not Kleenex programs):
make build-benchmark-programs
- (optional) check that the benchmark programs are equivalent:
make -k equality-check
- build the Kleenex programs:
./compiletime.sh -f
- run all the benchmark programs N times with M warm-up rounds:
./runningtime.sh -r <N> -w <M> -f
- generate the plots:
./mkplots.py
- the plots are placed in bench/plots (unless otherwise specified to mkplots.py)
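The benchmark steps above can also be driven from one script. The following Python sketch is not part of the repository; the helper names benchmark_steps and run_benchmarks are hypothetical, and the commands and flags are taken verbatim from the list above. It assumes it is started from the repository root, so that bench is a valid working directory.

```python
import subprocess


def benchmark_steps(runs, warmups, check_equality=True):
    """Return the benchmark commands, in the order listed above, as argv lists."""
    steps = [
        ["make", "generate-test-data"],
        ["make", "install-benchmark-dependencies"],
        ["make", "build-benchmark-programs"],
    ]
    if check_equality:
        # Optional step: check that the benchmark programs are equivalent.
        steps.append(["make", "-k", "equality-check"])
    steps += [
        ["./compiletime.sh", "-f"],
        ["./runningtime.sh", "-r", str(runs), "-w", str(warmups), "-f"],
        ["./mkplots.py"],
    ]
    return steps


def run_benchmarks(runs, warmups, bench_dir="bench", check_equality=True):
    """Run each step inside bench_dir, stopping at the first failing command."""
    for argv in benchmark_steps(runs, warmups, check_equality):
        subprocess.run(argv, cwd=bench_dir, check=True)
```

For example, run_benchmarks(5, 2) would run every program 5 times with 2 warm-up rounds. Because check=True stops at the first failure, pass check_equality=False if a failing equality check should not block the remaining steps.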