Grime is a two-dimensional pattern matching language based on Boolean grammars. The basic idea is to construct rectangular patterns from smaller components, and check whether they are found in the input matrix.
You can try Grime online, courtesy of Dennis from PPCG. Be sure to check the tutorial too.
A Grime program is called a grammar, and the correct file extension for a grammar is .gr
, although this is not enforced.
The program is called like
runhaskell grime.hs [options] grammarfile matrixfile
where matrixfile
is a file containing the character matrix.
For example, the digits program would be called like
runhaskell grime.hs digits.gr digit-matrix
If you want a speed boost, you should compile Grime with the command
ghc -O2 grime.hs
and then run the resulting executable:
grime [options] grammarfile matrixfile
There's also a bash script rungrime
that compiles Grime if needed and then calls the executable with the provided arguments.
By default, the interpreter prints the first match it finds, but this can be controlled using the option flags:
-e
: match only the entire matrix, print1
for match and0
for no match.-n
: print the number of matches, or the entire matrix if-e
is also given.-a
: print all matches.-p
: print also the positions of the matches, in the format(x,y,w,h)
.-s
: do not print the matches themselves.-b
: include borders of width 1.-d
or-d0
: print debug information.-d1
: print logs from the matcher.
These can be inserted from the command line, or in the beginning of any line of a grammar file, separated from the rest of the line by a backtick `
.
A Grime grammar consists of one or more definitions, each on a separate line.
Each of them defines the value of a nonterminal, and one of them must define the anonymous toplevel nonterminal.
The syntax of a definition is either N=E
or E
, where N
is an uppercase letter and E
is an expression.
Expressions are constructed as follows.
- Any character escaped with
\
matches any1x1
rectangle containing that character. .
matches any single character.b
matches a1x1
rectangle outside the character matrix. It's only relevant if the-b
option is used, or if the input is not rectangular.e
matches a 0-height or 0-width segment of the input's edge.$
always matches.f
matches any 0-height (flat) rectangle, andt
matches any 0-width (thin) rectangle.- The pre-defined character groups are
d
igit,u
ppercase,l
owercase,a
lphabetic, alphan
umeric, ands
ymbol. - New character classes can be defined by the syntax
[a-prt-w,d-gu]
. The letters on the left are included, and those on the right are excluded, so this matches exactly the lettersabchijklmnoprtvw
. If the left side is empty, it is taken to contain all characters and the border. The comma can be omitted if the right side is empty. The characters[],-\
must be escaped with\
.\b
matches a border. - An unescaped uppercase letter is a nonterminal, and matches the expression it is assigned.
_
is the anonymous toplevel nonterminal.- If
P
andQ
are expressions, thenPQ
is just their horizontal concatenation, andP/Q
is their vertical concatenation, withP
on top. P Q
is likePQ
, but with lower precedence than/
. This sometimes saves parentheses.P+
is one or moreP
s aligned horizontally, andP/+
is the same aligned vertically.- The Boolean operations are denoted
P|Q
,P&Q
andP!
. Exclusive OR isP~Q
, and difference isP-Q
. P?
is shorthand fort|P
,P/?
forf|P
,P*
fort|P+
, andP/*
forf|P/+
.P{a-b,c-d}
, whereabcd
are nonnegative integers, is a size constraint onP
. IfP
is a character class, then the expression matches anymxn
rectangle containing only those characters, provided thatm
is betweena
andb
inclusive, andn
is betweenc
andd
inclusive. In other cases, the expression matches any rectangle that has the correct size and thatP
also matches. Ifa
orc
are omitted, they are taken to be0
, and ifb
ord
are omitted, they are infinite. If the hyphen betweena
andb
is omitted, then we use the same number for both ends of the interval. If the entirec-d
part is omitted, both axes are constrained. To clarify,{-b}
is equivalent to{0-b,0-b}
, and{a-,c}
is equivalent to{a-infinity,c-c}
. The}
can be omitted.P:a-b,c-d}
is a grid specifier. It matches a rectangle that can be divided into anmxn
grid of matches ofP
, wherem
is betweena
andb
inclusive, andn
is betweenc
andd
inclusive. The ranges and}
can be omitted, with the same defaulting behavior as in the size constraint, except that the beginning of a range defaults to1
instead of0
.P#a-b}
is a counting specifier. It matches a rectangle that contains betweena
andb
possibly overlapping matches ofP
, inclusive. The}
is optional,a
defaults to1
andb
to infinity.PoS
, whereS
is a (greedily parsed) string of the characters01234567
, optionally terminated with}
, is an orientation modifier. Each of the characters stands for a rotation (0123
) or reflection (4567
), which is applied to the expressionP
. The transformed expressions are combined with Boolean disjunctions. The charactersOXNTKHADC
encode convenient classes of orientations. The characterF
fixes an expression, so that its orientation cannot be modified.<P>
is a context bracket. It matches any rectangle r that's contained in a larger rectangle that matchesP
. The rectangle r can be matched by a digit inP
; this is0
by default, and increases with the nesting depth (0
matches the anchor of the innermost brackets,1
the next one etc).
You can add the prefix ^
to any infix operator or chain of postfix operators to raise its precedence higher than all other operators, or v
to lower it; for example, \a\b^|\cv+
is parsed as (\a(\b|\c))+
.
If i
is any infix operator apart from concatenation or /
, and p
a chain of postfix operators, then PipQ
is a shorthand for (PiQ)p
; for example, \a|+\b
is equivalent to (\a|\b)+
.
Double quotes ""
are used for swapping the \
-escaped status of all characters inside them, except for the characters "\/
.
They allow text grids like "this/is a/grid"
.
Non-literal syntax elements, like nonterminals, operations and parentheses, can be used in quoted expressions by escaping them.
Note that quotes define syntax elements, so "ab/cd"+/E
will be parsed as (\a\b/\c\d)+/E
.
All open parentheses, brackets and quotes are implicitly closed at the end of a line.
Lines beginning with |
are comments, and the parser ignores them.
Grime does allow paradoxical definitions of the form A=A!
; these will result in no match.
In general, the matching engine can resolve any grammar where no match of a single nonterminal on a single rectangle depends on itself (although for complex grammars, this can take a while).
A nontrivial example of a resolvable grammar is
A=E?\a
B=[ab]+&A!
E=A|B
a`A
Given a one-dimensional string of a
s and b
s, this grammar will match every substring that ends in a
.
Note how the nonterminal A
is self-referential in more than one way (via E
directly, and via E
through B
), but since the E
in A
must be a proper substring, the system can always be resolved.
Grime supports non-rectangular inputs; the rows are simply aligned to the left, and the gaps can be matched using b
.