This is the static analyzer project of Antoine Groudiev and Matthieu Boyer, for the class Semantics and application to verfication. It implements the base domains, that is the constant, sign, interval, congruence domains, the reduced product between the interval and the congruence domains; furthermore, support for the Karr’s linear equality analysis and Backward analysis have been added as extensions.
The provided analyzer should pass every single test included with the project, except for Karr's domain. Specialized tests for each domain and feature have been added.
As recommended, the iterator (in the file iterator.ml
) implements a classic worklist algorithm. The Hashtbl environment
maintains a map from nodes to abstract values. The main logic is contained inside iterate_function
, which handles the worklist, and iter_arc
, which applies the appropriate rule of the domain to the visited arc.
Widening is handled by applying initially a DFS to the CFG and marking nodes which might need widening. When this node is encoutered during the main worklist procedure, widening is applied if necessary. Note that the iterator never calls the narrow
function.
A VALUE_DOMAIN
is transformed into a DOMAIN
using the Domain
functor in domain.ml
. It abstracts a set of mappings from variables to integers, abstracting sets of integers with the given VALUE_DOMAIN
.
The constant domain abstracts a set of integers as:
type t = | Top | Bot | Int of Z.t
In most cases, operations such as widen
, meet
and join
do not convey much information. Similarly, any non-trivial rand
is abstracted as Top
.
The functions binary
and bwd_binary
are slightly more complicated. For binary
, multiplication with Int Z.zero
is guaranteed to result in Int Z.zero
. Values constant/constant_div_zero.c
. For bwd_binary
, the cases of
We chose the following representation for the sign domain elements:
type t = | Bot | Top | Pos | Neg | Null
A more complicated sign latice could have been chosen, separating for instance positive values into stricly positive, and positive or null.
Pretty much every function is straighforward. Division can zero can still be detected and abstracted as
To represent an interval, a bound element has been introduced:
type bound = | MinusInf | Finite of Z.t | PlusInf
which can later be used in the definition of an abstract type:
type t = | Bot | Top | Interval of bound * bound
Firstly, I later realized that I could have abstracted Top
as `Interval (MinusInf, PlusInf), which would have reduced the length of the code. Nevertheless,
Operations on bound
have been introduced, to handle infinite cases. Note that in practice, for an analyzer, cases such as bottomize_if_necessary
is used to transform Interval
into Bot
when the bounds are inverted.
For binary
, only multiplication and division are tricky to handle. The main idea is to enumerate all possible cases and to take the most general one. Similarly, I found bwd_binary
extremly difficult to handle in the case of multiplication. Cases where the interval contains zero need to be taken care of, we have to be careful to round intervals - otherwise multiple tests do not pass\dots It was not fun to code, and even less to debug.
For this domain, a set is either of the form
type t = | Bot | Modulo of Z.t * Z.t
Note that
Most operations are easy to handle since little information can be deduced. This can be seen, for example, in compare
, which results most of the time in meet
and join
is interesting since one can sometimes use rafinments such as GCD to give a precise abstraction.
The reduced product domain maintains in parallel two abstract representations for an element:
type t = IntervalDomain.t * CongruencesDomain.t
Most operations simply apply the respective functions of both domains to each component of the pair. The only interesting cases are the ones of widen
and narrow
, where a reduction
function use information from both domains to obtain a precise abstraction. This is done by implementing in IntervalDomain
functions to convert intervals into congruences, and congruences into intervals.
Note that for some unknown reason, it appears that the reduced product domain fails to correctly handle assertions in series: some tests for the congruence domain containing multiple assertions do not pass since the second assertion is not correctly handled. The problem do not occur for individual assertions, that is when re-writing tests using only one assertion at a time. This behavior is difficult to debug, despite numerous hours spent on it, since it does not happen for other domains, but seems to be an iterator problem rather than a domain problem.
In this domain, we only check affine assignations or equalities, else we resort to non-deterministic assignment. This domain is not based on a value domain, since we only consider relational constraints between multiple variables (multiple can be only one, this might be seen as fixing the value of a variable). We represent the domain using the following type:
type t = | Bot | Constraints of (Q.t * Q.t array) array
where both arrays should be of length Constraints d
can be seen as a matrix of the coefficients in each affine constraint and a vector of each constant in the affine constraint. For each row
where Top
.\
To actually compute operations, we use the Row-Echelon Normal Form of the matrix.
We can compute this form for any matrix of any size, thus, intersection can be done by considering the concatenation of the two arrays, simplifying by gaussian elimination, and taking all the non-empty rows, or returning Bot
.
This however, poses problems in the implementation as, for still unknown reasons, this function is called multiple times on different inputs, and thus returns Bot
or Top
depending on the situation.
We implement equality by checking coefficients in the row-echelon normal form, and then we can compute the inclusion by considering that
For the assignation of variables, we simplify the right side of the assignement to
where
The implementation proposed here is not working, for reasons that are behind my comprehension. I have spent hours coding and debugging this, and I still have no clue about what goes wrong...
When the backward analysis mode is enabled, the analyzer will process failing assertions in more detail. When it encounters a failing assertion, it will try to go back to find values for the variables which will trigger the assertion. If no environment do trigger the assertion, we can conclude that the assertion always passes, and continue the analysis. The idea is to apply the exact same iteration procedure but in reverse (in practice, changing for example node_in
in node_out
) starting from the assertion node using the environment which triggers the false assertion.
A bwd_assign
function is implemented for every value-domain based domain, that is every domain except for Karr's one. It does exactly what is hinted by the subject, and its implementation is quite short and uses the Domain
functor to write the code once for all value domains.
We use the file backward_abs.c
as an example:
void main(){
int x = rand(-10, 10);
int z = 0;
int y = 0;
if (x==0) {
z = 0;
} else {
y = x;
if (y < 0) { y = -y; }
z = x / y;
}
assert(x > 0); //@OK
}
The forward analysis fails since in the else
branch, x
takes values into
The backward analysis will go up, starting from the assertion, and will identify that the assertions fails when