Description
[ Prompted by https://github.com//pull/4449 ]
The SMT-LIB standard ( http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2017-07-18.pdf , page 57) describes a reproducible resource limit option for solvers:
> **:reproducible-resource-limit**
> default: 0
> support: optional
>
> If the solver supports this option, setting it to 0 disables it. Setting it to a non-zero numeral n will cause each subsequent check command to terminate within a bounded amount of time dependent on n. The internal implementation of this option and its relation to run time or other concrete resources can be solver-specific. However, it is required that the invocation of a check command return unknown whenever the solver is unable to determine the satisfiability of the formulas in the current context within the current resource limit. Setting a higher value of n should allow more resources to be used, which may cause the command to return sat or unsat instead of unknown. Furthermore, the returned result should depend deterministically on n; specifically, it should be the same every time the solver is run with the same sequence of previous commands on the same machine (and with an arbitrarily long external time out). If the solver makes use of randomization, it may require the :random-seed option to be set to a value other than 0 before :reproducible-resource-limit can be set to a positive value.
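Concretely, a client script for a solver that supports this option would look something like the following (the limit value 100000 is an arbitrary illustration; what one "resource unit" means is solver-specific):

```smt2
(set-option :random-seed 1)                      ; may be required before the limit can be set
(set-option :reproducible-resource-limit 100000) ; 0 would disable the limit
(declare-const p Bool)
(assert p)
(check-sat)                                      ; must return unknown if the budget runs out
```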
Note that this option is not intended to be used for comparisons between different solvers, since they can implement it differently. Its purpose is simply to guarantee the reproducibility of an individual solver's results under the same external conditions.
This can be implemented as a simple count of learnt clauses, clause-database cleanups, etc. The relationship between a given limit value and the actual run time will depend on the input, the solver, the hardware, and so on, but it should always be the same (up to the limits of hardware determinism).
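To make the counter idea concrete, here is a minimal sketch (names and structure are purely illustrative, not proposed code): a solver loop decrements a budget at each "expensive" step and returns unknown once it is exhausted, so the result depends only on the input and the limit, never on wall-clock time.

```python
from itertools import product

def check_sat(clauses, num_vars, resource_limit=0):
    """Toy deterministic 'solver': exhaustive search with a step budget.

    clauses: DIMACS-style list of clauses; each clause is a list of
             non-zero ints (v means variable v is true, -v means false).
    resource_limit: 0 disables the budget; a positive n bounds the number
                    of assignments examined, deterministically.
    """
    steps = 0
    for bits in product([False, True], repeat=num_vars):
        steps += 1
        if resource_limit and steps > resource_limit:
            return "unknown"  # budget exhausted: same answer on every run and machine
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return "sat"
    return "unsat"

# (x1 or x2) and (not x1 or x2): satisfiable, e.g. with x2 = True
clauses = [[1, 2], [-1, 2]]
print(check_sat(clauses, 2))                    # sat
print(check_sat(clauses, 2, resource_limit=1))  # unknown: budget of 1 step is too small
```

A real solver would decrement the budget on conflicts, learnt clauses or propagations rather than on full assignments, but the contract is the same: the limit is counted in solver-internal events, not seconds.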
I think we should use this for external solvers and implement something similar for the internal one, because:

- It is deterministic. For many applications, having analysis tools return different results for the same problem is unacceptable, and it also means we can reliably test time-out behaviour.
- If implemented as a set of counters sampled at multiple stages of the process, it should give us finer-grained performance information and let us track changes in performance more reliably, without the usual noise.
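The second point can be sketched as follows (again illustrative, not actual solver structure): the solver keeps a named counter per phase, and because the counters advance deterministically, two runs on the same input report identical numbers, so any difference between revisions reflects a real change in solver behaviour rather than timing noise.

```python
from collections import Counter

class InstrumentedSolver:
    """Illustrative per-stage counters: deterministic stand-ins for wall-clock timing."""

    def __init__(self):
        self.counts = Counter()

    def simplify(self, clauses):
        self.counts["simplify_steps"] += len(clauses)
        # drop trivially true clauses (containing both v and -v),
        # as a stand-in for real preprocessing
        return [c for c in clauses if not any(-l in c for l in c)]

    def decide(self):
        self.counts["decisions"] += 1

def run(clauses):
    solver = InstrumentedSolver()
    remaining = solver.simplify(clauses)
    for _ in remaining:
        solver.decide()
    return dict(solver.counts)

# Two runs on the same input yield identical counters -- unlike raw timings.
assert run([[1, -1], [2], [-2, 3]]) == run([[1, -1], [2], [-2, 3]])
print(run([[1, -1], [2], [-2, 3]]))  # {'simplify_steps': 3, 'decisions': 2}
```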