-
Notifications
You must be signed in to change notification settings - Fork 277
Introduce range class and use it to optimize symex #3454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce range class and use it to optimize symex #3454
Conversation
47ac392
to
4173064
Compare
Looks cool! One thing to consider as a global design issue (@peterschrammel @kroening @tautschnig): if we're going down the road of introducing ranges and operations thereon, the other choice is to use Boost, which already provides a very high quality library for this purpose. The usual pitfalls with Boost are (a) craploads of spurious warnings and (b) instability, making it hard to guarantee the build is actually the same across dozens of available versions of the library. In the security product our solution has been to nominate a particular version and download and build it as part of the build script, similar to our approach to Minisat, which takes around 5 seconds to complete under most circumstances. We had to trim the boost tarball to achieve that, but that was a one-time operation and has worked well for us so far. |
@smowton https://github.com/ericniebler/range-v3 was the thing I suggested, but @romainbrenguier was afraid of people refusing to add yet another dependency, so he wrote his own (which looks great BTW). Ranges' interface is a bit easier to use than Boost::iterators (filter/transform iterator type definitions are particularly atrocious), it can take a whole container at once and is easily composable (which can be seen in this PR). I think it works with rvalues too, which makes it even faster. (Offtop) I don't think Boost is an essential dependency anymore. I find there's very little left in it that's essential (except the networking libraries, and filesystem, but for the latter I'd prefer to just upgrade to C++17). The only things I'm really missing are |
4173064
to
b6af140
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Replacing with ranges-v3 later on shouldn't be a problem (the interface is very similar).
src/goto-symex/symex_goto.cpp
Outdated
it++) | ||
for(auto it = all_current_names_range.begin(); | ||
it != all_current_names_range.end(); | ||
++it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
auto dest_state_names_range = | ||
make_range(dest_state.level2.current_names) | ||
.filter( | ||
[&](const std::pair<irep_idt, std::pair<ssa_exprt, unsigned>> &pair) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏ I'd consider const statet::value_type &
here. 🌊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually statet::renaming_levelt::current_namest::value_type
so not really better I think
// go over all variables to see what changed | ||
std::unordered_set<ssa_exprt, irep_hash> variables; | ||
auto ssa_of_current_name = | ||
[&](const std::pair<irep_idt, std::pair<ssa_exprt, unsigned>> &pair) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🌊
src/goto-symex/symex_goto.cpp
Outdated
for(auto it = all_current_names_range.begin(); | ||
it != all_current_names_range.end(); | ||
++it) | ||
for(const auto ssa : all_current_names_range) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏ const auto &
src/util/range.h
Outdated
concat_begin, concat_end); | ||
} | ||
|
||
bool is_empty() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
++underlying; | ||
if(underlying != underlying_end) | ||
current = util_make_unique<outputt>((*f)(*underlying)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏ 🤔 I'm not sure this should happen on iteration (maybe on dereference would be better). That could also mean you wouldn't havet to store the result of current
in the iterator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dereference operator has to be const and return a const ref, so the object needs to exist before dereference is called, if I'm not mistaken. So I don't see how to do that not on iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dereference operator doesn't have to return const reference, it can also return a value (it's perfectly legal). So in this case, just return f(underlying)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about operator->
? It has to return a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you just want to use mutable
? It seems the documentation doesn't coincide with what is going on here either.
template < | ||
typename containert, | ||
typename iteratort = typename containert::const_iterator> | ||
ranget<iteratort> make_range(const containert &container) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: b6af140).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92284727
b6af140
to
bc95e96
Compare
BTW, I agree on that offtopic-assessment. |
bc95e96
to
6e685a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫
This PR failed Diffblue compatibility checks (cbmc commit: bc95e96).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92295101
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.
Common spurious failures:
- the cbmc commit has disappeared in the mean time (e.g. in a force-push)
- the author is not in the list of contributors (e.g. first-time contributors).
The incompatibility may have been introduced by an earlier PR. In that case merging this
PR should be avoided unless it fixes the current incompatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 6e685a5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92296254
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great step forward, and also includes a very nice performance evaluation. My comments below are mostly about documentation. In addition to this I think a unit test would also be very nice to have as it would serve as how-to-use-this documentation.
src/util/range.h
Outdated
public: | ||
using value_typet = typename iteratort::value_type; | ||
|
||
explicit ranget(iteratort begin, iteratort end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicit
seems unnecessary here?
src/util/range.h
Outdated
\*******************************************************************/ | ||
|
||
/// \file | ||
/// Ranges: pair of begin and end iterators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could do with a bit more text to explain why would want to use this - there is an elaborate and very useful explanation below, but this sentence by itself is not particularly helpful.
{ | ||
++underlying; | ||
if(underlying != underlying_end) | ||
current = util_make_unique<outputt>((*f)(*underlying)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you just want to use mutable
? It seems the documentation doesn't coincide with what is going on here either.
}; | ||
|
||
/// Iterator which only stops at element for which some given function \c f is | ||
/// true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what if f
never evaluates to true
?
std::shared_ptr<std::function<bool(const value_type &)>> f; | ||
|
||
/// Ensure that the underlying iterator is always positioned on an element | ||
/// for which `f` is true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, also document the case where f
never evaluates to true
.
src/util/range.h
Outdated
} | ||
}; | ||
|
||
/// Iterate other a first iterator and when the end is reached, iterate other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/other/over/ - and you don't iterate over an iterator. I think this needs to be reworded completely.
using value_type = outputt; | ||
using pointer = const outputt *; | ||
using reference = const outputt &; | ||
using iterator_category = std::forward_iterator_tag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are public members that would deserve some documentation (in particular the use of difference_type = void
makes me wonder).
/// Preincrement operator | ||
map_iteratort &operator++() | ||
{ | ||
++underlying; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if underlying == underlying_end
from the start? You do explicitly handle this below, but you should probably also place an invariant here.
src/goto-symex/symex_goto.cpp
Outdated
@@ -19,6 +19,7 @@ Author: Daniel Kroening, kroening@kroening.com | |||
#include <util/std_expr.h> | |||
|
|||
#include <analyses/dirty.h> | |||
#include <util/range.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move that to the above collection of includes from util
6e685a5
to
6d6e575
Compare
@tautschnig I improved the documentation a bit and added unit tests |
6d6e575
to
9b6dc0e
Compare
These are pairs of begin and end iterators for which we implement filter, map and concat methods. They only manipulate iterators so this avoids having to create intermediate heavy data-structures
This part of the code is executed in all our tests so the fact that all existing tests pass should show this change works correctly. The performance difference for this change were evaluate on 143 methods from tika, which were chosen because analysis time was taking between 1 second and 1 minute for them. The time of symex step was evaluated alone (options --show-vcc --verbosity 0). The execution time for the whole of the benchmarks was reduced by 21% and the average speed up is 14% (the difference between the two means the optimization seems to have less influence on simpler benchmarks).
This gives examples of how to use the class.
9b6dc0e
to
cfb1203
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: cfb1203).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92528292
The goal of this PR is to make symex faster by avoiding a set copy and merging, to do this without making the code too complicated, the PR introduce the following.
It adds map, filter and concat iterators which can be used to iterate over transformed containers without having to make copy of them, and it adds a
ranget
class which make it easy to use these new iterators.This part of the code is executed in all our tests so the fact that all existing tests pass should show these work correctly.
The performance difference for this change were evaluate on 143 methods

from tika, which were chosen because analysis time was taking between 1
second and 1 minute for them.
The time of symex step was evaluated alone (options --show-vcc --verbosity 0).
The comparison of the time results is here:
The total execution time was reduced by 21% and the average speed up is 14% (the difference between the two means the optimization seems to have less influence on simpler benchmarks).