Skip to content

Introduce range class and use it to optimize symex #3454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

romainbrenguier
Copy link
Contributor

@romainbrenguier romainbrenguier commented Nov 22, 2018

The goal of this PR is to make symex faster by avoiding a set copy and merging, to do this without making the code too complicated, the PR introduce the following.
It adds map, filter and concat iterators which can be used to iterate over transformed containers without having to make copy of them, and it adds a ranget class which make it easy to use these new iterators.

This part of the code is executed in all our tests so the fact that all existing tests pass should show these work correctly.

The performance difference for this change were evaluate on 143 methods
from tika, which were chosen because analysis time was taking between 1
second and 1 minute for them.
The time of symex step was evaluated alone (options --show-vcc --verbosity 0).
The comparison of the time results is here:
perf_range_optimization

The total execution time was reduced by 21% and the average speed up is 14% (the difference between the two means the optimization seems to have less influence on simpler benchmarks).

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • [na] The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • [na] White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@smowton
Copy link
Contributor

smowton commented Nov 22, 2018

Looks cool! One thing to consider as a global design issue (@peterschrammel @kroening @tautschnig): if we're going down the road of introducing ranges and operations thereon, the other choice is to use Boost, which already provides a very high quality library for this purpose.

The usual pitfalls with Boost are (a) craploads of spurious warnings and (b) instability, making it hard to guarantee the build is actually the same across dozens of available versions of the library. In the security product our solution has been to nominate a particular version and download and build it as part of the build script, similar to our approach to Minisat, which takes around 5 seconds to complete under most circumstances. We had to trim the boost tarball to achieve that, but that was a one-time operation and has worked well for us so far.

@LAJW
Copy link
Contributor

LAJW commented Nov 22, 2018

@smowton https://github.com/ericniebler/range-v3 was the thing I suggested, but @romainbrenguier was afraid of people refusing to add yet another dependency, so he wrote his own (which looks great BTW).

Ranges' interface is a bit easier to use than Boost::iterators (filter/transform iterator type definitions are particularly atrocious), it can take a whole container at once and is easily composable (which can be seen in this PR). I think it works with rvalues too, which makes it even faster.

(Offtop) I don't think Boost is an essential dependency anymore. I find there's very little left in it that's essential (except the networking libraries, and filesystem, but for the latter I'd prefer to just upgrade to C++17). The only things I'm really missing are std::variant + std::visit (which are a bit awkward before 17), std::any and boost::expected. But there are small dedicated libraries for these, so we can pull them as we go.

@romainbrenguier romainbrenguier force-pushed the optimize-symex-using-range branch from 4173064 to b6af140 Compare November 22, 2018 12:09
Copy link
Contributor

@LAJW LAJW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Replacing with ranges-v3 later on shouldn't be a problem (the interface is very similar).

it++)
for(auto it = all_current_names_range.begin();
it != all_current_names_range.end();
++it)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

auto dest_state_names_range =
make_range(dest_state.level2.current_names)
.filter(
[&](const std::pair<irep_idt, std::pair<ssa_exprt, unsigned>> &pair) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏ I'd consider const statet::value_type & here. 🌊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually statet::renaming_levelt::current_namest::value_type so not really better I think

// go over all variables to see what changed
std::unordered_set<ssa_exprt, irep_hash> variables;
auto ssa_of_current_name =
[&](const std::pair<irep_idt, std::pair<ssa_exprt, unsigned>> &pair) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌊

for(auto it = all_current_names_range.begin();
it != all_current_names_range.end();
++it)
for(const auto ssa : all_current_names_range)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto &

src/util/range.h Outdated
concat_begin, concat_end);
}

bool is_empty() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
++underlying;
if(underlying != underlying_end)
current = util_make_unique<outputt>((*f)(*underlying));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏ 🤔 I'm not sure this should happen on iteration (maybe on dereference would be better). That could also mean you wouldn't havet to store the result of current in the iterator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dereference operator has to be const and return a const ref, so the object needs to exist before dereference is called, if I'm not mistaken. So I don't see how to do that not on iteration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dereference operator doesn't have to return const reference, it can also return a value (it's perfectly legal). So in this case, just return f(underlying).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about operator->? It has to return a pointer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you just want to use mutable? It seems the documentation doesn't coincide with what is going on here either.

template <
typename containert,
typename iteratort = typename containert::const_iterator>
ranget<iteratort> make_range(const containert &container)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: b6af140).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92284727

@romainbrenguier romainbrenguier force-pushed the optimize-symex-using-range branch from b6af140 to bc95e96 Compare November 22, 2018 13:32
@kroening
Copy link
Member

BTW, I agree on that offtopic-assessment.

@romainbrenguier romainbrenguier force-pushed the optimize-symex-using-range branch from bc95e96 to 6e685a5 Compare November 22, 2018 13:35
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫
This PR failed Diffblue compatibility checks (cbmc commit: bc95e96).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92295101
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

The incompatibility may have been introduced by an earlier PR. In that case merging this
PR should be avoided unless it fixes the current incompatibility.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 6e685a5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92296254

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great step forward, and also includes a very nice performance evaluation. My comments below are mostly about documentation. In addition to this I think a unit test would also be very nice to have as it would serve as how-to-use-this documentation.

src/util/range.h Outdated
public:
using value_typet = typename iteratort::value_type;

explicit ranget(iteratort begin, iteratort end)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit seems unnecessary here?

src/util/range.h Outdated
\*******************************************************************/

/// \file
/// Ranges: pair of begin and end iterators.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could do with a bit more text to explain why would want to use this - there is an elaborate and very useful explanation below, but this sentence by itself is not particularly helpful.

{
++underlying;
if(underlying != underlying_end)
current = util_make_unique<outputt>((*f)(*underlying));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you just want to use mutable? It seems the documentation doesn't coincide with what is going on here either.

};

/// Iterator which only stops at element for which some given function \c f is
/// true.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what if f never evaluates to true?

std::shared_ptr<std::function<bool(const value_type &)>> f;

/// Ensure that the underlying iterator is always positioned on an element
/// for which `f` is true.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, also document the case where f never evaluates to true.

src/util/range.h Outdated
}
};

/// Iterate other a first iterator and when the end is reached, iterate other
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/other/over/ - and you don't iterate over an iterator. I think this needs to be reworded completely.

using value_type = outputt;
using pointer = const outputt *;
using reference = const outputt &;
using iterator_category = std::forward_iterator_tag;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are public members that would deserve some documentation (in particular the use of difference_type = void makes me wonder).

/// Preincrement operator
map_iteratort &operator++()
{
++underlying;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if underlying == underlying_end from the start? You do explicitly handle this below, but you should probably also place an invariant here.

@@ -19,6 +19,7 @@ Author: Daniel Kroening, kroening@kroening.com
#include <util/std_expr.h>

#include <analyses/dirty.h>
#include <util/range.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move that to the above collection of includes from util

@romainbrenguier
Copy link
Contributor Author

@tautschnig I improved the documentation a bit and added unit tests

@romainbrenguier romainbrenguier force-pushed the optimize-symex-using-range branch from 6d6e575 to 9b6dc0e Compare November 26, 2018 09:54
These are pairs of begin and end iterators for which we implement
filter, map and concat methods.
They only manipulate iterators so this avoids having to create
intermediate heavy data-structures
This part of the code is executed in all our tests so the fact that all
existing tests pass should show this change works correctly.

The performance difference for this change were evaluate on 143 methods
from tika, which were chosen because analysis time was taking between 1
second and 1 minute for them.
The time of symex step was evaluated alone (options --show-vcc --verbosity 0).

The execution time for the whole of the benchmarks was reduced by 21%
and the average speed up is 14% (the difference between the two means
the optimization seems to have less influence on simpler benchmarks).
This gives examples of how to use the class.
@romainbrenguier romainbrenguier force-pushed the optimize-symex-using-range branch from 9b6dc0e to cfb1203 Compare November 26, 2018 10:30
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: cfb1203).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92528292

@tautschnig tautschnig merged commit 961d974 into diffblue:develop Nov 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants