-
Notifications
You must be signed in to change notification settings - Fork 277
Sharing map variable-height trees #4641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sharing map variable-height trees #4641
Conversation
On a set of methods from Apache Tika, the sum of the runtimes of the sharing map methods (timing just those) improved by 22%. Overall symex was 4% faster. |
9c5d14a
to
0ff6cab
Compare
@@ -779,6 +791,73 @@ ::gather_all(const innert &n, delta_viewt &delta_view) const | |||
iterate(n, f); | |||
} | |||
|
|||
SHARING_MAPT(void)::add_item_if_not_shared( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document this method
src/util/sharing_map.h
Outdated
if(ip->shares_with(container)) | ||
return; | ||
|
||
const leaft &l1 = container.get_container().front(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest making l
, ll
, l1
more descriptive, they're pretty easy to muddle as it stands
src/util/sharing_map.h
Outdated
const leaf_listt &ll = ip->get_container(); | ||
SM_ASSERT(!ll.empty()); | ||
|
||
for(const auto &l : ll) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment explaining this list? It looks like you're sure container
doesn't have multiple entries but ip->get_container()
might do, and if it does and an entry is shared that means all further entries are too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, add_item_if_not_shared()
is called by get_delta_view()
only for containers which only have a single element. If we know that the leaf in the first container has the same key as a leaf in the second container (either because they share or the keys compare equal), then since the maps cannot contain duplicate keys, we can ignore all further keys in the second container since they will be different from the key of the leaf in the first container. We can ignore the further leafs in the second container since map1.get_delta_view(map2, ...)
does not include items that are only in map2
.
src/util/sharing_map.h
Outdated
if(l1.shares_with(l)) | ||
return; | ||
|
||
if(l1.get_key() == l.get_key()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we do anything when differing keys occur in container
and ip->get_container()
? Is that impossible for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes there's a case missing for when we don't find a matching key. The bug would only appear when there are hash collisions. I'll fix it and add a test.
|
||
return ip; | ||
UNREACHABLE; | ||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some compilers will complain about this unreachable statement
@@ -1001,13 +1098,138 @@ SHARING_MAPT(void)::erase(const key_type &k) | |||
num--; | |||
} | |||
|
|||
SHARING_MAPT2(, innert *)::migrate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document this methid
@@ -126,6 +131,163 @@ void sharing_map_internals_test() | |||
sm.count_unmarked_nodes(false, marked, true); | |||
REQUIRE(marked.size() >= 2); | |||
} | |||
|
|||
SECTION("single map shape") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describe this in some more detail, what is shape?
unit/util/sharing_map.cpp
Outdated
@@ -214,15 +214,14 @@ void sharing_map_internals_test() | |||
|
|||
REQUIRE(delta_view.size() == 2); | |||
|
|||
REQUIRE(!delta_view[0].in_both); | |||
REQUIRE(!delta_view[0].is_in_both_maps()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Squash this commit
@@ -1283,7 +1288,7 @@ SHARING_MAPT2(optionalt<std::reference_wrapper<const, mapped_type>>)::find( | |||
|
|||
SHARING_MAPT(const std::string)::not_found_msg="key not found"; | |||
|
|||
SHARING_MAPT(const std::size_t)::bits = 18; | |||
SHARING_MAPT(const std::size_t)::bits = 30; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely belongs in its own commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
d5e1c2f
to
8de261e
Compare
src/util/sharing_map.h
Outdated
for(const auto &item : m) | ||
{ | ||
const innert *i = &item.second; | ||
stack.push(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think i
adds any value here, I'd just do stack.push(&item.second);
for(const auto &l : ll) | ||
{ | ||
if(l1.shares_with(l)) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected continue
- why is it ok to return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because if l1
shares with l
, then since the maps don't contain duplicate keys, all further keys in the container must be different from l1.get_key()
. Since map1.get_delta_view(map2, ...)
does not include items that are only in map2
, we can return immediately.
// element is in both maps and not shared | ||
delta_view.push_back( | ||
{true, l1.get_key(), l1.get_value(), l.get_value()}); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I would have expected continue
instead of returning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, since we have found a matching key, all other keys will be different and don't have to be included in the delta_view
.
src/util/sharing_map.h
Outdated
innert *ip) | ||
{ | ||
SM_ASSERT(step < steps - 1); | ||
SM_ASSERT(ip != nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it must not be a null pointer, why not pass in a reference?
|
||
// Add internal node | ||
ip = ip->add_child(bit_last); | ||
SM_ASSERT(ip->empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just declare a local pointer variable instead of re-using the ip
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
|
||
SM_ASSERT(i < steps); | ||
|
||
do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using a for
loop instead: for(std::size_t i = step + 1; i < steps; ++i)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first iteration we don't need to check the condition so a do-while is probably fine.
src/util/sharing_map.h
Outdated
|
||
leaft leaf_copy(as_const(&container_copy)->get_container().front()); | ||
leaf_listt &c = ip->get_container(); | ||
c.push_front(leaf_copy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c
doesn't seem to add much value: ip->get_container().push_front(leaf_copy);
src/util/sharing_map.h
Outdated
@@ -922,6 +923,7 @@ ::get_delta_view( | |||
if(!child.shares_with(*ip2)) | |||
{ | |||
stack.push(stack_itemt(&child, ip2)); | |||
level_stack.push(0xff); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use optionalt
, or else name that constant.
Previously the sharing map was a fixed-height tree (with height 6 by default). This changes the sharing map to a variable-height tree.
When the sharing map was a fixed-height tree, the type of a node was determined by its depth in the tree. Therefore, the node depth was tracked when traversing the tree. Now that the sharing map is a variable-height tree, the methods is_internal() and is_container() are used to determine the type of a node.
Now that the sharing map tree grows dynamically as elements are inserted, a higher value for the maximum tree height can be used. The maximum tree height is determined by bits / chunk, with bits being the number of bits of the hash code used, and chunk being the number of bits used to determine the child node of an inner node. For example, if chunk = 3, then a node can have at most 8 children. Then if bits = 30, bits / chunk = 10 and the maximum tree height is 10. The constant bits should be divisible by chunk.
e6b5749
to
427a9fb
Compare
When the method get_delta_view() encounters a leaf node in the first map, it searches for a leaf with the same key in the corresponding subtree of the second map. Previously all the nodes in the subtree were visited. This commit changes the traversal such that the hash code of a key is used to determine the position of the leaf in the second map.
427a9fb
to
38c8388
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 38c8388).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/111936752
This changes the sharing map such that not all leafs are stored at the same depth (which was depth 6 previously), but that the tree grows dynamically as elements are inserted.