Skip to content

Sharing map variable-height trees #4641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

danpoe
Copy link
Contributor

@danpoe danpoe commented May 9, 2019

This changes the sharing map such that not all leafs are stored at the same depth (which was depth 6 previously), but that the tree grows dynamically as elements are inserted.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@danpoe
Copy link
Contributor Author

danpoe commented May 9, 2019

On a set of methods from Apache Tika, the sum of the runtimes of the sharing map methods (timing just those) improved by 22%. Overall symex was 4% faster.

@danpoe danpoe force-pushed the feature/sharing-map-variable-height-tree branch from 9c5d14a to 0ff6cab Compare May 9, 2019 19:56
@@ -779,6 +791,73 @@ ::gather_all(const innert &n, delta_viewt &delta_view) const
iterate(n, f);
}

SHARING_MAPT(void)::add_item_if_not_shared(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document this method

if(ip->shares_with(container))
return;

const leaft &l1 = container.get_container().front();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest making l, ll, l1 more descriptive, they're pretty easy to muddle as it stands

const leaf_listt &ll = ip->get_container();
SM_ASSERT(!ll.empty());

for(const auto &l : ll)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment explaining this list? It looks like you're sure container doesn't have multiple entries but ip->get_container() might do, and if it does and an entry is shared that means all further entries are too?

Copy link
Contributor Author

@danpoe danpoe May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, add_item_if_not_shared() is called by get_delta_view() only for containers which only have a single element. If we know that the leaf in the first container has the same key as a leaf in the second container (either because they share or the keys compare equal), then since the maps cannot contain duplicate keys, we can ignore all further keys in the second container since they will be different from the key of the leaf in the first container. We can ignore the further leafs in the second container since map1.get_delta_view(map2, ...) does not include items that are only in map2.

if(l1.shares_with(l))
return;

if(l1.get_key() == l.get_key())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we do anything when differing keys occur in container and ip->get_container()? Is that impossible for some reason?

Copy link
Contributor Author

@danpoe danpoe May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes there's a case missing for when we don't find a matching key. The bug would only appear when there are hash collisions. I'll fix it and add a test.


return ip;
UNREACHABLE;
return nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some compilers will complain about this unreachable statement

@@ -1001,13 +1098,138 @@ SHARING_MAPT(void)::erase(const key_type &k)
num--;
}

SHARING_MAPT2(, innert *)::migrate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document this methid

@@ -126,6 +131,163 @@ void sharing_map_internals_test()
sm.count_unmarked_nodes(false, marked, true);
REQUIRE(marked.size() >= 2);
}

SECTION("single map shape")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Describe this in some more detail, what is shape?

@@ -214,15 +214,14 @@ void sharing_map_internals_test()

REQUIRE(delta_view.size() == 2);

REQUIRE(!delta_view[0].in_both);
REQUIRE(!delta_view[0].is_in_both_maps());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squash this commit

@@ -1283,7 +1288,7 @@ SHARING_MAPT2(optionalt<std::reference_wrapper<const, mapped_type>>)::find(

SHARING_MAPT(const std::string)::not_found_msg="key not found";

SHARING_MAPT(const std::size_t)::bits = 18;
SHARING_MAPT(const std::size_t)::bits = 30;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely belongs in its own commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@danpoe danpoe force-pushed the feature/sharing-map-variable-height-tree branch 4 times, most recently from d5e1c2f to 8de261e Compare May 10, 2019 13:45
for(const auto &item : m)
{
const innert *i = &item.second;
stack.push(i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think i adds any value here, I'd just do stack.push(&item.second);

for(const auto &l : ll)
{
if(l1.shares_with(l))
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected continue - why is it ok to return?

Copy link
Contributor Author

@danpoe danpoe May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because if l1 shares with l, then since the maps don't contain duplicate keys, all further keys in the container must be different from l1.get_key(). Since map1.get_delta_view(map2, ...) does not include items that are only in map2, we can return immediately.

// element is in both maps and not shared
delta_view.push_back(
{true, l1.get_key(), l1.get_value(), l.get_value()});
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I would have expected continue instead of returning.

Copy link
Contributor Author

@danpoe danpoe May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, since we have found a matching key, all other keys will be different and don't have to be included in the delta_view.

innert *ip)
{
SM_ASSERT(step < steps - 1);
SM_ASSERT(ip != nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it must not be a null pointer, why not pass in a reference?


// Add internal node
ip = ip->add_child(bit_last);
SM_ASSERT(ip->empty());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just declare a local pointer variable instead of re-using the ip parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed


SM_ASSERT(i < steps);

do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using a for loop instead: for(std::size_t i = step + 1; i < steps; ++i)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first iteration we don't need to check the condition so a do-while is probably fine.


leaft leaf_copy(as_const(&container_copy)->get_container().front());
leaf_listt &c = ip->get_container();
c.push_front(leaf_copy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c doesn't seem to add much value: ip->get_container().push_front(leaf_copy);

@@ -922,6 +923,7 @@ ::get_delta_view(
if(!child.shares_with(*ip2))
{
stack.push(stack_itemt(&child, ip2));
level_stack.push(0xff);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use optionalt, or else name that constant.

@danpoe danpoe mentioned this pull request May 14, 2019
4 tasks
danpoe added 3 commits May 15, 2019 14:00
Previously the sharing map was a fixed-height tree (with height 6 by default).
This changes the sharing map to a variable-height tree.
When the sharing map was a fixed-height tree, the type of a node was determined
by its depth in the tree. Therefore, the node depth was tracked when traversing
the tree. Now that the sharing map is a variable-height tree, the methods
is_internal() and is_container() are used to determine the type of a node.
Now that the sharing map tree grows dynamically as elements are inserted, a
higher value for the maximum tree height can be used. The maximum tree height is
determined by bits / chunk, with bits being the number of bits of the hash code
used, and chunk being the number of bits used to determine the child node of an
inner node. For example, if chunk = 3, then a node can have at most 8 children.
Then if bits = 30, bits / chunk = 10 and the maximum tree height is 10. The
constant bits should be divisible by chunk.
@danpoe danpoe force-pushed the feature/sharing-map-variable-height-tree branch 2 times, most recently from e6b5749 to 427a9fb Compare May 15, 2019 13:58
danpoe added 2 commits May 15, 2019 15:00
When the method get_delta_view() encounters a leaf node in the first map, it
searches for a leaf with the same key in the corresponding subtree of the second
map. Previously all the nodes in the subtree were visited. This commit changes
the traversal such that the hash code of a key is used to determine the position
of the leaf in the second map.
@danpoe danpoe force-pushed the feature/sharing-map-variable-height-tree branch from 427a9fb to 38c8388 Compare May 15, 2019 14:14
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 38c8388).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/111936752

@tautschnig tautschnig merged commit 1ca36e6 into diffblue:develop May 16, 2019
@danpoe danpoe deleted the feature/sharing-map-variable-height-tree branch June 2, 2020 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants