vchuravy commented Sep 2, 2023

Split out from #51066 for independent review and merge.

Prototyped in https://github.com/vchuravy/HashArrayMappedTries.jl for #50958.

The implementation is based on a Hash Array Mapped Trie (HAMT)
following Bagwell (2000).

A HAMT uses a fixed branching factor (commonly 32) together with sparse nodes.
To search for an entry, we take the hash of the key and split it into blocks;
with a branching factor of 32, each block is 5 bits. We use those 5 bits to compute the
index inside the node, and a bitmap within the node tracks whether an element is
already set. This makes search an O(log_32 n) operation.
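
The index arithmetic described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the names `chunk`, `isset`, and `dense_index` are hypothetical, and it assumes a 32-bit bitmap per node with children stored densely in bitmap order.

```julia
# Sketch only: hypothetical helper names, assuming 5-bit chunks (branching factor 32).
const BITS_PER_LEVEL = 5
const BITMASK = UInt(2^BITS_PER_LEVEL - 1)   # 0x1f, selects the low 5 bits

# Extract the 5-bit chunk of hash `h` used at trie depth `depth` (0-based).
chunk(h::UInt, depth::Int) = Int((h >> (depth * BITS_PER_LEVEL)) & BITMASK)

# Is logical slot `c` (0..31) occupied according to the node's bitmap?
isset(bitmap::UInt32, c::Int) = (bitmap & (UInt32(1) << c)) != 0

# 1-based position of slot `c` in the dense child vector:
# count the occupied slots below `c`, then add one.
dense_index(bitmap::UInt32, c::Int) =
    count_ones(bitmap & ((UInt32(1) << c) - UInt32(1))) + 1

h = UInt(0b00011_00101)      # two chunks: 5 at depth 0, 3 at depth 1
bitmap = UInt32(0b101010)    # logical slots 1, 3 and 5 are occupied

c = chunk(h, 0)                                      # slot used at the root
found = isset(bitmap, c)                             # is there an entry at all?
position = found ? dense_index(bitmap, c) : nothing  # where it lives in the dense vector
```

Because each level consumes 5 bits of the hash, a lookup visits at most one node per chunk, which is where the log base 32 bound comes from.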

Persistence is implemented by path copying. When we insert a value into, or delete one
from, the HAMT, we copy each node along the path into a new HAMT; all other nodes are
shared with the previous HAMT.
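
To make path copying concrete, here is a deliberately simplified sketch. It is not the HAMT from this PR: the `Node` type, the dense 4-way layout, the fixed depth of 3, and the `insert`/`lookup` names are all assumptions chosen to keep the example short (integer keys 0..63, no bitmap, no hashing).

```julia
# Sketch only: a dense 4-way trie over 2-bit chunks of an integer key.
struct Node
    children::Vector{Any}   # each entry is `nothing`, a Node, or a leaf value
end
Node() = Node(Any[nothing for _ in 1:4])

const LEVELS = 3
slot(key::Int, level::Int) = ((key >> (2 * level)) & 0b11) + 1

# Return a new root containing key => value. Only the nodes on the path
# from the root to the leaf are copied; every other subtree is shared.
function insert(node::Node, key::Int, value, level::Int=0)
    new = Node(copy(node.children))   # shallow copy: sibling subtrees stay shared
    i = slot(key, level)
    if level == LEVELS - 1
        new.children[i] = value
    else
        child = node.children[i]
        child = child === nothing ? Node() : child
        new.children[i] = insert(child, key, value, level + 1)
    end
    return new
end

function lookup(node::Node, key::Int, level::Int=0)
    child = node.children[slot(key, level)]
    child === nothing && return nothing
    return level == LEVELS - 1 ? child : lookup(child, key, level + 1)
end

t1 = insert(Node(), 5, "a")
t2 = insert(t1, 60, "b")    # t1 is left untouched
```

Here `t1` still sees only key 5, `t2` sees both keys, and the subtree holding key 5 is the same object in both versions.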

A notable implementation choice is that I didn't add a (resizable) root table.
Normally this root table is dense and uses the first t bits of the hash to calculate an
index within it. This makes large HAMTs a bit cheaper, since the root table effectively
folds multiple lookup steps into one. It does hurt persistent use cases, however, since
path copying means that we also copy the root node/table.

Importantly, the HAMT itself is not immutable/persistent; its use as part of the
PersistentDict is. Direct mutation of the underlying data breaks the persistence
invariants. One could use the HAMT to implement a non-persistent dictionary (or
other data structures).

As an interesting side note, we could use a related data structure, the Ctrie,
to implement a concurrent lock-free dictionary. Ctries also support O(1) snapshotting,
so we could replace the HAMT used here with a Ctrie.

vchuravy added the "collections" label (Data structures holding multiple items, e.g. sets) on Sep 2, 2023
gbaraldi commented Sep 4, 2023

The implementation looks good to me!

mbauman removed their request for review September 5, 2023 19:07
vchuravy merged commit 8599e2f into master Sep 7, 2023
vchuravy deleted the vc/persistent_dict branch September 7, 2023 17:55
function get(default::Callable, dict::PersistentDict{K,V}, key::K) where {K,V}
    trie = dict.trie
    if HAMT.islevel_empty(trie)
        return default
Should be default() and also needs a test that covers this path.
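
The difference the comment points at can be illustrated with a small stand-in. This is a sketch, not the PR's code: `get_buggy` and `get_fixed` are hypothetical functions that model only the empty-trie branch, and the final line uses the existing `get(f, ::Dict, key)` method from Base as the convention to match.

```julia
# Sketch only: hypothetical stand-ins modelling just the "trie is empty" branch.
get_buggy(default, trie_is_empty::Bool) = trie_is_empty ? default : nothing
get_fixed(default, trie_is_empty::Bool) = trie_is_empty ? default() : nothing

f = () -> 42

buggy = get_buggy(f, true)   # hands back the function object itself
fixed = get_fixed(f, true)   # calls the function and returns its value

# Base convention for the callable-default form of `get`: the default is called.
base = get(() -> 42, Dict{Symbol,Int}(), :missing)
```

A regression test for this path would look up a key in an empty dictionary through the callable form and check that the result is the called value rather than the function.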
