add PersistentDict based on a HAMT #51164
Merged
Split out from #51066 for independent review and merge.
Prototyped in https://github.com/vchuravy/HashArrayMappedTries.jl for #50958.
The implementation is based on a Hash Array Mapped Trie (HAMT)
following Bagwell (2000).
A HAMT uses a fixed branching factor (commonly 32) and stores each node sparsely.
To search for an entry we take the hash of the key and split it into chunks;
with a branching factor of 32 each chunk is 5 bits. We use those 5 bits to calculate the
index inside the node and use a bitmap within the node to track whether that slot is
already set. This makes search a `log(32, n)` operation.

Persistency is implemented by path-copying. When we insert a value into, or delete a value from, the HAMT
we copy each node along the path into a new HAMT; all other nodes are shared with
the previous HAMT.
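The lookup and path-copying steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the code in this PR: the names `Node`, `Leaf`, `get`, and `insert` are hypothetical, hashes are assumed to be truncated to 64 bits, and full hash collisions are not handled.

```python
from dataclasses import dataclass
from typing import Any

BITS = 5                 # bits of the hash consumed per level
MASK = (1 << BITS) - 1   # branching factor 32 -> chunk values 0..31
HASH_BITS = 64           # assumption: hashes are truncated to 64 bits


@dataclass(frozen=True)
class Leaf:
    key: Any
    value: Any


@dataclass(frozen=True)
class Node:
    bitmap: int = 0      # bit i set <=> logical slot i is occupied
    slots: tuple = ()    # occupied slots only, ordered by logical index


def _hash(key):
    return hash(key) & ((1 << HASH_BITS) - 1)


def _position(bitmap, chunk):
    # Index into the sparse `slots` tuple: number of set bits below `chunk`.
    return bin(bitmap & ((1 << chunk) - 1)).count("1")


def get(node, key, default=None):
    h, shift = _hash(key), 0
    while True:
        chunk = (h >> shift) & MASK
        if not node.bitmap & (1 << chunk):
            return default
        child = node.slots[_position(node.bitmap, chunk)]
        if isinstance(child, Leaf):
            return child.value if child.key == key else default
        node, shift = child, shift + BITS


def insert(node, key, value, h=None, shift=0):
    """Return a new root; `node` and everything it references stay unchanged."""
    if h is None:
        h = _hash(key)
    if shift >= HASH_BITS:
        raise NotImplementedError("full hash collisions are not handled here")
    chunk = (h >> shift) & MASK
    bit = 1 << chunk
    pos = _position(node.bitmap, chunk)
    if not node.bitmap & bit:
        # Empty slot: copy this node with one extra leaf.
        slots = node.slots[:pos] + (Leaf(key, value),) + node.slots[pos:]
        return Node(node.bitmap | bit, slots)
    child = node.slots[pos]
    if isinstance(child, Leaf):
        if child.key == key:
            new_child = Leaf(key, value)
        else:
            # Two keys share this chunk: push both one level down.
            sub = insert(Node(), child.key, child.value,
                         _hash(child.key), shift + BITS)
            new_child = insert(sub, key, value, h, shift + BITS)
    else:
        new_child = insert(child, key, value, h, shift + BITS)
    # Path copy: only this node is rebuilt; sibling slots are shared by reference.
    slots = node.slots[:pos] + (new_child,) + node.slots[pos + 1:]
    return Node(node.bitmap, slots)


v1 = insert(Node(), "a", 1)
v2 = insert(v1, "b", 2)   # v1 is still a valid, unchanged version
```

Because every node is immutable in this sketch, an older root such as `v1` keeps answering lookups exactly as before, while `v2` shares every subtree that is not on the path to the new key.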
A notable implementation choice is that I didn't add a (resizable) root table.
Normally this root table is dense and uses the first `t` bits to calculate an index
within. This makes large HAMTs a bit cheaper since the root table effectively folds
multiple lookup steps into one. It does hurt persistent use-cases since path-copying
means that we also copy the root node/table.
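To make the trade-off concrete, here is a small Python sketch of how a hash might be split with and without a dense root table. The function name `chunk_path` is hypothetical, and consuming the low-order bits first is an assumption made for illustration only.

```python
def chunk_path(h, root_bits, levels, bits=5):
    """Split hash `h` into a root-table index of `root_bits` bits,
    followed by `levels` chunks of `bits` bits each (low-order bits first)."""
    path = [h & ((1 << root_bits) - 1)]
    h >>= root_bits
    for _ in range(levels):
        path.append(h & ((1 << bits) - 1))
        h >>= bits
    return path


h = (31 << 10) | (1 << 5) | 2

# No special root table: the root is an ordinary 5-bit node, three steps.
plain = chunk_path(h, root_bits=5, levels=2)

# Dense root table with t = 10: one step replaces the first two 5-bit steps,
# but a path copy would have to duplicate a table of 2**10 entries.
with_root = chunk_path(h, root_bits=10, levels=1)
```

In this sketch the wider root index is just the first two 5-bit chunks combined, which is what folding multiple lookup steps into one amounts to.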
Importantly, the HAMT itself is not immutable/persistent; the use of it as part of the
`PersistentDict` is. Direct mutation of the underlying data breaks the persistency invariants. One could use the HAMT to implement a non-persistent dictionary (or
other data structures).
As an interesting side note, we could use a related data structure, the Ctrie,
to implement a concurrent lock-free dictionary. Ctries also support `O(1)` snapshotting,
so we could replace the HAMT used here with a Ctrie.