-
Notifications
You must be signed in to change notification settings - Fork 715
Feat/marf compression #6654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Feat/marf compression #6654
Conversation
…ue if it's 0, store a bitmap and non-empty trie ptrs if the list is sparse, and store node patches atop full nodes (and read them back)
…then record the original node from which it was copied so that a TrieNodePatch can be calculated and stored instead
…oid repeating nodes across tries
… each node and see if we can instead patch an existing node instead of storing a (mostly-unchanged) copy
| .with_compression(true), | ||
| MARFOpenOpts::new(TrieHashCalculationMode::Immediate, "noop", true) | ||
| .with_compression(true), | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: revert this comment block
| + rusqlite::types::ToSql | ||
| + rusqlite::types::FromSql | ||
| + stacks_common::codec::StacksMessageCodec | ||
| + crate::codec::StacksMessageCodec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: use stacks_common not crate
| #[cfg(test)] | ||
| use stacks_common::types::chainstate::BlockHeaderHash; | ||
| use stacks_common::types::chainstate::{ | ||
| use crate::types::chainstate::BlockHeaderHash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: use stacks_common not crate
federico-stacks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve provided a number of inline remarks, mostly nits, suggestions, and minor improvements.
Here are a few more in-depth points:
-
Bitmap flag(aka 0xff): Could we use a bit of the Node ID to indicate whether a bitmap is present? This could save one byte per node and rely on the node format described by the Node ID itself. -
Compression at target: The current code correctly handles both compressed and uncompressed MARF scenarios. Once compression becomes the standard, could we plan on simplify the code to follow a single “linear” path, keeping only the compression-related logic? -
Node ID as a type: Currently,TriePtr idis represented asu8(not introduced by this PR). Since we frequently interact with it, checking flags and node types, it might be useful to introduce a dedicated type. This could encapsulate convenient behaviors and avoid repeated conversions and manual flag interpretation. (In that case, this change could be addressed in a separate PR)
| /// This is up to 32 bytes, and must be prefixed by a 1-byte length. | ||
| /// | ||
| /// Returns Ok(path-bytes) on success | ||
| /// Returns Err(CorruptionError) if the path doesn't decode, or if the length prefix is invali |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| /// Returns Err(CorruptionError) if the path doesn't decode, or if the length prefix is invali | |
| /// Returns Err(CorruptionError) if the path doesn't decode, or if the length prefix is invalid |
| } else { | ||
| trace!("Node {} has dense compressed ptrs", clear_ctrl_bits(*nid)); | ||
| // this is a nearly-full ptrs list | ||
| // ptrs list is compresesd, meaning each ptr might be a different size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| // ptrs list is compresesd, meaning each ptr might be a different size | |
| // ptrs list is compressed, meaning each ptr might be a different size |
| use crate::chainstate::stacks::index::storage::TrieStorageConnection; | ||
| use crate::chainstate::stacks::index::{trie_sql, Error, MarfTrieId}; | ||
| use crate::types::chainstate::TrieHash; | ||
| use crate::types::sqlite::NO_PARAMS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In stackslib/src/chainstate/stacks/index/file.rs has been changed using use rusqlite::params;
Possibly it is better to use same import for both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would potentially touch a lot of lines. Let's safe refactoring until the end, once there's sufficient test coverage.
| /// Inner method for reading a node, and optionally its hash as well. | ||
| /// Uses either the DB or the .blobs file, depending on which is configured. | ||
| /// If `read_hash` is `false`, then the returned hash is just the empty hash of all 0's. | ||
| /// Inner loop of inner_read_patched_persisted_nodetype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| /// Inner loop of inner_read_patched_persisted_nodetype | |
| /// Inner loop of [`TrieStorageConnection::inner_read_patched_persisted_nodetype`] |
| /// | ||
| /// 0 N | ||
| /// |-------------------------------------| | ||
| /// list of compresed `TriePtr`s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| /// list of compresed `TriePtr`s | |
| /// list of compressed `TriePtr`s |
| let ptrs_start_disk_ptr = r | ||
| .seek(SeekFrom::Current(0)) | ||
| .inspect_err(|e| error!("Failed to ftell the read handle: {e:?}"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the more idiomatic stream_position() here?
| let ptrs_start_disk_ptr = r | |
| .seek(SeekFrom::Current(0)) | |
| .inspect_err(|e| error!("Failed to ftell the read handle: {e:?}"))?; | |
| let ptrs_start_disk_ptr = r | |
| .stream_position() | |
| .inspect_err(|e| error!("Failed to ftell the read handle: {e:?}"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
| assert_eq!(node_data_order.len(), offsets.len()); | ||
|
|
||
| // write parent block ptr | ||
| f.seek(SeekFrom::Start(0))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the idiomatic rewind here?
| f.seek(SeekFrom::Start(0))?; | |
| f.rewind()?; |
I also noticed two other occurrences of this pattern in storage.rs that aren’t touched by this PR. We might want to update those as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
| let buffer = if self.compress { | ||
| let mut compressed_buffer = Cursor::new(Vec::new()); | ||
| trie_ram.dump_compressed(self, &mut compressed_buffer, &bhh)?; | ||
| compressed_buffer.into_inner() | ||
| } else { | ||
| let mut buffer = Cursor::new(Vec::new()); | ||
| trie_ram.dump(self, &mut buffer, &bhh)?; | ||
| buffer.into_inner() | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There’s some minor duplication here. Maybe we could let the conditional select the appropriate method (dump_compressed vs. dump) and keep the rest of the logic shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
|
|
||
| for (ix, indirect) in node_data_order.iter().enumerate() { | ||
| if let Some((hash_bytes, patch)) = indirect.hash_and_patch() { | ||
| let f_pos_before = f.seek(SeekFrom::Current(0))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the more idiomatic stream_position() here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
| Error::CorruptionError(format!("Failed to serialize patch: {e:?}")) | ||
| })?; | ||
|
|
||
| let f_pos_after = f.seek(SeekFrom::Current(0))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the more idiomatic stream_position() here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
The need for the sparse
I don't think so. We have no way to compel existing node operators to re-compress their chainstate, and they could continue to use uncompressed chainstate even across hard forks.
That's fine in principle, but that kind of refactoring would change potentially many lines. Let's first make sure the functional test coverage is sufficiently adequate that we feel comfortable shipping this, and then we can worry about the refactoring in a follow-up PR. |
This PR implements work discovered in #6593. Specifically, it makes the following changes to the on-disk representation of the MARF:
If a
TriePtris not a back-block pointer, then theback_blockfield is not stored since it's always 0.If a node's children
TriePtrlist is sparse, then instead of storingTriePtr::default()for empty pointers, the system now stores a bitmap of which pointers are non-empty, and then only stores the non-emptyTriePtrs. It only does this if this actually saves space over just storing the entire list (storing the entire list is cheaper if the list is nearly full).Instead of copying a trie node from one trie to the next as part of a copy-on-write, the system will only store a patch from the old node to the new node. It will create a list of up to 16 patches across 16 tries before storing a full copy of the node.
In my (very small scale) benchmarks, this saves over 50% of space.
This PR is still a draft and will likely remain so for some time. It needs a lot more unit tests, and it would benefit significantly from fuzz and property testing against the current implementation. In addition, it will need a lot of performance tuning, since the act of reading a list of patch nodes will slow down reads.