Add public key deduplication migration #7738
base: feature/account-public-key-deduplication
This commit adds optional fields to AccountStatus to make it compatible with the account public key deduplication migration program. More changes will be added to retrieve and modify the optional fields in AccountStatusV4.
The account public key deduplication migration migrates and deduplicates account public keys and related data:
- Add optional public key metadata to the account status register
- Store unique public keys in a batch
- Store the sequence number separately from the public key
- etc.

These changes reduce the number of registers (aka payloads) and will not prevent concurrent execution that uses multiple proposal keys from the same account. NOTE: concurrent execution is unrelated to this migration.
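A minimal Go sketch of the per-account deduplication idea, assuming a simplified key type: each distinct encoded public key is kept once (the "unique public keys in a batch" item), and every key index records which unique key it points to. The `accountKey` type and `dedupAccountKeys` function are hypothetical illustrations, not flow-go's actual types or register encoding. The scope is one account at a time, not global.

```go
package main

import "fmt"

// accountKey is a hypothetical, simplified account public key. Real account
// public keys also carry hash/signature algorithms, revoked status, etc.
type accountKey struct {
	PublicKey []byte
	Weight    int
}

// dedupAccountKeys deduplicates the public keys of ONE account (the scope is
// per account, not global). It returns the distinct encoded public keys in
// first-seen order and, for each key index, the position of its public key
// in that distinct list. Names and layout are illustrative assumptions.
func dedupAccountKeys(keys []accountKey) (unique [][]byte, mapping []int) {
	seen := make(map[string]int, len(keys)) // encoded public key -> position in unique
	mapping = make([]int, len(keys))
	for i, k := range keys {
		id := string(k.PublicKey)
		pos, ok := seen[id]
		if !ok {
			pos = len(unique)
			seen[id] = pos
			unique = append(unique, k.PublicKey)
		}
		mapping[i] = pos
	}
	return unique, mapping
}

func main() {
	keys := []accountKey{
		{PublicKey: []byte("pk-A"), Weight: 1000},
		{PublicKey: []byte("pk-B"), Weight: 500},
		{PublicKey: []byte("pk-A"), Weight: 500}, // duplicate public key, different weight
	}
	unique, mapping := dedupAccountKeys(keys)
	fmt.Println(len(unique), mapping) // 2 [0 1 0]
}
```

Per-key data such as weight stays attached to the key index, so only the repeated public key bytes are shared.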
Account public key deduplication stores the sequence number separately from the public key. Also, to reduce register count, a sequence number register is created only when an account public key is used as a proposal key. To support concurrent execution, storage used (stored in the account status register) can't change when a sequence number register is created or updated. To resolve this, the fixed sequence number storage size should be included when an account public key is added (not when its sequence number register is created). This commit makes the following changes to the storage used computation:
- Exclude sequence number registers from the storage used computation
- Add a fixed sequence number storage size for every account public key at key index >= 1

NOTE: account public key 0 is stored with its sequence number, so the sequence number storage size is only included for subsequent account public keys.
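The storage used rule above can be sketched in Go as follows. The `register` type, the register IDs and sizes, and the `seqNumStorageSize` value are hypothetical placeholders; the sketch only shows that creating a sequence number register leaves the computed storage used unchanged.

```go
package main

import "fmt"

// register is a hypothetical, simplified view of one account register.
type register struct {
	ID       string
	Size     uint64 // bytes counted toward storage used
	IsSeqNum bool   // true for a separate sequence number register
}

// seqNumStorageSize is an assumed fixed size charged per sequence number;
// the real value used by flow-go may differ.
const seqNumStorageSize uint64 = 48

// storageUsed sums register sizes for one account but skips sequence number
// registers. Instead, it charges seqNumStorageSize for every account public
// key at index >= 1, since key 0 keeps its sequence number inside its own
// register. keyCount is the number of account public keys.
func storageUsed(regs []register, keyCount uint64) uint64 {
	var used uint64
	for _, r := range regs {
		if r.IsSeqNum {
			continue
		}
		used += r.Size
	}
	if keyCount > 1 {
		used += (keyCount - 1) * seqNumStorageSize
	}
	return used
}

func main() {
	regs := []register{
		{ID: "account_status", Size: 40},
		{ID: "public_key_0", Size: 120},
		{ID: "batch_public_keys", Size: 200},
	}
	before := storageUsed(regs, 3)

	// Key 2 is used as a proposal key for the first time, so its sequence
	// number register is created. Storage used must not change.
	regs = append(regs, register{ID: "sequence_number_2", Size: 50, IsSeqNum: true})
	after := storageUsed(regs, 3)

	fmt.Println(before, after, before == after) // 456 456 true
}
```

Charging the fixed size up front, when the key is added, is what keeps the account status register out of the write set of a transaction that merely bumps a sequence number.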
This commit adds the public key deduplication and storage used migrations to the util program's execution-state-extract subcommand.
This commit updates the expected state commitments in tests, which changed because of the AccountStatus format change.
This commit updates comments and some function names for clarity, because legacy code sometimes refers to an "account public key" as just an "account key", etc.
Quick question: does this deduplicate by account, or is it global? (Just curious about the technique.)
Hi @bluesign, the deduplication is by account. EDIT: I added more code comments about deduplication.
This commit updates the storage used computation related to sequence number registers. NOTE: Sequence number registers (and the way we update storage used) are needed to avoid blocking concurrent execution.
@janezpodhostnik thanks for meeting on short notice and sharing your thoughts about storage used computation (e.g., related to sequence numbers and individual sequence number registers needed to avoid blocking concurrent execution). Commit df0775a incorporates some ideas from our meeting into the storage used computation. PTAL 🙏
Updates #7573
This PR adds public key deduplication via a migration (to run during the spork).
The migration will deduplicate public keys to reduce the number of registers (aka payloads).
Reducing the number of payloads (and the mtrie nodes that point to them) has benefits beyond EN memory reduction. For example, other machines and components that handle payloads, such as databases, caches, and indexes, can start up faster and use less memory.
Results using a July 2025 mainnet snapshot show:
- The util program running the migration had a max RSS of 890 GB RAM.

Also, roughly 2 mtrie nodes (96 bytes each) are removed for each register removed.
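The per-register mtrie saving is simple arithmetic: about 2 nodes × 96 bytes ≈ 192 bytes per removed register, not counting the payload itself. The small Go helper below makes that estimate explicit; `estimatedMtrieSavings` is a hypothetical name, and the register count in `main` is an arbitrary example, not a measured result from the migration.

```go
package main

import "fmt"

const (
	nodesPerRemovedRegister = 2  // "roughly 2 nodes" per removed register (from the PR)
	bytesPerNode            = 96 // approximate mtrie node size (from the PR)
)

// estimatedMtrieSavings returns the approximate mtrie memory freed, in bytes,
// when the given number of registers is removed. It ignores the payload
// bytes themselves and Go allocator overhead.
func estimatedMtrieSavings(registersRemoved uint64) uint64 {
	return registersRemoved * nodesPerRemovedRegister * bytesPerNode
}

func main() {
	// 1,000,000 is an arbitrary example count, not a measured result.
	n := uint64(1_000_000)
	b := estimatedMtrieSavings(n)
	fmt.Printf("%d registers -> %d bytes (~%.1f MiB)\n", n, b, float64(b)/(1<<20))
	// prints: 1000000 registers -> 192000000 bytes (~183.1 MiB)
}
```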
Details
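The account public key deduplication migration deduplicates account public keys and related data:
- Add optional public key metadata to the account status register
- Store unique public keys in a batch
- Store the sequence number separately from the public key
- etc.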
This migration and these changes reduce the number of payloads in the mtrie without preventing concurrent execution that uses multiple proposal keys from the same account.
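Separate, lazily created sequence number registers are what keep concurrent execution unblocked, as the sketch below illustrates. It assumes a hypothetical in-memory `ledger` map and a made-up `sn_<index>` register naming scheme: each key index at 1 or above gets its own register, created only on first use as a proposal key, so transactions using different proposal keys from the same account write to different registers.

```go
package main

import (
	"errors"
	"fmt"
)

// ledger is a hypothetical in-memory register store keyed by register ID.
type ledger map[string]uint64

// seqNumRegisterID is a hypothetical naming scheme for the per-key sequence
// number register of one account.
func seqNumRegisterID(keyIndex uint32) string {
	return fmt.Sprintf("sn_%d", keyIndex)
}

// sequenceNumber reads the sequence number for keyIndex >= 1. A missing
// register means the key was never used as a proposal key, so the map's
// zero value (0) is returned.
func sequenceNumber(l ledger, keyIndex uint32) uint64 {
	return l[seqNumRegisterID(keyIndex)]
}

// incrementSequenceNumber creates the register on first use and bumps it.
// Each key index touches only its own register, so different proposal keys
// of the same account never write the same sequence number register.
func incrementSequenceNumber(l ledger, keyIndex uint32) error {
	if keyIndex == 0 {
		return errors.New("key 0 stores its sequence number with the public key")
	}
	l[seqNumRegisterID(keyIndex)]++
	return nil
}

func main() {
	l := ledger{}
	fmt.Println(len(l)) // 0: no sequence number registers yet
	for _, idx := range []uint32{2, 2, 3} {
		if err := incrementSequenceNumber(l, idx); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(len(l), sequenceNumber(l, 2), sequenceNumber(l, 3), sequenceNumber(l, 5)) // 2 2 1 0
}
```

In this sketch, key 0 is rejected because, per the PR, its sequence number lives in the same register as public key 0.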
Runtime Deduplication As Keys are Added (Post-Spork)
A separate PR will provide post-spork runtime deduplication (automatic deduplication as public keys are added) and support for account status format v4.
Tradeoffs
The deduplication data format balances speed, storage used, number of registers (aka payloads), etc.
As an analogy, third-party programs like gzip and zstd don't use their maximum compression ratio by default because the speed tradeoff wouldn't be acceptable for most users.
NOTE: deduplication doesn't use gzip or zstd (they are good examples of tradeoffs).
Notes about running migration
When using nworkers=64, the util program running the migration had max RSS of 890 GB RAM (using a July 2025 mainnet snapshot) and the deduplication migration step took ~4.5 mins. To avoid OOM crashes and slowdowns, the OS should have extra RAM available for caching large files and running other processes, etc.