Add public key deduplication migration #7738

Open: wants to merge 11 commits into base: feature/account-public-key-deduplication


fxamacker (Member) commented Aug 15, 2025

Updates #7573

This PR adds a migration that deduplicates account public keys (to run during the spork).

The migration will deduplicate public keys to reduce the number of registers (aka payloads).

Reducing the number of payloads (and the mtrie nodes that point to them) has benefits beyond EN memory reduction. For example, other machines and components that handle payloads (databases, caches, indexes, etc.) can start up faster and use less memory.

Results using a July 2025 mainnet snapshot show:

  • number of registers (aka payloads) was reduced by over 86 million registers
  • EN memory reduction is TBD (it can sometimes be ~3x the state size reduction)
  • state size was reduced by ~21 GB
  • duration of public key deduplication step is only 4.5 minutes (using m1 vm)
  • max RSS of util program running migration is 890 GB RAM

Also, for each register removed, roughly 2 mtrie nodes (96 bytes each) are removed.
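
As a rough back-of-the-envelope check, the figures above can be combined to estimate how many bytes of mtrie nodes go away. This is only a sketch: it takes 86 million as a lower bound ("over 86 million"), treats "roughly 2" nodes as exactly 2, and the function name `estimateNodeBytes` is hypothetical.

```go
package main

import "fmt"

const (
	removedRegisters = 86_000_000 // lower bound from the results above
	nodesPerRegister = 2          // "roughly 2" mtrie nodes per removed register
	bytesPerNode     = 96         // node size stated above
)

// estimateNodeBytes returns the approximate number of bytes of mtrie nodes
// removed, given the number of removed registers, the nodes removed per
// register, and the size of each node.
func estimateNodeBytes(registers, nodesPerReg, nodeSize uint64) uint64 {
	return registers * nodesPerReg * nodeSize
}

func main() {
	total := estimateNodeBytes(removedRegisters, nodesPerRegister, bytesPerNode)
	fmt.Printf("%d bytes (~%.1f GB)\n", total, float64(total)/1e9)
	// prints "16512000000 bytes (~16.5 GB)"
}
```

Under these assumptions, the removed nodes account for roughly 16.5 GB in addition to the removed payloads themselves.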

Details

Account public key deduplication migration deduplicates account public keys and related data:

  • Add optional public key metadata to account status register
  • Store unique public keys in batch
  • Store sequence number separately from public key
  • etc.

This migration and these changes will reduce the number of payloads in the mtrie without preventing concurrent execution that uses multiple proposal keys from the same account.
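
The core idea of storing each unique public key once can be illustrated with a minimal sketch. This is not the PR's actual data format or code: the type `dedupResult`, the function `dedupAccountKeys`, and the plain slice used for the mapping are hypothetical, and the sketch assumes deduplication within a single account (as confirmed later in this conversation).

```go
package main

import "fmt"

// dedupResult is a hypothetical in-memory form of one account's
// deduplicated public keys.
type dedupResult struct {
	uniqueKeys [][]byte // each distinct public key, stored once, in first-seen order
	mapping    []uint32 // mapping[keyIndex] is an index into uniqueKeys
}

// dedupAccountKeys deduplicates the encoded public keys of a single account.
// Account key indexes are preserved: mapping has one entry per original key.
func dedupAccountKeys(keys [][]byte) dedupResult {
	seen := make(map[string]uint32, len(keys))
	res := dedupResult{mapping: make([]uint32, 0, len(keys))}
	for _, k := range keys {
		idx, ok := seen[string(k)]
		if !ok {
			// First time this key is seen in the account: store it.
			idx = uint32(len(res.uniqueKeys))
			seen[string(k)] = idx
			res.uniqueKeys = append(res.uniqueKeys, k)
		}
		res.mapping = append(res.mapping, idx)
	}
	return res
}

func main() {
	keys := [][]byte{[]byte("A"), []byte("B"), []byte("A"), []byte("A")}
	res := dedupAccountKeys(keys)
	fmt.Println(len(res.uniqueKeys), res.mapping)
	// prints "2 [0 1 0 0]"
}
```

In this sketch, four account keys collapse to two stored keys, while every key index still resolves to its public key through the mapping. Sequence numbers are deliberately left out because they are stored separately from the public key.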

Runtime Deduplication As Keys are Added (Post-Spork)

A separate PR will provide post-spork runtime deduplication (automatic deduplication as public keys are added) and support for account status format v4.

Tradeoffs

The deduplication data format balances speed, storage used, number of registers (aka payloads), etc.

As an analogy, 3rd-party programs like gzip, zstd, etc. don't default to their maximum compression level because the resulting loss of speed wouldn't be acceptable for most users.

NOTE: deduplication doesn't use gzip or zstd (they are good examples of tradeoffs).

Notes about running migration

When using nworkers=64, the util program running the migration had max RSS of 890 GB RAM (using a July 2025 mainnet snapshot) and the deduplication migration step took ~4.5 mins. To avoid OOM crashes and slowdowns, the OS should have extra RAM available for caching large files and running other processes, etc.

This commit adds optional fields to AccountStatus to make it
compatible with account public key deduplication migration program.

More changes will be added to retrieve and modify optional fields
in AccountStatusV4.

Account public key deduplication migration component migrates and
deduplicates account public keys and related data:
- Add optional public key metadata to account status register
- Store unique public keys in batch
- Store sequence number separately from public key
- etc.

These changes reduce number of registers (aka payloads), and
will not prevent concurrent execution that uses multiple
proposal keys from same account.

NOTE: concurrent execution is unrelated to this migration.

Account public key deduplication stores sequence number separately
from public key.  Also, sequence number register is created only
when account key is used as proposal key to reduce register count.

To support concurrent execution, storage used (stored in the account
status register) can't change when a sequence number register is
created or updated.  To resolve this, the sequence number storage size
(fixed-size) should be included when an account public key is added
(not when the sequence number register is created).

This commit makes the following changes to storage used
computation:
- Exclude sequence number registers from storage used computation
- Add fixed-size sequence number storage size for all account
  public keys at key index >= 1

NOTE: account public key 0 is stored with sequence number, so
we only include sequence number storage size for subsequent
account public keys.
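
The storage used rule described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the `register` type, the `storageUsed` function, and the value of `sequenceNumberStorageSize` are all hypothetical.

```go
package main

import "fmt"

// sequenceNumberStorageSize is a HYPOTHETICAL fixed per-key size in bytes;
// the real value is defined by the implementation, not by this sketch.
const sequenceNumberStorageSize = 16

// register is a simplified stand-in for an account register (payload).
type register struct {
	size             uint64
	isSequenceNumber bool
}

// storageUsed sums the sizes of all non-sequence-number registers and adds
// a fixed sequence number size for every account public key at index >= 1.
// Key 0 is stored together with its sequence number, so it adds nothing extra.
func storageUsed(registers []register, accountKeyCount uint64) uint64 {
	var total uint64
	for _, r := range registers {
		if r.isSequenceNumber {
			continue // excluded: counted via the fixed size below
		}
		total += r.size
	}
	if accountKeyCount > 1 {
		total += (accountKeyCount - 1) * sequenceNumberStorageSize
	}
	return total
}

func main() {
	regs := []register{{size: 100}, {size: 40}}
	before := storageUsed(regs, 3)

	// Key 1 is used as a proposal key for the first time, so its
	// sequence number register is created.
	regs = append(regs, register{size: 9, isSequenceNumber: true})
	after := storageUsed(regs, 3)

	fmt.Println(before, after, before == after)
	// prints "172 172 true"
}
```

Because the fixed size is charged when the key is added, creating or updating a sequence number register leaves storage used unchanged, which is the property needed to avoid blocking concurrent execution.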

This commit adds public key deduplication and storage used
migrations to util program's execution-state-extract subcommand.

NOTE: When using nworkers=64, the util program running the
migration had max RSS of 890 GB RAM (using a July 2025
mainnet snapshot) and the deduplication migration step took ~4.5 mins.
To avoid OOM crashes and slowdowns, the OS should have extra RAM
available for caching large files and running other processes, etc.

fxamacker requested review from janezpodhostnik and a team August 15, 2025 16:13
fxamacker self-assigned this Aug 15, 2025
fxamacker requested a review from a team as a code owner August 15, 2025 16:13
fxamacker added the enhancement and Performance labels Aug 15, 2025

This commit updates expected state commitments in tests caused by
AccountStatus format change.

This updates comments and some function names for clarification
because legacy code sometimes refers to "account public key"
as just "account key", etc.

bluesign (Contributor) commented Aug 17, 2025

quick question, does this deduplicate by account ? or is it global ? ( just curious about the technique )

fxamacker (Member, Author) commented Aug 17, 2025

> quick question, does this deduplicate by account ? or is it global ? ( just curious about the technique )

Hi @bluesign, the deduplication is by account.

EDIT: I added more code comments about deduplication.

fxamacker force-pushed the fxamacker/account-public-key-deduplication-migration branch from 883ed12 to 8a0b469 on August 18, 2025 19:09
fxamacker force-pushed the fxamacker/account-public-key-deduplication-migration branch from 8a0b469 to 7b8afa5 on August 18, 2025 19:11

This commit updates the storage used computation
related to sequence number registers.

NOTE: Sequence number registers (and the way we update
storage used) are needed to avoid blocking concurrent
execution.

fxamacker (Member, Author)

@janezpodhostnik thanks for meeting on short notice and sharing your thoughts about storage used computation (e.g., related to sequence numbers and individual sequence number registers needed to avoid blocking concurrent execution).

Commit df0775a incorporates some ideas from our meeting into the storage used computation. PTAL 🙏
