[TransactionManager] refactor and fix memory usage issues #10829

mwtian · 2023-04-12T19:51:26Z

Description

1. Fix a leak where the lock_waiters map inserts an entry with an empty LockQueue that may never be removed.
2. Resize the HashMaps in TransactionManager (TM) as load changes.
3. Notify TM only once after a transaction commits, passing both the transaction digest and the output keys.
4. Extract the common logic that updates the lock queue and ready transactions after a transaction and its output objects commit.
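The lock_waiters leak described above follows a common HashMap pattern. The minimal sketch below uses hypothetical stand-ins (ObjectKey, TxDigest, and a LockQueue holding a set of waiting digests), not the actual sui-core definitions. It contrasts a lookup that inserts an empty queue with one that does not, and removes a map entry as soon as its queue becomes empty.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-ins for the real sui-core types.
type ObjectKey = (u64, u64); // (object id, version)
type TxDigest = u64;

#[derive(Default)]
struct LockQueue {
    waiters: BTreeSet<TxDigest>,
}

#[derive(Default)]
struct LockWaiters {
    lock_waiters: HashMap<ObjectKey, LockQueue>,
}

impl LockWaiters {
    /// Inserting a queue is fine here: it is immediately made non-empty.
    fn add_waiter(&mut self, key: ObjectKey, tx: TxDigest) {
        self.lock_waiters.entry(key).or_default().waiters.insert(tx);
    }

    /// Leaky pattern: `entry().or_default()` on a lookup path creates an
    /// empty LockQueue for keys nobody waits on, and nothing removes it.
    #[allow(dead_code)]
    fn waiter_count_leaky(&mut self, key: ObjectKey) -> usize {
        self.lock_waiters.entry(key).or_default().waiters.len()
    }

    /// Fixed lookup: reads without inserting anything.
    fn waiter_count(&self, key: &ObjectKey) -> usize {
        self.lock_waiters.get(key).map_or(0, |q| q.waiters.len())
    }

    /// Removes a waiter and drops the map entry once its queue is empty.
    fn remove_waiter(&mut self, key: &ObjectKey, tx: &TxDigest) {
        if let Some(queue) = self.lock_waiters.get_mut(key) {
            queue.waiters.remove(tx);
            if queue.waiters.is_empty() {
                self.lock_waiters.remove(key);
            }
        }
    }
}
```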

Test Plan

unit tests

If your changes are not user-facing and not a breaking change, you can skip the following section. Otherwise, please indicate what changed, and then add to the Release Notes section as highlighted during the release process.

Type of Change (Check all that apply)

- [ ] user-visible impact
- [ ] breaking change for client SDKs
- [ ] breaking change for FNs (FN binary must upgrade)
- [ ] breaking change for validators or node operators (must upgrade binaries)
- [ ] breaking change for on-chain data layout
- [ ] necessitate either a data wipe or data migration

Release notes

vercel · 2023-04-12T19:51:34Z

The latest updates on your projects.

4 Ignored Deployments

Name	Status	Updated (UTC)
explorer	⬜️ Ignored	Apr 13, 2023 5:01pm
explorer-storybook	⬜️ Ignored	Apr 13, 2023 5:01pm
sui-wallet-kit	⬜️ Ignored	Apr 13, 2023 5:01pm
wallet-adapter	⬜️ Ignored	Apr 13, 2023 5:01pm

arun-koshy · 2023-04-13T02:54:43Z

crates/sui-core/src/transaction_manager.rs

+            return ready_certificates;
+        }
+
+        let input_count = self.input_objects.get_mut(&input_key.0).unwrap();


Just to confirm: it's safe to unwrap here because callers of this method must ensure the input key is available in the store, and "storage" here refers to self.input_objects?

Yes, because we know a non-zero number of transactions are waiting on the input object. I'm adding a panic message.
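The reply above proposes adding a panic message. Replacing a bare unwrap() with expect() records the invariant in the panic itself. This is a minimal sketch, assuming input_objects maps an object ID to the number of transactions still waiting on it; the function name and the ObjectId alias are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real ObjectID type.
type ObjectId = u64;

/// Decrements the number of transactions waiting on `id` and returns the
/// remaining count. The `expect` message states the invariant that makes
/// the lookup infallible: a non-zero number of transactions is waiting.
fn decrement_waiter_count(input_objects: &mut HashMap<ObjectId, usize>, id: &ObjectId) -> usize {
    let count = input_objects
        .get_mut(id)
        .expect("input object must be tracked while transactions wait on it");
    *count -= 1;
    let remaining = *count;
    if remaining == 0 {
        // Drop the entry so the map does not keep zero-count keys around.
        input_objects.remove(id);
    }
    remaining
}
```

If the invariant is ever violated, the panic now explains what was expected instead of reporting a generic `None` unwrap.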

arun-koshy · 2023-04-13T02:56:55Z

crates/sui-core/src/transaction_manager.rs

+            return ready_certificates;
+        };
+
+        // Waiters can acquire lock in eitehr readonly or default mode.


s/eitehr/either

arun-koshy · 2023-04-13T03:04:11Z

crates/sui-core/src/transaction_manager.rs

+    }
+
+    /// After reaching 3/4 load in hashmaps, increase capacity to decrease load to 1/2.
+    fn maybe_reserve_capacity(&mut self) {


Do we need a MAX_HASHMAP_CAPACITY to keep this bounded? Or is it not an issue because MAX_PER_OBJECT_QUEUE_LENGTH and MAX_TM_QUEUE_LENGTH already provide protection?

I thought about this and decided to rely on the existing limits for protection, since the actual usage is at most double these values, which should still be tolerable during a spike.
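The doc comment on maybe_reserve_capacity describes the growth policy: once a map reaches 3/4 load, grow it so the load drops to 1/2. A minimal sketch of that policy as a generic helper over std::collections::HashMap follows; it is hypothetical, not the merged code.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical generic helper: once a map reaches 3/4 load, reserve
/// enough room to bring the load down to at most 1/2.
fn maybe_reserve_capacity<K: Eq + Hash, V>(map: &mut HashMap<K, V>) {
    // Integer form of `len / capacity >= 3/4`, avoiding floating point.
    if map.len() * 4 >= map.capacity() * 3 {
        // reserve(n) guarantees capacity() >= len() + n, so reserving
        // len() more slots gives capacity() >= 2 * len().
        let additional = map.len();
        map.reserve(additional);
    }
}
```

The standard HashMap may round the requested size up internally, so the guarantee is only a lower bound on capacity. The helper itself never caps growth, which is why the reply leans on the existing queue-length limits.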

arun-koshy · 2023-04-13T03:10:29Z

crates/sui-core/src/authority.rs

-        // TransactionManager can receive the notifications for objects that it did not find
-        // in the objects table.
-        //
-        // REQUIRED: this must be called before tx_guard.commit_tx() (below), to ensure


Out of curiosity, why is this no longer true?

FYI, tx_guard.commit_tx() is currently a no-op.

Right. Also, TransactionManager now checks whether executed effects exist when it recovers from the pending_execution table. So if a transaction crashes before calling tx_guard.commit_tx() and is retried, TransactionManager handles it correctly: it skips re-execution and removes the transaction from the pending_execution table.
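The recovery behavior described in this reply can be sketched as a filtering pass over the pending table. Both tables are modeled here as in-memory sets of digests. The TxDigest alias and the recover_pending function are hypothetical, not the sui-core storage API.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the real transaction digest type.
type TxDigest = u64;

/// Hypothetical recovery pass over the pending_execution table: certificates
/// whose effects already exist are dropped instead of re-executed; the rest
/// are returned (sorted) so they can be re-enqueued.
fn recover_pending(
    pending_execution: &mut HashSet<TxDigest>,
    executed_effects: &HashSet<TxDigest>,
) -> Vec<TxDigest> {
    let mut to_enqueue = Vec::new();
    pending_execution.retain(|digest| {
        if executed_effects.contains(digest) {
            // Executed before the crash, even if commit_tx() never ran:
            // remove it from the pending table and skip re-execution.
            false
        } else {
            to_enqueue.push(*digest);
            true
        }
    });
    to_enqueue.sort_unstable();
    to_enqueue
}
```

Under this model, the ordering of the commit notification relative to commit_tx() no longer matters for correctness, because recovery consults the effects rather than the pending table alone.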

mystenmark · 2023-04-13T04:06:11Z

crates/sui-core/src/transaction_manager.rs

+            self.executing_certificates.shrink_to(max(
+                self.executing_certificates.capacity() / 2,
+                MIN_HASHMAP_CAPACITY,
+            ))


Can you explain more about what memory problem this is addressing? It seems to me like the main problem would be peak usage, which this doesn't seem to help with.

Also, this code is quite repetitive and could be factored out into a generic function.

This aims to reduce memory usage after a spike in pending transactions, when the count can be much larger than usual. Rust's HashMap does not shrink on its own, so without this the maps would keep their peak capacity. I will refactor this.
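Following the reviewer's suggestion, the repeated shrink logic can live in one generic helper. In the sketch below, only the expression shrink_to(max(capacity / 2, MIN_HASHMAP_CAPACITY)) comes from the diff. The trigger condition (load below 1/4) and the MIN_HASHMAP_CAPACITY value are assumptions for illustration.

```rust
use std::cmp::max;
use std::collections::HashMap;
use std::hash::Hash;

// Name taken from the diff; the value is an assumption for this sketch.
const MIN_HASHMAP_CAPACITY: usize = 1024;

/// Hypothetical generic helper: once load falls below 1/4 (an assumed
/// threshold), halve the capacity without going under MIN_HASHMAP_CAPACITY.
fn maybe_shrink_capacity<K: Eq + Hash, V>(map: &mut HashMap<K, V>) {
    if map.capacity() > MIN_HASHMAP_CAPACITY && map.len() * 4 < map.capacity() {
        let target = max(map.capacity() / 2, MIN_HASHMAP_CAPACITY);
        // shrink_to never drops below `target` or below len().
        map.shrink_to(target);
    }
}
```

Each map would then need one call such as `maybe_shrink_capacity(&mut self.executing_certificates)` instead of its own copy of the expression. Halving per call releases memory gradually after a spike rather than all at once.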

mystenmark · 2023-04-13T17:23:31Z

crates/sui-core/src/execution_driver.rs

@@ -103,15 +103,10 @@ pub async fn execution_process(
                    break;
                }
            }
-
-            // Remove the certificate that finished execution from the pending_certificates table.
-            authority.certificate_executed(&digest, &epoch_store);


why was this removed?

I combined certificate_executed() and objects_available() into a single notify_commit() call from the authority state to simplify reasoning: only one notification is needed after a transaction commits instead of two. The downside is that TransactionManager can no longer tell whether a committed transaction should have been tracked, because the commit notification is also triggered for system transactions. I kept objects_available(), though, since it is still called from enqueue() and from tests.
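A minimal sketch of the single post-commit notification described above follows. TxManagerSketch, its fields, and the key and digest aliases are hypothetical simplifications, not the real TransactionManager. The sketch shows two properties from the reply: one call carries both the digest and the output keys, and an untracked digest (for example, a system transaction) is silently accepted.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real sui-core types.
type TxDigest = u64;
type ObjectKey = (u64, u64); // (object id, version)

#[derive(Default)]
struct TxManagerSketch {
    executing: HashSet<TxDigest>,
    available: HashSet<ObjectKey>,
    // Object key -> transactions waiting on it.
    waiters: HashMap<ObjectKey, Vec<TxDigest>>,
    // Transaction -> number of inputs still missing.
    missing_inputs: HashMap<TxDigest, usize>,
}

impl TxManagerSketch {
    /// Single post-commit notification with both digest and output keys.
    fn notify_commit(&mut self, digest: &TxDigest, output_keys: Vec<ObjectKey>) -> Vec<TxDigest> {
        // Untracked digests (e.g. system transactions) are not an error,
        // so the sketch cannot tell whether this commit was expected.
        self.executing.remove(digest);
        self.objects_available(output_keys)
    }

    /// Marks objects available and returns transactions that became ready.
    fn objects_available(&mut self, keys: Vec<ObjectKey>) -> Vec<TxDigest> {
        let mut ready = Vec::new();
        for key in keys {
            self.available.insert(key);
            // remove() rather than get_mut() so no empty entry is left behind.
            for tx in self.waiters.remove(&key).unwrap_or_default() {
                let missing = self
                    .missing_inputs
                    .get_mut(&tx)
                    .expect("waiting transaction must be tracked");
                *missing -= 1;
                if *missing == 0 {
                    self.missing_inputs.remove(&tx);
                    ready.push(tx);
                }
            }
        }
        ready
    }
}
```

Keeping objects_available() as a separate method mirrors the reply: it remains callable from enqueue-style paths and tests, while committed transactions go through notify_commit().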


mwtian force-pushed the txn-mgr-hashmap branch 2 times, most recently from 6c1555c to 3882662 April 12, 2023 20:43

mwtian changed the title ~~[TransactionManager] improve memory management~~ [TransactionManager] fix memory issues Apr 12, 2023

mwtian changed the title ~~[TransactionManager] fix memory issues~~ [TransactionManager] fix memory usage issues Apr 12, 2023

mwtian marked this pull request as ready for review April 12, 2023 20:52

mwtian changed the title ~~[TransactionManager] fix memory usage issues~~ [TransactionManager] refactor and fix memory usage issues Apr 12, 2023

mwtian requested review from andll, mystenmark and arun-koshy April 12, 2023 20:54

mwtian force-pushed the txn-mgr-hashmap branch from 3882662 to 38cd988 April 13, 2023 02:48

arun-koshy approved these changes Apr 13, 2023

mystenmark reviewed Apr 13, 2023

mwtian force-pushed the txn-mgr-hashmap branch from 38cd988 to bd6b79f April 13, 2023 04:54

mwtian added 3 commits April 12, 2023 23:01

Reserve and shrink capacity

1361b07

Refactor txn mgr

7ad184b

.

16ee9bc

mwtian force-pushed the txn-mgr-hashmap branch from bd6b79f to 16ee9bc April 13, 2023 06:01

.

d3d633e

mwtian force-pushed the txn-mgr-hashmap branch from 8c5a385 to d3d633e April 13, 2023 17:00

mystenmark reviewed Apr 13, 2023

mystenmark approved these changes Apr 13, 2023

mwtian enabled auto-merge (squash) April 13, 2023 20:13

mwtian merged commit bf757cc into main Apr 13, 2023

mwtian deleted the txn-mgr-hashmap branch April 13, 2023 20:43
