[Storing] Refactor Storing Collections #7736

zhangchiqing · 2025-08-14T22:06:59Z

This PR refactors how collections are stored: it removes the duplicated logic for storing transactions and switches the locking to the lockctx lock manager.

See comments for the highlighted changes.

codecov-commenter · 2025-08-14T22:11:13Z

Codecov Report

❌ Patch coverage is 37.05947% with 1143 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
state/protocol/badger/state.go	64.08%	75 Missing and 41 partials ⚠️
cmd/util/cmd/read-protocol-state/cmd/blocks.go	0.00%	66 Missing ⚠️
cmd/util/cmd/read-protocol-state/cmd/snapshot.go	0.00%	61 Missing ⚠️
state/protocol/datastore/params.go	0.00%	51 Missing ⚠️
storage/mock/collections.go	0.00%	48 Missing ⚠️
storage/badger/init.go	48.78%	31 Missing and 11 partials ⚠️
cmd/util/cmd/read-badger/cmd/blocks.go	0.00%	40 Missing ⚠️
cmd/util/cmd/read-badger/cmd/cluster_blocks.go	0.00%	37 Missing ⚠️
cmd/util/cmd/common/storage.go	7.89%	35 Missing ⚠️
storage/operation/approvals.go	0.00%	33 Missing ⚠️
... and 72 more


zhangchiqing · 2025-08-15T16:45:52Z

module/state_synchronization/indexer/indexer_core.go

 	if err != nil {
-		// ignore collection if already seen
-		if errors.Is(err, storage.ErrAlreadyExists) {


StoreAndIndexByTransaction no longer returns this error.

zhangchiqing · 2025-08-15T16:46:51Z

module/state_synchronization/indexer/indexer_core.go

 		return err
 	}

-	// now store each of the transaction body
-	for _, tx := range collection.Transactions {


StoreAndIndexByTransaction will store the transaction internally, so that txs and collection can be saved in the same batch update.


zhangchiqing · 2025-08-18T20:32:34Z

storage/store/collections.go

-	// TODO(7355): lockctx
-	indexingByTx *sync.Mutex


Replaced with the lock manager

zhangchiqing · 2025-08-18T20:33:12Z

storage/store/collections.go

-			// produce multiple finalized collections (aka guaranteed collections) containing the same
-			// transaction repeadely.
-			// TODO: For now we log a warning, but eventually we need to handle Byzantine clusters
-			err = operation.RemoveTransaction(rw.Writer(), txID)


Replaced by transactions.RemoveBatch, so that the cache in transactions is also updated.

zhangchiqing · 2025-08-18T20:35:50Z

storage/store/collections.go

 		}
 	}

-	return nil
+	// Store individual transactions
+	for _, tx := range collection.Transactions {


Storing transactions used to be done outside of BatchStoreAndIndexByTransaction because we wanted to reuse the light collection, which caused the logic to be duplicated.

This is now refactored by moving the storing of txs inside BatchStore, so the light collection can be returned for the outer logic to use without being computed again.
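
The shape of this refactor can be sketched as follows. The types (`Transaction`, `Collection`, `LightCollection`, `batch`) and the function `storeAndIndexByTransaction` are simplified stand-ins invented for this example, not flow-go's actual types or storage API, and a plain map plays the role of the database.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the flow-go types; not the real API.
type Transaction struct {
	ID      string
	Payload string
}

type Collection struct {
	ID           string
	Transactions []Transaction
}

// LightCollection holds only the transaction IDs of a collection.
type LightCollection struct {
	ID             string
	TransactionIDs []string
}

// batch is a toy write batch: writes are staged and only become visible
// in the database when commit is called.
type batch struct {
	staged map[string]string
}

func newBatch() *batch { return &batch{staged: map[string]string{}} }

func (b *batch) set(key, value string) { b.staged[key] = value }

func (b *batch) commit(db map[string]string) {
	for k, v := range b.staged {
		db[k] = v
	}
}

// storeAndIndexByTransaction stages the light collection, the
// transaction-to-collection index, and the transaction bodies in one batch,
// then returns the light collection so callers need not derive it again.
func storeAndIndexByTransaction(b *batch, col Collection) LightCollection {
	light := LightCollection{ID: col.ID}
	for _, tx := range col.Transactions {
		light.TransactionIDs = append(light.TransactionIDs, tx.ID)
	}
	b.set("collection/"+col.ID, fmt.Sprint(light.TransactionIDs))
	for _, tx := range col.Transactions {
		b.set("index/"+tx.ID, col.ID) // transaction_id -> collection_id
		b.set("tx/"+tx.ID, tx.Payload) // transaction body, same batch
	}
	return light
}

func main() {
	db := map[string]string{}
	b := newBatch()
	col := Collection{ID: "c1", Transactions: []Transaction{
		{ID: "t1", Payload: "a"},
		{ID: "t2", Payload: "b"},
	}}
	light := storeAndIndexByTransaction(b, col)
	b.commit(db)
	fmt.Println("tx IDs:", light.TransactionIDs, "keys written:", len(db))
}
```

Because everything is staged in one batch, either the collection, its index entries, and its transactions all become visible together, or none of them do.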

jordanschalm · 2025-08-20T20:07:59Z

cmd/access/node_builder/access_node_builder.go

@@ -2153,6 +2154,7 @@ func (builder *FlowAccessNodeBuilder) Build() (cmd.Node, error) {
 				notNil(builder.collections),
 				notNil(builder.transactions),
 				lastFullBlockHeight,
+				storage.NewTestingLockManager(),


Suggested change

-				storage.NewTestingLockManager(),
+				node.StorageLockMgr,

⚠️ Yikes, we're using the testing-only lock manager in production code here! Maybe we need to give it a scarier name.

jordanschalm · 2025-08-20T20:12:50Z

cmd/observer/node_builder/observer_builder.go

@@ -1451,6 +1451,7 @@ func (builder *ObserverServiceBuilder) BuildExecutionSyncComponents() *ObserverS
 				builder.RootChainID.Chain(),
 				indexerDerivedChainData,
 				collectionExecutedMetric,
+				storage.NewTestingLockManager(),


Here again we are using the testing-only lock manager in production code. We need to make sure this is a very hard mistake to make going forward. Given that it has already come up, I think we need to revisit the protections we put around misuse of the testing lock manager constructor.

jordanschalm · 2025-08-20T20:15:22Z

engine/access/access_test.go

@@ -785,7 +786,13 @@ func (suite *Suite) TestGetSealedTransaction() {
 		// 3. Request engine is used to request missing collection
 		suite.request.On("EntityByID", collection.ID(), mock.Anything).Return()
 		// 4. Indexer IndexCollection receives the requested collection and all the execution receipts
-		err = indexer.IndexCollection(collection, collections, transactions, suite.log, collectionExecutedMetric)
+		// Create a lock context for indexing
+		indexLctx := storage.NewTestingLockManager().NewContext()


We should re-use the lock manager already created in this test.

I think your idea of adding some utilities, like RunWithLockCtx(), will help with this. I'm also wondering whether a version of RunWithBadgerDB that instantiates the lock manager would help to avoid accidentally creating multiple lock managers within test cases.
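
As a rough illustration of what such a utility could look like, here is a minimal sketch. The `lockManager` and `lockContext` types are toy stand-ins written for this example (they do not enforce any lock policy), and `runWithLockCtx` is a hypothetical helper, not an existing function in flow-go or lockctx.

```go
package main

import (
	"errors"
	"fmt"
)

// lockContext is a toy stand-in for a lock context; it only records
// which lock IDs were acquired.
type lockContext struct {
	held map[string]bool
}

func (c *lockContext) AcquireLock(id string) error {
	if c.held == nil {
		return errors.New("context already released")
	}
	c.held[id] = true
	return nil
}

func (c *lockContext) HoldsLock(id string) bool { return c.held[id] }

func (c *lockContext) Release() { c.held = nil }

// lockManager is a toy stand-in that counts the contexts it hands out.
type lockManager struct{ contexts int }

func (m *lockManager) NewContext() *lockContext {
	m.contexts++
	return &lockContext{held: map[string]bool{}}
}

// runWithLockCtx (hypothetical helper) creates a context from the given
// manager, acquires lockID, runs fn, and always releases the context.
func runWithLockCtx(m *lockManager, lockID string, fn func(lctx *lockContext) error) error {
	lctx := m.NewContext()
	defer lctx.Release()
	if err := lctx.AcquireLock(lockID); err != nil {
		return fmt.Errorf("could not acquire lock %s: %w", lockID, err)
	}
	return fn(lctx)
}

func main() {
	manager := &lockManager{} // one manager per test case, passed to every helper call
	err := runWithLockCtx(manager, "insert_collection", func(lctx *lockContext) error {
		fmt.Println("holds lock:", lctx.HoldsLock("insert_collection"))
		return nil
	})
	fmt.Println("err:", err)
}
```

Taking the manager as a parameter, rather than constructing one inside the helper, is what keeps a test case to a single lock manager.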

jordanschalm · 2025-08-20T20:21:38Z

engine/access/access_test.go

@@ -990,7 +998,13 @@ func (suite *Suite) TestGetTransactionResult() {
 			ingestEng.OnFinalizedBlock(mb)

 			// Indexer IndexCollection receives the requested collection and all the execution receipts
-			err = indexer.IndexCollection(collection, collections, transactions, suite.log, collectionExecutedMetric)
+			// Create a lock context for indexing
+			indexLctx := storage.NewTestingLockManager().NewContext()


The first lock manager for this test case is created here: https://github.com/onflow/flow-go/pull/7736/files#diff-7b89d8add60bc70809a38bbadd3a6ab68d0982630735a24aba8597c7de8bc958R960. We should only have one per test.

jordanschalm · 2025-08-20T20:26:21Z

engine/access/ingestion/collection_syncer.go

-	err := indexer.IndexCollection(collection, s.collections, s.transactions, s.logger, s.collectionExecutedMetric)
+	// Create a lock context for indexing
+	lctx := s.lockManager.NewContext()
+	err := lctx.AcquireLock(storage.LockInsertCollection)


AcquireLock can return either ErrPolicyViolation or UnknownLockError. Both of these should be considered critical at this point, so we should treat this as an exception rather than logging and continuing.

I see that this component isn't structured to throw irrecoverable errors, but I think it's better to log.Fatal() and add a TODO noting that this component should be using irrecoverable.Context.

jordanschalm · 2025-08-20T21:00:11Z

storage/store/collections.go

-// transaction IDs) and adds a transaction id index for each of the
-// transactions within the collection (transaction_id->collection_id).
-//
+// StoreLightAndIndexByTransaction stores a light collection and indexes it by transaction ID.


Suggested change

-// StoreLightAndIndexByTransaction stores a light collection and indexes it by transaction ID.
+// StoreAndIndexByTransaction stores a light collection and indexes it by transaction ID.

jordanschalm · 2025-08-20T21:01:24Z

storage/store/collections.go

@@ -199,7 +207,7 @@ func (c *Collections) batchStoreLightAndIndexByTransaction(collection *flow.Ligh
 // already exists.
 //
 // No errors are expected during normal operation.
-func (c *Collections) StoreLightAndIndexByTransaction(collection *flow.LightCollection) error {
+func (c *Collections) StoreAndIndexByTransaction(lctx lockctx.Proof, collection *flow.Collection) (flow.LightCollection, error) {
 	// - This lock is to ensure there is no race condition when indexing collection by transaction ID


I think this whole block of comments (lines 212-222) is outdated.

jordanschalm · 2025-08-20T21:04:00Z

storage/store/collections_test.go

+		lockManager := storage.NewTestingLockManager()
+		lctx := lockManager.NewContext()
+		err := lctx.AcquireLock(storage.LockInsertCollection)


unittest.LockManagerWithContext saves us a few lines in these kinds of usages

jordanschalm · 2025-08-20T21:20:46Z

storage/store/inmemory/unsynchronized/collections.go

+func (c *Collections) StoreAndIndexByTransaction(_ lockctx.Proof, collection *flow.Collection) (flow.LightCollection, error) {
 	c.lock.Lock()
 	defer c.lock.Unlock()


[no action required in this PR]

This component is pretty confusing now, concurrency-wise:

it's in a package called unsynchronized

but it has and uses a mutex

and it accepts a lock context (because we want to implement the storage interface for these components)

but the lock context is ignored

unsynchronized.Collections is essentially a mempool -- it's not clear to me why we're having it implement the storage interface. I took a look through the usages, and it seems like in all cases, we are referencing it by its concrete type rather than as an abstract implementation of the interface. My best guess is that kind of usage was the original reason for building it in this way. But given that we are only referencing it as a concrete type, maybe we should consider adopting the existing mempool for this purpose.

This doesn't really have anything to do with Pebble or this PR, so I'll add it as an item to #7682 and we can ask @peterargue when he's back.

jordanschalm · 2025-08-20T21:23:26Z

storage/store/inmemory/unsynchronized/collections_test.go

+	// Create a no-op lock context for testing
+	noOpLockCtx := &noOpLockContext{}
+	_, err := collections.StoreAndIndexByTransaction(noOpLockCtx, &collection)


Suggested change

-	// Create a no-op lock context for testing
-	noOpLockCtx := &noOpLockContext{}
-	_, err := collections.StoreAndIndexByTransaction(noOpLockCtx, &collection)
+	_, err := collections.StoreAndIndexByTransaction(nil, &collection) // lock context should be ignored by non-DB storage backend

Since we expect the implementation to completely ignore the argument, I'd pass in nil instead. That way if it isn't ignoring it, we'll notice.
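
The reasoning can be sketched with a small, self-contained example. The `proof` interface and the `collections` type below are stand-ins invented for illustration (the method set of `proof` is an assumption, not a copy of lockctx.Proof), and the store is reduced to a map guarded by its own mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// proof is a stand-in for a lock-proof interface; its method set is an
// assumption made for this sketch.
type proof interface {
	HoldsLock(lockID string) bool
}

// collections is a toy in-memory store guarded by its own mutex.
type collections struct {
	mu    sync.Mutex
	store map[string][]string
}

// storeAndIndexByTransaction ignores the proof argument entirely: the
// in-memory backend relies on its own mutex instead.
func (c *collections) storeAndIndexByTransaction(_ proof, id string, txIDs []string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.store[id] = txIDs
	return nil
}

func main() {
	c := &collections{store: map[string][]string{}}
	// Passing nil: if the implementation ever started calling methods on the
	// proof, this call would panic and the test would fail loudly.
	err := c.storeAndIndexByTransaction(nil, "c1", []string{"t1"})
	fmt.Println("err:", err, "stored:", len(c.store))
}
```

A no-op lock context would hide exactly the regression that the nil argument exposes.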

zhangchiqing added 14 commits August 13, 2025 15:40

refactor collection store methods

8de6d79

fix tests

b07a52f

add lock manager

a8a0aa9

add lockctx to store collections

73158f8

fix tests

b4cd3a8

fix tests

59d5602

add mock

30fcb2c

fix tests

707d6e6

fix tests

03504cb

fix tests

4a2e3d8

fix tests

ee87b1a

fix in memory store for collection

ec33a49

fix lint

883497f

remove used struct

cc88624

zhangchiqing temporarily deployed to internal-ci August 14, 2025 22:07 — with GitHub Actions Inactive

zhangchiqing mentioned this pull request Aug 14, 2025

[Storage] Refactor storage for Follower Engine #7262

Open

zhangchiqing added 5 commits August 15, 2025 07:23

fix light collections issue

51a91be

fix storing collection

2240fb8

refactor unsynchronized collections

6de29ec

fix tests

922d5c6

fix tests

7a847c9

zhangchiqing temporarily deployed to internal-ci August 15, 2025 14:59 — with GitHub Actions Inactive

fix store collections

21f7a67

zhangchiqing temporarily deployed to internal-ci August 15, 2025 16:05 — with GitHub Actions Inactive

zhangchiqing changed the base branch from master to leo/add-block-view-index August 15, 2025 16:44

zhangchiqing commented Aug 15, 2025


zhangchiqing added 2 commits August 15, 2025 12:08

remove locks

74413b8

Merge branch 'leo/add-block-view-index' into leo/refactor-storing-col…

50484ad

…lection

zhangchiqing marked this pull request as ready for review August 18, 2025 20:31

zhangchiqing requested a review from a team as a code owner August 18, 2025 20:31

zhangchiqing commented Aug 18, 2025


zhangchiqing requested a review from jordanschalm August 18, 2025 20:46

jordanschalm reviewed Aug 20, 2025


jordanschalm mentioned this pull request Aug 20, 2025

Badger -> Pebble: remaining tasks and cleanup #7682

Open

36 tasks
