[Dynamic Protocol State] TODOs and refactoring, part 2 #5080

durkmurder · 2023-11-29T18:58:44Z

Context

In this PR I am aiming to resolve two mid-priority TODOs from the list:

• Extend the unit tests for RichProtocolStateEntry by also testing extra scenarios
• For the ProtocolState storage layer abstraction I would suggest to also include a cache for the secondary index (retrieving IdentityTable by block ID) 👉 PR comment

codecov-commenter · 2023-11-29T19:05:31Z

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (7587d3a) 56.37% compared to head (66b1bab) 57.57%.

Files	Patch %	Lines
storage/badger/protocol_state.go	56.25%	11 Missing and 3 partials ⚠️
cmd/scaffold.go	0.00%	2 Missing ⚠️

Additional details and impacted files

@@                        Coverage Diff                         @@
##           feature/dynamic-protocol-state    #5080      +/-   ##
==================================================================
+ Coverage                           56.37%   57.57%   +1.19%     
==================================================================
  Files                                 987      745     -242     
  Lines                               92682    72117   -20565     
==================================================================
- Hits                                52254    41519   -10735     
+ Misses                              36570    27436    -9134     
+ Partials                             3858     3162     -696

Flag	Coverage Δ
unittests	`57.57% <55.55%> (+1.19%)`	⬆️


jordanschalm · 2023-11-30T01:26:34Z

storage/badger/protocol_state.go

+			if err != nil {
+				return nil, fmt.Errorf("could not lookup identity table ID for block (%x): %w", blockID[:], err)
+			}
+			return cache.Get(protocolStateID)(tx)


I have a minor preference for moving this second cache Get call outside of the byBlockID cache retrieval function (and into ByBlockID). That way each cache remains conceptually only a wrapper around one database call, rather than containing glue logic linking different database calls together. It's also easier to add things like a public ProtocolStateIDByBlockID method (similar to how we have the BlockIDByHeight method that just reads the secondary index for finalized blocks).

Do you mean changing the signature of the secondary cache to Cache[flow.Identifier, flow.Identifier], which will map block_id -> protocol_state_id?

changing signature of secondary cache to Cache[flow.Identifier, flow.Identifier] which will map block_id -> protocol_state_id

Yeah, exactly
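
For readers following the thread, here is a minimal sketch of the arrangement agreed on above: the secondary cache maps block ID to protocol state ID only, and ByBlockID holds the glue between the two lookups. The `Cache` type, `Identifier`, `Entry`, `ErrNotFound`, and `newDemoState` below are simplified stand-ins invented for illustration. They are not flow-go's actual types, and the sketch leaves out badger transactions and cache eviction.

```go
package main

import (
	"errors"
	"fmt"
)

// Identifier is a hypothetical stand-in for flow.Identifier.
type Identifier [32]byte

// Entry is a hypothetical stand-in for flow.RichProtocolStateEntry.
type Entry struct{ Note string }

// ErrNotFound is a hypothetical stand-in for storage.ErrNotFound.
var ErrNotFound = errors.New("not found")

// Cache is a toy read-through cache around exactly one retrieval function.
// It is an illustration only (unbounded, not thread-safe).
type Cache[K comparable, V any] struct {
	items    map[K]V
	retrieve func(K) (V, error)
}

func NewCache[K comparable, V any](retrieve func(K) (V, error)) *Cache[K, V] {
	return &Cache[K, V]{items: make(map[K]V), retrieve: retrieve}
}

// Get returns the cached value, falling back to the retrieval function on a miss.
func (c *Cache[K, V]) Get(key K) (V, error) {
	if v, ok := c.items[key]; ok {
		return v, nil
	}
	v, err := c.retrieve(key)
	if err != nil {
		var zero V
		return zero, err
	}
	c.items[key] = v
	return v, nil
}

// ProtocolState holds one cache per database lookup.
type ProtocolState struct {
	cache          *Cache[Identifier, *Entry]     // protocol state ID -> entry
	byBlockIDCache *Cache[Identifier, Identifier] // block ID -> protocol state ID
}

// ByBlockID contains the glue logic: first resolve the state ID, then the entry.
func (s *ProtocolState) ByBlockID(blockID Identifier) (*Entry, error) {
	stateID, err := s.byBlockIDCache.Get(blockID)
	if err != nil {
		return nil, fmt.Errorf("could not lookup protocol state ID for block (%x): %w", blockID[:], err)
	}
	return s.cache.Get(stateID)
}

// newDemoState wires both caches to in-memory maps that play the role of the database.
func newDemoState(index map[Identifier]Identifier, entries map[Identifier]*Entry) *ProtocolState {
	return &ProtocolState{
		cache: NewCache(func(id Identifier) (*Entry, error) {
			e, ok := entries[id]
			if !ok {
				return nil, ErrNotFound
			}
			return e, nil
		}),
		byBlockIDCache: NewCache(func(blockID Identifier) (Identifier, error) {
			id, ok := index[blockID]
			if !ok {
				return Identifier{}, ErrNotFound
			}
			return id, nil
		}),
	}
}

func main() {
	blockA, blockB, stateX := Identifier{1}, Identifier{2}, Identifier{9}
	s := newDemoState(
		map[Identifier]Identifier{blockA: stateX, blockB: stateX},
		map[Identifier]*Entry{stateX: {Note: "shared by both blocks"}},
	)
	for _, b := range []Identifier{blockA, blockB} {
		entry, err := s.ByBlockID(b)
		fmt.Println(entry.Note, err)
	}
}
```

Because the secondary cache stores only IDs, many blocks that share one protocol state all resolve to the same cached entry, and a public lookup of the ID alone would need nothing beyond `byBlockIDCache.Get`.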

• addressed remaining TODOs for documentation and removed outdated TODOs
• consolidated and updated goDoc of `storage.ProtocolState` interface and implementation
• added logic for populating the `byBlockIdCache` cache, i.e. the cache for looking up a block's Protocol state

AlexHentschel

Looks good. A few minor comments.

Tried to help with consolidating the goDoc and with some minor polishing in my PR #5116 targeting your branch. The only comment that I did not already implement is this one: #5080 (comment)

AlexHentschel · 2023-12-06T22:04:00Z

storage/badger/protocol_state.go

-	defer tx.Discard()
-	return s.byBlockID(blockID)(tx)
-}
-
 // byID retrieves the identity table by its ID. Error returns:
 //   - storage.ErrNotFound if no identity table with the given ID exists
 func (s *ProtocolState) byID(protocolStateID flow.Identifier) func(*badger.Txn) (*flow.RichProtocolStateEntry, error) {
 	return s.cache.Get(protocolStateID)


I would suggest just inlining this one line in the two places where byID is called.

👉 Implemented in my PR #5116

AlexHentschel · 2023-12-07T05:14:57Z

storage/badger/protocol_state.go

I noticed that there are still some open and outdated TODOs here regarding documentation. Furthermore, the methods' goDoc differs quite a bit between the interface storage.ProtocolState and this implementation. Lastly, we are still using the term "Identity Table" despite the protocol state now being a lot more general.

I put up PR #5116 targeting your branch, which consolidates the goDoc, addresses the remaining open TODOs, and removes the outdated TODOs.

AlexHentschel · 2023-12-07T05:43:48Z

storage/badger/protocol_state.go

I agree with not populating the cache which holds the RichProtocolStateEntry values on store. This is because (i) we don't have the RichProtocolStateEntry readily available on store and (ii) new RichProtocolStateEntry instances are really rare throughout an epoch, so the total cost of populating the cache becomes negligible over several views.

Side comment (outside the scope of this PR):

I think we could have the State Machine's Build method generate the RichProtocolStateEntry right away. I think it already has the needed Epoch Setup and Epoch Commit events, since it starts with a RichProtocolStateEntry for the parent state and consumes Epoch Setup and Epoch Commit events.

I think we might want to implement this, if we want to store more readily changing information in the protocol state, like the latest sealed block.

Though, I think for the scope of this PR, it would be beneficial to populate the byBlockIdCache on store, because here, we add a new entry every block! And we probably query for every block. So argument (ii) does not really apply here. Furthermore, argument (i) also does not apply, because we already have the Protocol State's ID on store, so we could populate the cache without much additional effort.

I implemented this in my PR #5116.
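
A minimal sketch of the idea, for illustration only: when the block ID to protocol state ID mapping is written, the same pair is placed into the secondary cache, so the first read for that block does not have to touch the database. The `index` type and its `Index` and `Lookup` methods are hypothetical names, plain maps stand in for both the database and the cache, and the sketch ignores transactions entirely. It is not the code from PR #5116.

```go
package main

import "fmt"

// Identifier is a hypothetical stand-in for flow.Identifier.
type Identifier [32]byte

// index models the secondary index block ID -> protocol state ID.
type index struct {
	db      map[Identifier]Identifier // stand-in for the database
	cache   map[Identifier]Identifier // stand-in for byBlockIdCache
	dbReads int                       // counts lookups that had to fall back to the database
}

func newIndex() *index {
	return &index{
		db:    make(map[Identifier]Identifier),
		cache: make(map[Identifier]Identifier),
	}
}

// Index persists the mapping and populates the cache in the same step.
// The protocol state ID is already at hand here, so this costs almost nothing.
func (ix *index) Index(blockID, stateID Identifier) {
	ix.db[blockID] = stateID
	ix.cache[blockID] = stateID
}

// Lookup serves from the cache and only reads the database on a miss.
func (ix *index) Lookup(blockID Identifier) (Identifier, bool) {
	if id, ok := ix.cache[blockID]; ok {
		return id, true
	}
	ix.dbReads++
	id, ok := ix.db[blockID]
	if ok {
		ix.cache[blockID] = id
	}
	return id, ok
}

func main() {
	ix := newIndex()
	block, state := Identifier{1}, Identifier{7}
	ix.Index(block, state)

	id, ok := ix.Lookup(block)
	fmt.Println(id == state, ok, ix.dbReads) // true true 0
}
```

Since a new index entry is added for every block and is likely to be queried soon afterwards, populating on write avoids one database read per block in this model.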

AlexHentschel · 2023-12-07T06:10:54Z

storage/badger/protocol_state.go

Currently, we are using the same cache size for both caches:

flow-go/storage/badger/protocol_state.go

Line 67 in e22ad8b

withLimit[flow.Identifier, *flow.RichProtocolStateEntry](cacheSize),

flow-go/storage/badger/protocol_state.go

Line 71 in e22ad8b

withLimit[flow.Identifier, flow.Identifier](cacheSize),

I don't think that is a good idea for the following reason:

byBlockIdCache will contain an entry for every block. We want to be able to cover a broad interval of views without cache misses, so I like the default setting of allowing up to 1000 entries.

However, cache only holds the distinct Protocol States. Minimally, we have something like 3 entries per epoch (one on epoch switchover, one on receiving the Epoch Setup event, and one on seeing the Epoch Commit event). Let's be generous and assume we have 20 different Protocol States per epoch. Beyond that, we are certainly leaving the domain of normal operations that we optimize for. That would mean we are holding the protocol states for 1 year in the cache. That doesn't seem useful to me.

I would suggest having a dedicated size parameter for each cache.
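
The arithmetic behind the "1 year" estimate can be spelled out with a small sketch. The helper `epochsCovered` is a hypothetical name, and the epoch duration of roughly one week is an assumption made here to reproduce the estimate. It is not stated in this thread.

```go
package main

import "fmt"

// epochsCovered returns how many epochs a cache with cacheSize entries can span
// if every epoch contributes statesPerEpoch distinct protocol states.
func epochsCovered(cacheSize, statesPerEpoch uint) uint {
	if statesPerEpoch == 0 {
		return 0
	}
	return cacheSize / statesPerEpoch
}

func main() {
	const sharedCacheSize = 1000       // limit currently shared by both caches
	const generousStatesPerEpoch = 20  // generous upper bound from the comment above
	const assumedDaysPerEpoch = 7      // ASSUMPTION: epochs last about one week

	epochs := epochsCovered(sharedCacheSize, generousStatesPerEpoch)
	fmt.Println("epochs covered:", epochs)                  // 50
	fmt.Println("days covered:", epochs*assumedDaysPerEpoch) // 350
}
```

Under these assumptions a 1000-entry limit keeps about 50 epochs of protocol states, close to a year, while the same limit on the per-block cache covers only 1000 blocks. That asymmetry is the motivation for separate size parameters.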

AlexHentschel · 2023-12-07T06:15:09Z

model/flow/protocol_state_test.go

+			entry.NextEpochCommit = nil
+			entry.NextEpoch.CommitID = flow.ZeroID
+		})
+


The following lines are sanity checks for the previously constructed stateEntry, correct?

flow-go/model/flow/protocol_state_test.go

Lines 141 to 143 in e22ad8b

assert.Nil(t, stateEntry.PreviousEpoch)

assert.Nil(t, stateEntry.PreviousEpochSetup)

assert.Nil(t, stateEntry.PreviousEpochCommit)

Would suggest to include a comment:

Suggested change

// sanity check that previous epoch is not populated in `stateEntry`

👉 Implemented in my PR #5116

Yeah, these are sanity checks to ensure we are testing the correct thing.

AlexHentschel · 2023-12-07T06:19:46Z

model/flow/protocol_state_test.go

+			entry.PreviousEpochCommit = nil
+			entry.PreviousEpoch = nil
+		})
+


Suggested change

// sanity check that previous epoch is not populated in `stateEntry`

👉 Implemented in my PR #5116

durkmurder added 2 commits November 29, 2023 16:12

Added tests to cover a few cases when bootstraping after spork

cd22475

Added a cache for secondary index for querying protocol state by block id

475574d

durkmurder requested review from jordanschalm and AlexHentschel November 29, 2023 18:58

durkmurder assigned jordanschalm and AlexHentschel Nov 29, 2023

jordanschalm approved these changes Nov 30, 2023

Updated secondary cache to store ids instead of entries

e22ad8b

jordanschalm mentioned this pull request Dec 4, 2023

[Dynamic Protocol State] Remove EpochStatus #5089

Merged

AlexHentschel mentioned this pull request Dec 7, 2023

Suggestions for PR #5080 #5116

Merged

Alexander Hentschel added 2 commits December 6, 2023 21:56

added documentation of cache population

41b8362

added comments on recommended cache size

d9fb53d

AlexHentschel approved these changes Dec 7, 2023

Alexander Hentschel and others added 5 commits December 6, 2023 22:21

minor comments for test

815ba07

Merge pull request #5116 from onflow/alex/PR-5080_-_suggestions

6c81ef2

Applied PR suggestions regarding cache size

337c066

Merge branch 'feature/dynamic-protocol-state' into yurii/4649-todos-and-refactoring-part-2

f89f5eb

Updated mocks

66b1bab

durkmurder merged commit 7fdc7fb into feature/dynamic-protocol-state Dec 8, 2023
53 checks passed

durkmurder deleted the yurii/4649-todos-and-refactoring-part-2 branch December 8, 2023 18:51
