Skip to content

Conversation

@amesgen
Copy link
Member

@amesgen amesgen commented May 19, 2025

Closes #1515

The first three commits add a regression test by enrichting the existing LedgerDB state machine test, making sure that

  total number of all opened handles
- total number of closed handles
= states on the LedgerSeq in the end

The fourth commit then fixes the bug, causing the regression test to succeed.

Note that this bug fix reveals another bug #1551 (causing the mock-test failure for LeaderSchedule.simple convergence), which we silence by temporarily making the closing of an in-memory handle a no-op (as it is not necessary anyways, the GC will still collect the in-memory ledger state).

Comment on lines +157 to +166
closeDiscarded = do
closePruned
-- Do /not/ close the anchor of @toClose@, as that is also the
-- tip of @olddb'@ which will be used in @newdb@.
case toClose of
AS.Empty _ -> pure ()
_ AS.:< closeOld' -> closeLedgerSeq (LedgerSeq closeOld')
-- Finally, close the anchor of @lseq@ (which is a duplicate of
-- the head of @olddb'@).
close $ tables $ AS.anchor lseq
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct, but it is rather non-obvious.

It would be "nicer" if newdb would use the anchor state of lseq instead of the head state of olddb' (but we would need a special purpose AS.join for this as this isn't possible in general when the anchor is "strictly smaller" than the element type). In that case, we could simply write

closeDiscarded = closePruned *> closeLedgerSeq (LedgerSeq closeOld)

instead of what I have here.

Copy link
Contributor

@jasagredo jasagredo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice catch, Alex. The fix looks very reasonable.

Copy link
Contributor

@jasagredo jasagredo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No negative comments. I think we can merge this if you want.

@jasagredo jasagredo moved this from 🏗 In progress to 👀 In review in Consensus Team Backlog Jun 5, 2025
@amesgen amesgen force-pushed the amesgen/v2-ledgerseq-close branch from 0c5b137 to 7900088 Compare June 5, 2025 11:28
@amesgen amesgen enabled auto-merge June 5, 2025 11:29
@amesgen amesgen added this pull request to the merge queue Jun 5, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025
@amesgen amesgen added this pull request to the merge queue Jun 5, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025
@amesgen amesgen added this pull request to the merge queue Jun 5, 2025
Merged via the queue into main with commit 9468967 Jun 5, 2025
35 of 43 checks passed
@amesgen amesgen deleted the amesgen/v2-ledgerseq-close branch June 5, 2025 21:18
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Consensus Team Backlog Jun 5, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jun 11, 2025
Closes #1551

The commits are intended to be reviewed individually.

- The first commit fixes the race condition described in #1551 by using
the `ldbOpenHandlesLock` to duplicate the handle when snapshotting.
- The second commit reverts 7900088
added in #1516, enabled by the first commit. Without the first commit,
this would cause test failures, see the commit description.
- The third and fourth commits are purely refactorings to be able to
reuse the pattern of "duplicating while holding a read lock".
- The fifth commit uses the same pattern for `takeSnapshotNOW`, a
testing function, see the commit description.
 - The last commit adds documentation for `ldbOpenHandlesLock`.
amesgen added a commit that referenced this pull request Jun 30, 2025
Closes #1515

The first three commits add a regression test by enrichting the existing
LedgerDB state machine test, making sure that
```
  total number of all opened handles
- total number of closed handles
= states on the LedgerSeq in the end
```
The fourth commit then fixes the bug, causing the regression test to
succeed.

Note that this bug fix reveals another bug #1551 (causing the
`mock-test` failure for `LeaderSchedule.simple convergence`), which we
silence by temporarily making the closing of an in-memory handle a no-op
(as it is not *necessary* anyways, the GC will still collect the
in-memory ledger state).
amesgen added a commit that referenced this pull request Jun 30, 2025
Closes #1551

The commits are intended to be reviewed individually.

- The first commit fixes the race condition described in #1551 by using
the `ldbOpenHandlesLock` to duplicate the handle when snapshotting.
- The second commit reverts 7900088
added in #1516, enabled by the first commit. Without the first commit,
this would cause test failures, see the commit description.
- The third and fourth commits are purely refactorings to be able to
reuse the pattern of "duplicating while holding a read lock".
- The fifth commit uses the same pattern for `takeSnapshotNOW`, a
testing function, see the commit description.
 - The last commit adds documentation for `ldbOpenHandlesLock`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

LedgerDB V2: handles aren't closed in the common case

3 participants