Skip to content

Conversation

@eddyashton
Copy link
Member

Exploring behaviour with various corrupt ledger chunks spotted a significant regression from 5.x to 6.x - a truncation that caused a completely missing offsets table was previously (correctly) an error, would now seem to read correctly (albeit returning an empty file).

This adds 2 extra checks on file open, and some regression tests by manually corrupting some of our golden files.

@eddyashton eddyashton requested a review from a team as a code owner November 28, 2025 14:34
Copilot AI review requested due to automatic review settings November 28, 2025 14:34
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves validation during ledger chunk parsing to address a regression from version 5.x to 6.x. Previously, a truncated ledger chunk with a completely missing offsets table would incorrectly appear to read successfully (returning an empty file) instead of raising an error.

Key Changes:

  • Added validation to check that the offset table position claimed in the file header doesn't exceed the actual file size
  • Added validation to verify the number of transactions found in the file matches the expected count from the filename
  • Added comprehensive regression tests covering 9 different corruption scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
python/src/ccf/ledger.py Adds two new validation checks in LedgerChunk.__init__: verifies offset table position is within file bounds and transaction count matches filename expectations
tests/e2e_operations.py Adds regression tests that corrupt ledger chunks in 9 different ways and verify appropriate error messages are raised

Copy link
Member

@achamayou achamayou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but needs a Python format

@achamayou achamayou added auto-backport Automatically backport this PR to LTS branch 6.x-todo PRs which should be backported to 6.x labels Nov 28, 2025
@achamayou achamayou merged commit 52751f1 into microsoft:main Nov 28, 2025
39 checks passed
eddyashton added a commit to eddyashton/CCF that referenced this pull request Dec 2, 2025
@eddyashton eddyashton added the backported This PR was successfully backported to LTS branch label Dec 2, 2025
eddyashton added a commit that referenced this pull request Dec 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.x-todo PRs which should be backported to 6.x auto-backport Automatically backport this PR to LTS branch backported This PR was successfully backported to LTS branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants