-
Notifications
You must be signed in to change notification settings - Fork 247
Improve validation during read_ledger chunk parsing
#7492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve validation during read_ledger chunk parsing
#7492
Conversation
This reverts commit d27bbab.
…ger_offset_table_validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR improves validation during ledger chunk parsing to address a regression from version 5.x to 6.x. Previously, a truncated ledger chunk with a completely missing offsets table would incorrectly appear to read successfully (returning an empty file) instead of raising an error.
Key Changes:
- Added validation to check that the offset table position claimed in the file header doesn't exceed the actual file size
- Added validation to verify the number of transactions found in the file matches the expected count from the filename
- Added comprehensive regression tests covering 9 different corruption scenarios
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
python/src/ccf/ledger.py |
Adds two new validation checks in LedgerChunk.__init__: verifies offset table position is within file bounds and transaction count matches filename expectations |
tests/e2e_operations.py |
Adds regression tests that corrupt ledger chunks in 9 different ways and verify appropriate error messages are raised |
achamayou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but needs a Python format
(cherry picked from commit 52751f1)
Exploring behaviour with various corrupt ledger chunks spotted a significant regression from
5.xto6.x- a truncation that caused a completely missing offsets table was previously (correctly) an error, would now seem to read correctly (albeit returning an empty file).This adds 2 extra checks on file open, and some regression tests by manually corrupting some of our golden files.