Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(store/v2): Add Pruning Tests & Fix SQLite & PebbleDB Pruning #18459

Merged
merged 27 commits into from
Nov 17, 2023

Conversation

alexanderbez
Copy link
Contributor

@alexanderbez alexanderbez commented Nov 13, 2023

Description

  • Add more pruning test coverage, specifically with queries over ranged deletes
  • Fix the SQLite pruning logic
  • Implement pruning support for PebbleDB (taken from Sei's approach -- credit Kartik)

closes: #18462


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • run make lint and make test
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

Summary by CodeRabbit

  • New Features

    • Implemented database pruning to remove old versions of data, enhancing performance and reducing storage requirements.
    • Added synchronization options for database operations to ensure data integrity.
  • Improvements

    • Improved database version management to prevent access to pruned or non-existent versions, increasing reliability.
    • Enhanced error handling with a new error type for pruned versions, providing clearer feedback to users.
  • Bug Fixes

    • Fixed issues with database iterators to correctly handle pruned versions, ensuring accurate data retrieval.
    • Corrected error messages to reflect the current state of data versions.
  • Testing

    • Expanded test coverage to include scenarios for database pruning and version checks.
  • Documentation

    • Updated error documentation to include new error types and codes related to version pruning.

Copy link
Contributor

coderabbitai bot commented Nov 13, 2023

## Walkthrough

The changes across various database implementations (PebbleDB, RocksDB, SQLite) focus on introducing and handling a new concept of pruning data. A `pruneHeightKey` is used to track the earliest version of data that should be retained, with older data being pruned. This involves adding checks against the `earliestVersion` to prevent access to pruned data, updating error messages to reflect pruned versions, and modifying tests to account for the new pruning behavior. The changes aim to improve database performance and manage storage more efficiently.

## Changes

| File Path | Change Summary |
|-----------|----------------|
| `store/storage/pebbledb/db.go` <br> `store/storage/rocksdb/db.go` <br> `store/storage/sqlite/db.go` | Introduced pruning logic, added `earliestVersion` field, and updated methods to handle pruned versions. |
| `store/storage/pebbledb/db_test.go` <br> `store/storage/sqlite/db_test.go` <br> `store/storage/storage_test_suite.go` | Updated tests to handle new synchronization settings and pruning logic, including changes to assertions and removal of time logging. |
| `store/storage/pebbledb/iterator.go` <br> `store/storage/sqlite/iterator.go` | Added checks for `earliestVersion` in iterators to invalidate them if they target pruned versions. |
| `store/storage/sqlite/batch.go` | Updated SQL execution call from `latestVersionStmt` to `reservedUpsertStmt`. |
| `store/errors.go` | Changed error message for code 11 to "version pruned" and introduced a new error type `ErrVersionPruned`. |

> In databases deep, changes take a leap,
> Pruning old data, errors to keep.
> As seasons turn and history unfolds,
> Efficiency and joy, the code beholds.

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

@alexanderbez alexanderbez changed the title feat(store/v2): Add Pruning Tests & Fix SQLite feat(store/v2): Add Pruning Tests & Fix SQLite & PebbleDB Pruning Nov 15, 2023
@alexanderbez alexanderbez marked this pull request as ready for review November 15, 2023 20:04
@alexanderbez alexanderbez requested a review from a team as a code owner November 15, 2023 20:04
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 5f36ad0 and cd14557.
Files selected for processing (9)
  • store/storage/pebbledb/db.go (11 hunks)
  • store/storage/pebbledb/db_test.go (1 hunks)
  • store/storage/pebbledb/iterator.go (1 hunks)
  • store/storage/rocksdb/db.go (6 hunks)
  • store/storage/sqlite/batch.go (1 hunks)
  • store/storage/sqlite/db.go (9 hunks)
  • store/storage/sqlite/db_test.go (4 hunks)
  • store/storage/sqlite/iterator.go (3 hunks)
  • store/storage/storage_test_suite.go (2 hunks)
Files skipped from review due to trivial changes (1)
  • store/storage/sqlite/db_test.go
Additional comments: 32
store/storage/pebbledb/db_test.go (1)
  • 12-25: The update to the NewDB function within the test suite to disable synchronization is a common practice for speeding up tests that do not require the durability guarantees of disk synchronization. This change should be verified to ensure it does not affect any tests that rely on the synchronization behavior.
store/storage/sqlite/batch.go (1)
  • 67-71: The change from latestVersionStmt to reservedUpsertStmt should be verified to ensure that the SQL statement executed aligns with the intended logic for the Write method. This change could have implications on how the latestVersion is managed within the database, potentially affecting the pruning logic and data integrity.

Ensure that reservedUpsertStmt is correctly defined elsewhere in the code and that it performs the expected operation. Also, verify that reservedStoreKey and keyLatestHeight are properly defined and used.

store/storage/sqlite/iterator.go (3)
  • 25-32: The addition of a check for targetVersion against db.earliestVersion is a good practice to prevent the iterator from accessing pruned data. This ensures that the iterator respects the pruning logic and does not return invalid data.

  • 60-66: The SQL statement preparation is done correctly using placeholders to prevent SQL injection. This is a critical security measure to protect against injection attacks.

  • 101-110: The Close method has been updated to check if itr.statement is not nil before attempting to close it. This is a good practice to avoid potential nil pointer dereferences.

store/storage/storage_test_suite.go (1)
  • 482-532: The test case TestDatabase_Prune_KeepRecent is well-structured and covers the scenario of pruning while ensuring that data written after the prune point remains accessible. It's important to ensure that the pruning functionality is thoroughly tested across all database backends to prevent data loss or corruption. This test case seems to be a valuable addition to the test suite.
store/storage/pebbledb/iterator.go (2)
  • 32-43: The addition of the earliestVersion parameter and the logic to return an invalid iterator if the version is less than earliestVersion is a good approach to handle attempts to iterate over pruned data. This change ensures that the iterator respects the pruning logic and does not return data that has been deleted.

  • 47-47: The handling of reverse iteration by using src.Last() is correct, but it's important to note that the comment on lines 8-10 states that reverse iteration is not supported. If reverse iteration is now being supported, the comment should be updated to reflect this change. If not, then the logic for reverse iteration should be removed or commented out with an explanation.

store/storage/rocksdb/db.go (5)
  • 7-13: The import of the "strings" package is added to handle error messages. Ensure that this is the only place where string manipulation is required, and consider if there are any other string operations that could benefit from this import to justify its inclusion.

  • 33-36: The removal of the tsLow field from the Database struct simplifies the pruning logic by relying on error handling during read operations instead of manual tracking. This change should be cross-checked with all usages of the Database struct to ensure that no other parts of the code rely on the tsLow field.

  • 66-78: The getSlice function has been updated to handle cases where a query for a pruned version does not return an error. This change assumes that the error message contains a specific string. It's important to ensure that this string matching is reliable and that the error message format will not change in future versions of the underlying database library, as this could lead to silent failures.

  • 102-121: The Has and Get functions have been updated to remove the check for version < db.tsLow and to handle cases where a query for a pruned version does not return an error. This change is consistent with the removal of tsLow and the new error handling strategy. Ensure that all error paths are correctly handled and that the absence of an error correctly indicates the absence of a value due to pruning.

  • 156-160: The Prune function has been updated to remove the update of db.tsLow. This change is consistent with the removal of the tsLow field and the new pruning strategy. Ensure that the new pruning logic is thoroughly tested, especially since it relies on the behavior of the underlying database library to handle pruned versions correctly.

store/storage/sqlite/db.go (7)
  • 18-26: The addition of keyPruneHeight constant is consistent with the existing pattern of defining key constants. This will be used to track the pruning height within the database.

  • 69-87: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [44-85]

The Database struct has been updated to include an earliestVersion field, which is initialized in the New function using the getPruneHeight helper function. This is a critical change for the pruning logic, ensuring that the database knows the earliest version it can serve. The earliestVersion is set to pruneHeight + 1, which assumes that pruneHeight is the last pruned version and the database should serve versions greater than this.

  • 113-119: The SetLatestVersion method has been updated to use reservedUpsertStmt for SQL execution. This change is consistent with the pattern of using prepared statements for database operations, which can help prevent SQL injection attacks and improve performance.

  • 154-163: The Get method now includes a check against db.earliestVersion to ensure that it does not return data that has been pruned. This is a critical addition for the correctness of the pruning functionality.

  • 236-240: The Iterator method's implementation looks correct. It checks for invalid start and end ranges and creates a new iterator with the provided parameters.

  • 248-252: The ReverseIterator method mirrors the Iterator method but specifies that the iterator should be in reverse order. The checks and the creation of the iterator are consistent with the forward iterator.

  • 285-307: The getPruneHeight function is a new utility function to retrieve the prune height from the database. It uses a prepared statement to query the keyPruneHeight from the state_storage table. The error handling is consistent with the rest of the codebase, including handling sql.ErrNoRows to indicate a fresh database with no prune height set yet.

store/storage/pebbledb/db.go (12)
  • 16-26: The constants and global variables are well-defined and follow the naming conventions. However, ensure that the comments are accurate and that the lexical ordering note is indeed correct and necessary for the system's logic.

  • 30-39: The Database struct is updated to include the earliestVersion field, which is essential for the pruning logic. The comment provides a good explanation of the sync field's purpose. This change seems appropriate for the new functionality.

  • 57-69: The error handling in the New function is correct, and the use of fmt.Errorf to wrap the underlying error is a good practice for error tracing. The retrieval of the pruneHeight and setting of the earliestVersion to pruneHeight + 1 is logical, assuming that pruneHeight represents the last pruned version.

  • 95-100: The SetLatestVersion method is concise and correct. It uses little-endian encoding to store the version number, which is consistent with the rest of the codebase.

  • 117-127: The setPruneHeight method correctly updates the earliestVersion and persists the pruneVersion using little-endian encoding. The use of db.sync in the write options ensures that the sync behavior is consistent with the rest of the database operations.

  • 138-145: The Get method now includes a check to ensure that the targetVersion is not less than the earliestVersion, which is a crucial addition for the pruning logic. The error handling and the return of nil when the targetVersion is pruned are correct.

  • 164-168: The error handling for decoding the tombstone is correct. The comment explaining the tombstone logic is clear and helps in understanding the code's intent.

  • 196-277: The Prune method has been significantly updated. The batch deletion logic and the use of PruneCommitBatchSize for committing in batches are good for performance. The error handling and the use of db.sync for commit operations are consistent with the rest of the database operations. The final call to db.setPruneHeight(version) ensures that the earliestVersion is updated after pruning.

  • 297-300: The Iterator method correctly creates a new iterator with the given bounds and version constraints. The use of db.earliestVersion ensures that the iterator respects the pruning logic.

  • 321-324: The ReverseIterator method mirrors the Iterator method in terms of logic and error handling, which is consistent and correct.

  • 332-372: The utility functions storePrefix, prependStoreKey, getPruneHeight, and valTombstoned are correctly implemented. The valTombstoned function includes a panic for an invalid MVCC value, which is a strong choice for a critical error that should not occur under normal circumstances. However, ensure that this panic will not cause undue service disruption and that it is handled or logged appropriately in the broader system context.

  • 375-376: The increment of version by 1 to make the end domain exclusive is correct. This is a common pattern in range queries to include the start key and exclude the end key.

store/storage/storage_test_suite.go Show resolved Hide resolved
store/storage/sqlite/db.go Outdated Show resolved Hide resolved
return New(dir)
db, err := New(dir)
if err == nil && db != nil {
db.SetSync(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in most environments this would be true, is there a reason to setting this to false for testing?

Copy link
Contributor Author

@alexanderbez alexanderbez Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just to make the unit tests faster. Not a strong reason, but it helps speed up CI testing time. (11.994s vs 1.214s)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing!! could you add a short comment saying this. Would help for future readers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha

Copy link
Member

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit unrelated to this pr, but for pebbledb and sqllite it could make sense to have a current state table so we can get, set and iterate faster for current state as state grows on disk.

for sqllite we could do something like each module has its own table for current state and archived data is a single table. Somewhat different than iavlv2 but in the same vein

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between b85d911 and 8fc7e93.
Files selected for processing (6)
  • store/errors.go (1 hunks)
  • store/storage/pebbledb/db.go (11 hunks)
  • store/storage/rocksdb/db.go (6 hunks)
  • store/storage/sqlite/db.go (9 hunks)
  • store/storage/sqlite/db_test.go (5 hunks)
  • store/storage/storage_test_suite.go (3 hunks)
Files skipped from review due to trivial changes (2)
  • store/errors.go
  • store/storage/sqlite/db_test.go
Additional comments: 15
store/storage/storage_test_suite.go (4)
  • 192-195: The update to the TestDatabase_IteratorClose test function is good as it now checks for the expected behavior after the iterator is closed, which is that iter.Valid() should return false. This is a more appropriate test than checking for a panic, as the iterator should be designed to handle closure gracefully.

  • 456-466: The changes in the TestDatabase_Prune function improve the test's robustness by checking that after pruning, attempts to access versions that have been pruned result in an error and that the data is no longer accessible. This aligns with the expected behavior of the pruning process.

  • 478-483: The continuation of the TestDatabase_Prune function further asserts that after pruning the latest version, all data should be inaccessible, which is the expected behavior when the entire dataset is pruned.

  • 487-533: The addition of the TestDatabase_Prune_KeepRecent function is a valuable test case that ensures pruning does not affect data versions that are newer than the pruned version. It checks that data before the prune point is inaccessible, while data after the prune point remains accessible. This is a critical test for ensuring the integrity of the pruning process.

store/storage/rocksdb/db.go (4)
  • 7-13: The addition of the "strings" import is necessary for the new error handling logic in the getSlice function. This change is appropriate if the rest of the codebase does not use "strings" in this file.

  • 41-56: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [33-54]

The removal of the tsLow field from the Database struct and its associated logic in the New and NewWithDB functions aligns with the summary stating that manual tracking of tsLow is eliminated. This should simplify the codebase and reduce error rates as long as all other references to tsLow have been removed and the new error checking logic is robust.

  • 100-119: The Has and Get functions have been updated to remove the check for version < db.tsLow and to handle the potential store.ErrVersionPruned error from getSlice. This change is consistent with the removal of tsLow and the new error handling strategy. Ensure that all callers of these functions are updated to handle the new error appropriately.

  • 154-158: The Prune function has been updated to use IncreaseFullHistoryTsLow instead of manually updating tsLow. This aligns with the summary's point about shifting the RocksDB pruning approach. Ensure that this change is tested thoroughly since it affects how data is pruned and could have implications for data integrity.

store/storage/sqlite/db.go (7)
  • 18-26: The keyPruneHeight constant is introduced correctly and the reservedUpsertStmt is updated to handle upserts for reserved keys. This looks good as long as the rest of the codebase is updated to handle the new keyPruneHeight constant where necessary.

  • 69-87: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [45-85]

The Database struct now includes an earliestVersion field, and the New function initializes this field based on the retrieved prune height. This is a good approach to track the oldest available version after pruning. Ensure that the earliestVersion is correctly used throughout the codebase.

  • 113-119: The SetLatestVersion method correctly updates the latest version in the database. Ensure that this method is called appropriately whenever a new version is committed to the database.

  • 131-137: The Get method now includes a check to ensure that the requested version is not earlier than the earliestVersion. This is a good security measure to prevent access to pruned data. However, ensure that the error handling for store.ErrVersionPruned is implemented wherever this method is called.

  • 239-243: The Iterator method correctly checks for invalid start and end ranges and returns an error if the start is after the end. This is a good practice to prevent unexpected behavior during iteration.

  • 251-255: The ReverseIterator method has the same checks as the Iterator method, which is consistent and correct.

  • 288-310: The getPruneHeight function is implemented to retrieve the prune height from the database. It correctly handles the case where no rows are found, returning zero in that case. This is a good approach to initialize the earliestVersion when the database is new or has not been pruned yet.

store/storage/rocksdb/db.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 8fc7e93 and 49c6c6c.
Files selected for processing (4)
  • store/errors.go (2 hunks)
  • store/storage/pebbledb/db.go (11 hunks)
  • store/storage/rocksdb/db.go (5 hunks)
  • store/storage/sqlite/db.go (9 hunks)
Additional comments: 26
store/errors.go (2)
  • 3-7: The import of the "fmt" package is correctly added to support the new error formatting in ErrVersionPruned. Ensure that the "cosmossdk.io/errors" package is the correct one for registering errors in this context.

  • 47-48: The Error() method for ErrVersionPruned is correctly implemented, providing a formatted error message that includes the EarliestVersion field. This is a good use of Go's error handling pattern.

store/storage/rocksdb/db.go (5)
  • 33-39: The comment on tsLow has been updated to reflect its purpose more clearly. This is a good practice as it helps maintainers understand the significance of the variable within the context of the code.

  • 73-79: The NewWithDB function has been updated to initialize tsLow only if tsLowBz is not empty. This is a logical change that prevents potential misuse of an uninitialized variable.

  • 91-100: The getSlice method now includes a check for version < db.tsLow and returns a store.ErrVersionPruned error if the condition is true. This is an important change for correctness, ensuring that attempts to access pruned versions are handled properly.

  • 126-132: The Has method has been updated to handle the case where slice is nil. This is a defensive programming practice that helps prevent potential nil pointer dereferences.

  • 180-184: The Prune method has been updated to increment tsLow by 1 to include the provided version in the pruning process. This is a logical change that aligns with the intended functionality of pruning up to and including the specified version.

store/storage/sqlite/db.go (5)
  • 18-26: The definition of keyPruneHeight constant and the SQL statement for reservedUpsertStmt are correctly implemented. The SQL statement uses an INSERT ... ON CONFLICT ... DO UPDATE pattern which is appropriate for upsert operations in SQLite.

  • 69-87: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [44-85]

The Database struct has been updated to include an earliestVersion field, and the New function has been modified to initialize this field with the value of pruneHeight + 1. This is a logical change assuming that pruneHeight represents the last pruned version, and the earliest available version in the database should be the next one. However, ensure that the getPruneHeight function correctly handles the case where there is no prune height set in a fresh database.

  • 113-119: The SetLatestVersion method correctly updates the latest_height in the database. The use of reservedUpsertStmt ensures that the value is inserted or updated if it already exists.

  • 131-137: The Get method now includes a check against db.earliestVersion to ensure that the requested version has not been pruned. This is a good practice for preventing access to data that should no longer be available. The error returned is specific and informative.

  • 239-245: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [239-255]

The Iterator and ReverseIterator methods are correctly implemented with checks for empty keys and start-after-end errors. These checks are important for ensuring that the iterators behave as expected and do not return invalid ranges of data.

store/storage/pebbledb/db.go (14)
  • 16-26: The constants are well-defined and follow the Go naming conventions. The comments provide useful information about the purpose of each constant.

  • 30-39: The Database struct is well-defined and includes a new field earliestVersion. The comment explains its purpose clearly. However, ensure that all usages of the Database struct throughout the codebase are updated to handle this new field appropriately.

  • 57-69: The error handling in the New function is good, wrapping the underlying error with additional context. However, the pruneHeight + 1 logic assumes that getPruneHeight returns the last pruned height, not the earliest available version. Ensure this assumption is correct and documented.

  • 95-100: The SetLatestVersion method is concise and uses the binary package to encode the version number, which is a standard approach in Go for binary encoding.

  • 117-118: The GetLatestVersion method correctly handles the case where the latest version key might not be found, which is a good practice for handling optional data.

  • 120-126: The setPruneHeight method updates the earliestVersion field and persists the prune height to the database. Ensure that this method is called in the correct context where the pruning operation is guaranteed to have completed successfully.

  • 138-141: The Get method now includes a check against earliestVersion to ensure that the requested version has not been pruned. This is a good practice for data integrity and user feedback.

  • 164-167: The error handling here is consistent and provides clear messaging about the nature of the error, which is good for maintainability and debugging.

  • 196-277: The Prune method has been significantly updated. The comments provide a clear warning about the potential inefficiency of the method. The batch processing logic is a good practice to handle large numbers of deletions without overwhelming the database transaction log. However, ensure that the batch size defined by PruneCommitBatchSize is tuned based on the database performance characteristics and the expected workload.

  • 297-300: The Iterator method creates a new iterator with the correct bounds and passes the earliestVersion to the newPebbleDBIterator function. This ensures that the iterator respects the pruning logic.

  • 321-324: The ReverseIterator method mirrors the Iterator method but is intended for reverse iteration. The same considerations apply here as for the Iterator method.

  • 332-333: The prependStoreKey function is a utility that correctly appends the store key prefix to a given key. This is a common pattern in key-value stores to namespace keys.

  • 335-350: The getPruneHeight function retrieves the prune height from the database and handles the case where the prune height key might not exist, which is good for robustness.

  • 374-376: The getMVCCSlice function correctly increments the version to make the end domain exclusive. Ensure that this logic is well-documented and understood by the callers of this function.

store/errors.go Outdated Show resolved Hide resolved
store/storage/rocksdb/db.go Outdated Show resolved Hide resolved
store/storage/sqlite/db.go Show resolved Hide resolved
store/storage/pebbledb/db.go Show resolved Hide resolved
store/storage/sqlite/db.go Show resolved Hide resolved
@alexanderbez
Copy link
Contributor Author

a bit unrelated to this pr, but for pebbledb and sqllite it could make sense to have a current state table so we can get, set and iterate faster for current state as state grows on disk.

Good thinking, but it wouldn't help. For PebbleDB, you would need a separate DB entirely which complicates the codebase significantly. And basic benchmarks show it's on par with RocksDB (not as fast as expected).

For SQLite, we would need a separate table. But again, complicating the codebase. I'd like to see Osmosis numbers before going such a route :)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 49c6c6c and 930dbd4.
Files selected for processing (2)
  • store/errors.go (2 hunks)
  • store/storage/rocksdb/db.go (5 hunks)
Additional comments: 7
store/errors.go (2)
  • 3-7: The import of "cosmossdk.io/errors" should be verified to ensure it is the correct package and that it is being used elsewhere in the code. If this import is not used, it should be removed to keep the code clean.

  • 34-49: The addition of ErrVersionPruned is a good practice for handling specific error cases related to version pruning. It is important to ensure that this error is handled wherever versions are queried in the codebase. The error message is clear and informative, providing the earliest available version when a pruned version is requested.

store/storage/rocksdb/db.go (5)
  • 33-39: The Database struct has been updated to include a tsLow field, which represents the earliest version supported by the database. This is a significant change as it introduces a new constraint on the data retrieval methods to check against this tsLow value to prevent access to pruned data.

  • 73-79: The NewWithDB function has been correctly updated to initialize the tsLow field of the Database struct. This ensures that the tsLow value is set when a new Database instance is created with an existing RocksDB instance and column family handle.

  • 91-100: The getSlice method now includes a check to ensure that the requested version is not less than tsLow. If it is, an ErrVersionPruned error is returned. This is a critical update for maintaining data integrity and ensuring that clients do not attempt to read data that has been pruned.

  • 126-129: The Has method has been updated to use the new getSlice method, which includes the tsLow check. This ensures that the Has method respects the pruning logic and does not return true for keys that belong to pruned versions.

  • 135-138: The Get method has also been updated to use the getSlice method. The error message has been enhanced to provide more context by wrapping the original error with a message indicating the failure to get a RocksDB slice. This improves error handling by providing more detailed information about the nature of the error.

Copy link
Contributor

@cool-develope cool-develope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 930dbd4 and cb6aa04.
Files selected for processing (1)
  • store/storage/pebbledb/db.go (11 hunks)

t2.store_key = state_storage.store_key AND
t2.key = state_storage.key AND
t2.version <= ?
) AND store_key != ?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new table (max_version, key, store_key) could be helpful? this query seems like very heavy

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between cb6aa04 and c44fd82.
Files selected for processing (1)
  • store/storage/pebbledb/db_test.go (1 hunks)

Comment on lines 12 to 26
func TestStorageTestSuite(t *testing.T) {
s := &storage.StorageTestSuite{
NewDB: func(dir string) (store.VersionedDatabase, error) {
return New(dir)
db, err := New(dir)
if err == nil && db != nil {
// we set sync=false just to speed up CI tests
db.SetSync(false)
}

return db, err
},
EmptyBatchSize: 12,
SkipTests: []string{
"TestStorageTestSuite/TestDatabase_Prune",
},
}

suite.Run(t, s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to set the database sync behavior to false for speeding up CI tests is reasonable, but it should be documented in the code as per the discussion in the comments. A comment should be added to explain why db.SetSync(false) is being called, to provide context for future maintainers.

  if err == nil && db != nil {
+   // Set sync to false to speed up CI tests (11.994s vs 1.214s)
    db.SetSync(false)
  }

Commitable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
func TestStorageTestSuite(t *testing.T) {
s := &storage.StorageTestSuite{
NewDB: func(dir string) (store.VersionedDatabase, error) {
return New(dir)
db, err := New(dir)
if err == nil && db != nil {
// we set sync=false just to speed up CI tests
db.SetSync(false)
}
return db, err
},
EmptyBatchSize: 12,
SkipTests: []string{
"TestStorageTestSuite/TestDatabase_Prune",
},
}
suite.Run(t, s)
func TestStorageTestSuite(t *testing.T) {
s := &storage.StorageTestSuite{
NewDB: func(dir string) (store.VersionedDatabase, error) {
db, err := New(dir)
if err == nil && db != nil {
// Set sync to false to speed up CI tests (11.994s vs 1.214s)
db.SetSync(false)
}
return db, err
},
EmptyBatchSize: 12,
}
suite.Run(t, s)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between c44fd82 and 94bc2bb.
Files selected for processing (1)
  • store/storage/pebbledb/db_test.go (1 hunks)
Additional comments: 1
store/storage/pebbledb/db_test.go (1)
  • 12-27: The changes here reflect the addition of a comment explaining the rationale behind setting db.SetSync(false), which is to speed up CI tests. This is a good practice as it provides context for future maintainers of the code. The removal of the SkipTests field is not shown in the hunk, but if it was removed, ensure that it is not used elsewhere in the codebase.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 94bc2bb and d903a1e.
Files selected for processing (1)
  • store/pruning/manager_test.go (4 hunks)
Additional comments: 4
store/pruning/manager_test.go (4)
  • 44-51: The test function TestPruning is being skipped entirely with s.T().SkipNow(). This should be verified to ensure it is intentional, as it means the test will not run and therefore will not validate any pruning logic.

  • 55-70: The loop correctly writes batches and commits them for each version, then applies the changeset and prunes the version. The error handling after each operation is appropriate, ensuring that the test fails fast if an operation does not succeed as expected.

  • 77-83: The test checks that the value for "key" at latestVersion-4 (which should be version 96) is as expected, and that the value at version 50 is nil, which implies it has been pruned. This is a good check to ensure that pruning is working as intended. However, it's important to ensure that the pruning settings (set in lines 49-50) are such that version 50 should indeed be pruned at this point in the test.

  • 87-93: The test checks for the existence of a proof for "key" at latestVersion-4 and the absence of a proof at latestVersion-5. This is consistent with the pruning settings and the expected behavior of the system. However, the test should ensure that the error received when getting the proof for latestVersion-5 is specifically due to the version being pruned (e.g., ErrVersionPruned) and not some other error.

@alexanderbez alexanderbez added this pull request to the merge queue Nov 17, 2023
Merged via the queue into main with commit c207163 Nov 17, 2023
52 of 58 checks passed
@alexanderbez alexanderbez deleted the bez/fix-sqlite-ss-prune branch November 17, 2023 17:50
@alexanderbez alexanderbez mentioned this pull request Nov 17, 2023
20 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement PebbleDB Pruning and Fixes
3 participants