@masih commented Aug 28, 2025

Describe your changes and provide context

This PR introduces a new Parquet storage backend for the ss package. The primary goal is to speed up MVCC-based queries, particularly debug_trace queries from RPC nodes, by leveraging Parquet's columnar format and HDFS-style (key=value directory) partitioning.

Key features include:

  • HDFS-style Partitioning: Data is partitioned by version (block height) and account owner (version={height}/owner={account}/data_{version}.parquet), enabling efficient data pruning and targeted access.
  • Optimized Query Methods: Dedicated methods like OptimizedQuery, GetAccountHistory, GetVersionSnapshot, and BulkQuery are implemented to take advantage of partitioning, columnar storage, predicate pushdown, and parallel processing for faster data retrieval.
  • Full StateStore Interface Implementation: The backend fully implements the types.StateStore interface, including Get, Has, Iterator, ReverseIterator, ApplyChangeset, and Import/RawImport with write buffering and MVCC semantics.
  • Performance Enhancements: Incorporates ZSTD compression, dictionary encoding for repeated values, and configurable parallel query execution.
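To make the partition layout above concrete, here is a minimal sketch of how such paths could be built and how a version range could prune partition directories before any file is opened. The helper names (partitionPath, parseVersion, pruneByVersion), the root directory, and the sample owner string are hypothetical and are not taken from the PR; only the path pattern comes from the description.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// partitionPath builds a path following the layout in the description:
// version={height}/owner={account}/data_{version}.parquet
// The function name and the root argument are assumptions for illustration.
func partitionPath(root string, version int64, owner string) string {
	return path.Join(root,
		fmt.Sprintf("version=%d", version),
		fmt.Sprintf("owner=%s", owner),
		fmt.Sprintf("data_%d.parquet", version))
}

// parseVersion extracts the height from a "version=N" directory name.
// It reports false for names that do not match that shape.
func parseVersion(dir string) (int64, bool) {
	const prefix = "version="
	if !strings.HasPrefix(dir, prefix) {
		return 0, false
	}
	v, err := strconv.ParseInt(strings.TrimPrefix(dir, prefix), 10, 64)
	return v, err == nil
}

// pruneByVersion keeps only the partition directories whose version lies in
// the inclusive range [lo, hi]; everything else is skipped without being read.
func pruneByVersion(dirs []string, lo, hi int64) []string {
	var kept []string
	for _, d := range dirs {
		if v, ok := parseVersion(d); ok && v >= lo && v <= hi {
			kept = append(kept, d)
		}
	}
	return kept
}

func main() {
	fmt.Println(partitionPath("data", 100, "acct1"))
	dirs := []string{"version=90", "version=100", "version=110", "tmp"}
	fmt.Println(pruneByVersion(dirs, 95, 105))
}
```

Because the version and owner are encoded in directory names, a query for one account over a narrow height range can discard most partitions from a directory listing alone, which is the pruning the first bullet refers to.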

This new backend aims to make historical data queries, especially for debugging purposes, faster and less resource-intensive by reading only the partitions and columns a query needs.
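The MVCC semantics mentioned in the feature list can be illustrated with a small sketch of a versioned read: a lookup at a given version returns the newest write at or below that version, and a tombstone hides the key. This follows the usual MVCC convention and is an assumption about how this backend behaves; the versionedValue type and getAt function are hypothetical and do not appear in the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// versionedValue is a hypothetical record: one write to a key at one version.
type versionedValue struct {
	Version int64
	Value   []byte
	Deleted bool // tombstone: the key was deleted at this version
}

// getAt returns the value visible at target: the newest write whose version
// is <= target. history must be sorted by ascending Version.
func getAt(history []versionedValue, target int64) ([]byte, bool) {
	// Find the index of the first entry written after the target version.
	i := sort.Search(len(history), func(i int) bool {
		return history[i].Version > target
	})
	if i == 0 {
		return nil, false // the key did not exist yet at target
	}
	entry := history[i-1]
	if entry.Deleted {
		return nil, false // the latest visible write is a delete
	}
	return entry.Value, true
}

func main() {
	history := []versionedValue{
		{Version: 5, Value: []byte("a")},
		{Version: 8, Value: []byte("b")},
		{Version: 12, Deleted: true},
	}
	for _, v := range []int64{3, 5, 10, 12} {
		val, ok := getAt(history, v)
		fmt.Printf("version %d: value=%q found=%v\n", v, val, ok)
	}
}
```

A historical query such as a debug trace at block height h only needs the writes with version at most h, which is why partitioning by version lets the store skip newer data entirely.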

Testing performed to validate your change

  • Unit Tests: Specific tests (TestOptimizedQuery, TestGetAccountHistory, TestBulkQuery) were added to validate the functionality and optimizations of the new query methods.
  • Example Test: Example_debugTraceQuery demonstrates the usage and performance benefits of the optimized queries.
  • Integration Tests: The sstest.StorageTestSuite is run against the Parquet backend to verify full compliance with the types.StateStore interface and expected behavior.
  • Compilation Tests: Basic compilation tests were run during development to resolve any integration issues with the parquet-go library and existing codebase.


