Background
Currently, all Arrow chunk readers (VertexPropertyArrowChunkReader, AdjListArrowChunkReader, AdjListOffsetArrowChunkReader, AdjListPropertyArrowChunkReader) discard the loaded chunk_table_ every time the chunk position changes via seek(), next_chunk(), or seek_chunk_index(). This means that if a user seeks back to a previously loaded chunk, the entire Parquet file must be re-opened, its metadata parsed, and its data decoded again, even though the data hasn't changed.
This is particularly costly in graph traversal workloads (BFS, PageRank, label filtering) where vertex/edge access patterns exhibit strong locality, causing the same chunks to be read repeatedly.
Proposal
Introduce a generic LruCache<Key, Value> and integrate it into all four chunk reader classes. When a chunk is loaded from disk, it is stored in the cache. On subsequent seeks to the same chunk, the cached arrow::Table is returned directly, avoiding file I/O entirely.
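To make the proposal concrete, here is a minimal sketch of what such a cache could look like. It is an illustration under assumptions, not the final design: the method names (Get, Put, Size), the fixed entry-count capacity, and the std::optional return type are all hypothetical, and it is not thread-safe. It requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of a generic LRU cache; names and interface are
// assumptions for illustration. In a chunk reader, Key could be the chunk
// index and Value a std::shared_ptr<arrow::Table>.
template <typename Key, typename Value>
class LruCache {
 public:
  explicit LruCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns a copy of the cached value and marks the entry as most
  // recently used, or std::nullopt on a miss.
  std::optional<Value> Get(const Key& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return std::nullopt;
    // Move the entry to the front; list iterators stay valid after splice.
    items_.splice(items_.begin(), items_, it->second);
    return it->second->second;
  }

  // Inserts or overwrites an entry. When the cache is full, the least
  // recently used entry (the back of the list) is evicted first.
  void Put(const Key& key, Value value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      items_.splice(items_.begin(), items_, it->second);
      return;
    }
    if (items_.size() == capacity_) {
      index_.erase(items_.back().first);
      items_.pop_back();
    }
    items_.emplace_front(key, std::move(value));
    index_[key] = items_.begin();
  }

  std::size_t Size() const { return items_.size(); }

 private:
  using Entry = std::pair<Key, Value>;

  std::size_t capacity_;
  std::list<Entry> items_;  // front = most recently used
  std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
};
```

In a reader, the seek path would first call Get with the target chunk index and only fall back to opening the Parquet file on a miss, then call Put with the freshly loaded table. Storing a shared pointer as the value keeps both lookups and evictions cheap, since no table data is copied.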
Component(s)
C++