feat(c++): add LRU chunk cache to Arrow chunk readers to avoid redundant file IO #861

Draft
SYaoJun wants to merge 1 commit into apache:main from SYaoJun:0214_lru

Contributor

@SYaoJun SYaoJun commented Feb 14, 2026

issue: #860

Reason for this PR

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

codecov-commenter commented Feb 14, 2026

Codecov Report

❌ Patch coverage is 92.43697% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.75%. Comparing base (d1fe7f9) to head (9316629).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cpp/src/graphar/arrow/chunk_reader.cc | 92.20% | 6 Missing ⚠️ |
| cpp/src/graphar/lru_cache.h | 92.10% | 3 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #861      +/-   ##
============================================
+ Coverage     80.60%   80.75%   +0.15%     
  Complexity      615      615              
============================================
  Files            94       95       +1     
  Lines         10707    10799      +92     
  Branches       1055     1060       +5     
============================================
+ Hits           8630     8721      +91     
- Misses         1837     1838       +1     
  Partials        240      240              
| Flag | Coverage Δ | |
|---|---|---|
| cpp | 71.37% <92.43%> (+0.48%) | ⬆️ |


@SYaoJun SYaoJun marked this pull request as ready for review February 27, 2026 00:27
Contributor Author

SYaoJun commented Feb 27, 2026

Before (benchmark, release build)

Run ./graph_info_benchmark
2026-02-26T17:27:12+00:00
Running ./graph_info_benchmark
Run on (4 X 3493.21 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 1280 KiB (x2)
  L3 Unified 49152 KiB (x1)
Load Average: 3.38, 2.01, 0.90
----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BenchmarkFixture/InitialGraphInfo     202838 ns       202816 ns         3188
2026-02-26T17:27:13+00:00
Running ./arrow_chunk_reader_benchmark
Run on (4 X 3490.97 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 1280 KiB (x2)
  L3 Unified 49152 KiB (x1)
Load Average: 3.38, 2.01, 0.90
-----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                   Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------------
BenchmarkFixture/CreateVertexPropertyArrowChunkReader                                    6966 ns         6965 ns       100206
BenchmarkFixture/CreateAdjListArrowChunkReader                                           3578 ns         3577 ns       196996
BenchmarkFixture/CreateAdjListOffsetArrowChunkReader                                     3551 ns         3550 ns       197172
BenchmarkFixture/AdjListPropertyArrowChunkReaderReadChunk                              325547 ns       166224 ns         3893
BenchmarkFixture/AdjListArrowChunkReaderReadChunk                                      271021 ns       152848 ns         4235
BenchmarkFixture/AdjListOffsetArrowChunkReaderReadChunk                                228655 ns       136690 ns         5239
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1      357058 ns       189591 ns         3335
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1      361736 ns       201011 ns         3760
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1      308693 ns       179371 ns         4042
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2      152241 ns       133497 ns         5182
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2      125877 ns       108899 ns         6275
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2       97200 ns        82066 ns         8745
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1    1234745 ns       396302 ns         1797
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1     821340 ns       301181 ns         2272
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1     438876 ns       191486 ns         3756
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2     805788 ns       775914 ns          890
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2     502274 ns       462362 ns         1506
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2     218477 ns       188513 ns         3677
2026-02-26T17:27:40+00:00
Running ./label_filter_benchmark
Run on (4 X 2820.27 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 1280 KiB (x2)
  L3 Unified 49152 KiB (x1)
Load Average: 2.52, 1.93, 0.91
--------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations
--------------------------------------------------------------------------------------------------
BenchmarkFixture/SingleLabelFilter/iterations:10            111665 ns       111611 ns           10
BenchmarkFixture/SingleLabelFilterbyAcero/iterations:10    1237938 ns       658114 ns           10
BenchmarkFixture/MultiLabelFilter/iterations:10              92731 ns        92712 ns           10
BenchmarkFixture/MultiLabelFilterbyAcero/iterations:10      719147 ns       465075 ns           10
BenchmarkFixture/LabelFilterFromSet/iterations:10            49065 ns        48985 ns           10

After (benchmark, release build)

Run ./graph_info_benchmark
2026-02-27T00:34:22+00:00
Running ./graph_info_benchmark
Run on (4 X 3243.65 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 3.64, 2.29, 1.06
----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BenchmarkFixture/InitialGraphInfo     231562 ns       231549 ns         2534
2026-02-27T00:34:23+00:00
Running ./arrow_chunk_reader_benchmark
Run on (4 X 3241.99 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 3.64, 2.29, 1.06
-----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                   Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------------
BenchmarkFixture/CreateVertexPropertyArrowChunkReader                                   13837 ns        13836 ns        50724
BenchmarkFixture/CreateAdjListArrowChunkReader                                           7077 ns         7076 ns        98705
BenchmarkFixture/CreateAdjListOffsetArrowChunkReader                                     7103 ns         7103 ns        98224
BenchmarkFixture/AdjListPropertyArrowChunkReaderReadChunk                                 256 ns          255 ns      2743767
BenchmarkFixture/AdjListArrowChunkReaderReadChunk                                         448 ns          448 ns      1645296
BenchmarkFixture/AdjListOffsetArrowChunkReaderReadChunk                                   248 ns          248 ns      2853157
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1         900 ns          899 ns       788047
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1         617 ns          617 ns      1135186
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1         434 ns          434 ns      1567110
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2         868 ns          868 ns       789657
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2         600 ns          600 ns      1149521
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2         418 ns          418 ns      1691556
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1        892 ns          890 ns       802783
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1        600 ns          600 ns      1160415
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1        430 ns          430 ns      1683496
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2        819 ns          819 ns       845382
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2        588 ns          588 ns      1166540
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2        420 ns          420 ns      1695935
2026-02-27T00:34:39+00:00
Running ./label_filter_benchmark
Run on (4 X 3263.15 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 2.89, 2.21, 1.05
--------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations
--------------------------------------------------------------------------------------------------
BenchmarkFixture/SingleLabelFilter/iterations:10            148347 ns       148227 ns           10
BenchmarkFixture/SingleLabelFilterbyAcero/iterations:10    1292426 ns       766637 ns           10
BenchmarkFixture/MultiLabelFilter/iterations:10             109273 ns       109286 ns           10
BenchmarkFixture/MultiLabelFilterbyAcero/iterations:10      818002 ns       539672 ns           10
BenchmarkFixture/LabelFilterFromSet/iterations:10            65665 ns        65433 ns           10

summary

Performance Improvements

| Benchmark | Before (ns) | After (ns) | Improvement |
|---|---|---|---|
| Vertex_second_All_V2 | 775,914 | 819 | 947.39x |
| Vertex_second_Two_V2 | 462,362 | 588 | 786.33x |
| AdjListPropertyArrowChunkReaderReadChunk | 166,224 | 255 | 651.86x |
| AdjListOffsetArrowChunkReaderReadChunk | 136,690 | 248 | 551.17x |
| Vertex_second_Two_V1 | 301,181 | 600 | 501.97x |
| Vertex_second_One_V2 | 188,513 | 420 | 448.84x |
| Vertex_second_One_V1 | 191,486 | 430 | 445.32x |
| Vertex_second_All_V1 | 396,302 | 890 | 445.28x |
| Vertex_first_One_V1 | 179,371 | 434 | 413.30x |
| AdjListArrowChunkReaderReadChunk | 152,848 | 448 | 341.18x |
| Vertex_first_Two_V1 | 201,011 | 617 | 325.79x |
| Vertex_first_All_V1 | 189,591 | 899 | 210.89x |
| Vertex_first_One_V2 | 82,066 | 418 | 196.33x |
| Vertex_first_Two_V2 | 108,899 | 600 | 181.50x |
| Vertex_first_All_V2 | 133,497 | 868 | 153.80x |

Minor Performance Changes

| Benchmark | Before (ns) | After (ns) | Ratio |
|---|---|---|---|
| InitialGraphInfo | 202,816 | 231,549 | 0.88x |
| MultiLabelFilterByAcero | 465,075 | 539,672 | 0.86x |
| SingleLabelFilterByAcero | 658,114 | 766,637 | 0.86x |
| MultiLabelFilter | 92,712 | 109,286 | 0.85x |

Performance Regressions

| Benchmark | Before (ns) | After (ns) | Regression |
|---|---|---|---|
| SingleLabelFilter | 111,611 | 148,227 | 0.75x |
| LabelFilterFromSet | 48,985 | 65,433 | 0.75x |
| CreateAdjListArrowChunkReader | 3,577 | 7,076 | 0.51x |
| CreateVertexPropertyArrowChunkReader | 6,965 | 13,836 | 0.50x |
| CreateAdjListOffsetArrowChunkReader | 3,550 | 7,103 | 0.50x |
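
For anyone re-deriving the summary figures: the ratios in the three tables match the CPU column of the benchmark output, computed as the time before divided by the time after, so values above 1 are speedups and values below 1 are slowdowns. The helper names below are hypothetical and only illustrate that arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: ratio of CPU time before the change to CPU time after.
// A result above 1.0 means the benchmark got faster.
inline double SpeedupRatio(double before_ns, double after_ns) {
  return before_ns / after_ns;
}

// Rounds to two decimal places, matching the precision used in the tables.
inline double RoundToTwoDecimals(double value) {
  return std::round(value * 100.0) / 100.0;
}

inline void SpeedupRatioExample() {
  // Vertex_second_All_V2: 775,914 ns before, 819 ns after.
  assert(std::fabs(RoundToTwoDecimals(SpeedupRatio(775914, 819)) - 947.39) < 1e-6);
  // CreateAdjListArrowChunkReader: 3,577 ns before, 7,076 ns after (a slowdown).
  assert(SpeedupRatio(3577, 7076) < 1.0);
}
```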

Contributor Author

SYaoJun commented Feb 27, 2026

@Sober7135 @yangxk1 I have implemented a basic version of the LRU Cache and summarized the benchmark comparison results in the comments above. Do you have any suggestions or feedback on the code implementation?

Contributor

Copilot AI left a comment

Pull request overview

Adds an in-memory LRU cache layer to C++ Arrow chunk readers to reduce redundant Parquet reads during seek/backtracking workloads (issue #860), plus introduces a generic LRUCache utility and unit tests.

Changes:

  • Introduce graphar::LRUCache (and PairHash) with Catch2 unit tests.
  • Wire chunk-level caching into all four Arrow chunk reader implementations to reuse previously loaded arrow::Tables.
  • Register the new cache unit test in the C++ CMake test suite.
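
As background on the data structure, here is a minimal sketch of an LRU cache built from a doubly linked list plus a hash index. It is not the PR's graphar::LRUCache: the class name SimpleLruCache and the Get/Put/Clear/Size methods are assumptions made for illustration, and the real interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only; not the PR's implementation.
template <typename Key, typename Value, typename Hash = std::hash<Key>>
class SimpleLruCache {
 public:
  explicit SimpleLruCache(std::size_t capacity) : capacity_(capacity) {}

  // On a hit, copies the value into *out, marks the entry as most recently
  // used, and returns true. Returns false on a miss.
  bool Get(const Key& key, Value* out) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    entries_.splice(entries_.begin(), entries_, it->second);
    *out = it->second->second;
    return true;
  }

  // Inserts or updates an entry. When the cache is full, the least recently
  // used entry (the back of the list) is evicted first.
  void Put(const Key& key, Value value) {
    if (capacity_ == 0) return;
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      entries_.splice(entries_.begin(), entries_, it->second);
      return;
    }
    if (entries_.size() == capacity_) {
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
    entries_.emplace_front(key, std::move(value));
    index_[key] = entries_.begin();
  }

  void Clear() {
    entries_.clear();
    index_.clear();
  }

  std::size_t Size() const { return entries_.size(); }

 private:
  using Entry = std::pair<Key, Value>;
  std::size_t capacity_;
  std::list<Entry> entries_;  // front = most recently used
  std::unordered_map<Key, typename std::list<Entry>::iterator, Hash> index_;
};

inline void SimpleLruCacheExample() {
  SimpleLruCache<int, int> cache(2);
  int value = 0;
  cache.Put(1, 10);
  cache.Put(2, 20);
  assert(cache.Get(1, &value) && value == 10);  // key 1 is now most recent
  cache.Put(3, 30);                             // evicts key 2
  assert(!cache.Get(2, &value));
}
```

Both lookup and insertion are constant time on average, because `std::list::splice` relinks a node without invalidating the iterators stored in the index.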

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| cpp/src/graphar/lru_cache.h | New generic LRU cache + pair hash utility used by chunk readers. |
| cpp/src/graphar/arrow/chunk_reader.h | Adds per-reader cache members for chunk tables. |
| cpp/src/graphar/arrow/chunk_reader.cc | Uses the cache on seek/next_chunk and populates it after reading tables. |
| cpp/test/test_lru_cache.cc | New unit tests covering LRU cache behavior (eviction, update, edge cases). |
| cpp/CMakeLists.txt | Adds test_lru_cache to the test suite. |

Comments suppressed due to low confidence (2)

cpp/src/graphar/arrow/chunk_reader.h:394

  • New chunk_cache_ state is introduced, but AdjListArrowChunkReader defines a copy assignment operator. In chunk_reader.cc, operator= currently does not clear or copy chunk_cache_, so an instance can retain stale cached tables after assignment (potentially returning data from the previous reader configuration). Please update the assignment operator to reset the cache (and consider whether copy-ctor/assignment should copy or clear cached entries consistently).
  std::shared_ptr<arrow::Table> chunk_table_;
  LRUCache<std::pair<IdType, IdType>, std::shared_ptr<arrow::Table>, PairHash>
      chunk_cache_{4};
  IdType vertex_chunk_num_, chunk_num_;
  std::string base_dir_;
  std::shared_ptr<FileSystem> fs_;
};

cpp/src/graphar/arrow/chunk_reader.h:647

  • New chunk_cache_ state is introduced, but AdjListPropertyArrowChunkReader defines a copy assignment operator. In chunk_reader.cc, operator= currently does not clear or copy chunk_cache_, so an instance can retain stale cached tables after assignment. Please update the assignment operator to reset the cache (and ensure copy semantics are consistent across ctor/assignment).
  std::shared_ptr<arrow::Schema> schema_;
  std::shared_ptr<arrow::Table> chunk_table_;
  LRUCache<std::pair<IdType, IdType>, std::shared_ptr<arrow::Table>, PairHash>
      chunk_cache_{4};
  util::FilterOptions filter_options_;
  IdType vertex_chunk_num_, chunk_num_;
  std::string base_dir_;
  std::shared_ptr<FileSystem> fs_;
};
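
The stale-cache concern raised in the two suppressed comments above can be illustrated with a small stand-in class. This is a sketch under assumptions: ReaderSketch, its string "tables", and the plain unordered_map cache are hypothetical substitutes for the real readers, and clearing the cache on copy is only one of the two options the comments mention (the other being to copy the cached entries).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a chunk reader that caches what it has read.
class ReaderSketch {
 public:
  explicit ReaderSketch(std::string base_dir) : base_dir_(std::move(base_dir)) {}

  // Copies the configuration but deliberately starts with an empty cache.
  ReaderSketch(const ReaderSketch& other) : base_dir_(other.base_dir_) {}

  // Copy assignment resets the cache so that entries loaded under the old
  // configuration can never be returned for the new one.
  ReaderSketch& operator=(const ReaderSketch& other) {
    if (this != &other) {
      base_dir_ = other.base_dir_;
      cache_.clear();
    }
    return *this;
  }

  // Returns the cached "table" for a chunk, loading it on a miss.
  const std::string& Read(long chunk_index) {
    auto it = cache_.find(chunk_index);
    if (it == cache_.end()) {
      it = cache_.emplace(chunk_index,
                          base_dir_ + "/chunk" + std::to_string(chunk_index))
               .first;
    }
    return it->second;
  }

  std::size_t CachedChunks() const { return cache_.size(); }

 private:
  std::string base_dir_;
  std::unordered_map<long, std::string> cache_;
};

inline void ReaderSketchExample() {
  ReaderSketch a("dirA");
  ReaderSketch b("dirB");
  assert(a.Read(0) == "dirA/chunk0");
  a = b;                               // cache is reset here
  assert(a.Read(0) == "dirB/chunk0");  // not the stale dirA entry
}
```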

Comment on lines +80 to +82
auto h1 = std::hash<T1>{}(p.first);
auto h2 = std::hash<T2>{}(p.second);
return h1 ^ (h2 << 32);
Copilot AI Feb 28, 2026

PairHash combines hashes via h1 ^ (h2 << 32). Shifting by 32 is undefined behavior when size_t is 32-bit, and even on 64-bit this is a weak combiner (high collision risk). Please switch to a portable hash-combine that doesn't assume size_t width (e.g., h1 ^ (h2 + 0x9e3779b97f4a7c15ULL + (h1<<6) + (h1>>2)) or an equivalent width-agnostic approach).

Suggested change
- auto h1 = std::hash<T1>{}(p.first);
- auto h2 = std::hash<T2>{}(p.second);
- return h1 ^ (h2 << 32);
+ size_t h1 = std::hash<T1>{}(p.first);
+ size_t h2 = std::hash<T2>{}(p.second);
+ // Width-agnostic hash combine (inspired by boost::hash_combine)
+ constexpr size_t kMul =
+     static_cast<size_t>(0x9e3779b97f4a7c15ULL);
+ h1 ^= h2 + kMul + (h1 << 6) + (h1 >> 2);
+ return h1;
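
To show the suggested combiner in context, here is a self-contained sketch of a pair hasher used as the hash functor of a std::unordered_map. The struct name PairHashSketch is hypothetical; the PR's PairHash may be declared differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical hasher for std::pair keys, following the suggestion above.
struct PairHashSketch {
  template <typename T1, typename T2>
  std::size_t operator()(const std::pair<T1, T2>& p) const {
    std::size_t h1 = std::hash<T1>{}(p.first);
    const std::size_t h2 = std::hash<T2>{}(p.second);
    // 64-bit golden-ratio constant; the cast truncates it where size_t is
    // 32 bits wide, so no shift ever reaches the width of the type.
    constexpr std::size_t kMul =
        static_cast<std::size_t>(0x9e3779b97f4a7c15ULL);
    h1 ^= h2 + kMul + (h1 << 6) + (h1 >> 2);
    return h1;
  }
};

inline void PairHashSketchExample() {
  using Key = std::pair<std::int64_t, std::int64_t>;
  std::unordered_map<Key, int, PairHashSketch> chunks;
  chunks[{1, 2}] = 7;
  chunks[{2, 1}] = 9;  // a distinct key, stored separately
  assert(chunks.at({1, 2}) == 7);
  assert(chunks.size() == 2);
}
```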

Contributor

@yangxk1 yangxk1 left a comment

Looking forward to a benchmark


void VertexPropertyArrowChunkReader::Filter(util::Filter filter) {
filter_options_.filter = filter;
chunk_table_ = nullptr;
Contributor

Should it be initialized in this xxxFilter function?

Contributor Author

Yes. After a filter is applied, the cache is outdated.


void VertexPropertyArrowChunkReader::Select(util::ColumnNames column_names) {
filter_options_.columns = column_names;
chunk_table_ = nullptr;
Contributor

Should it be initialized in this xxxSelect function?

Contributor Author

Same as above.
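
The invalidation discussed in this thread can be sketched as follows. Everything here is an assumption made for illustration: FilteredReaderSketch, its string predicate, and the load counter are hypothetical and stand in for the real reader, util::Filter, and arrow::Table. The point is only that a cached chunk depends on the active filter and column selection, so changing either must drop the cached entries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical reader whose cached chunks depend on the filter and columns.
class FilteredReaderSketch {
 public:
  // Changing the filter makes every cached chunk stale, so drop them all.
  void Filter(std::string predicate) {
    predicate_ = std::move(predicate);
    cache_.clear();
  }

  // The same applies when the selected columns change.
  void Select(std::vector<std::string> columns) {
    columns_ = std::move(columns);
    cache_.clear();
  }

  // Returns a description of the chunk, "loading" it on a cache miss.
  const std::string& GetChunk(long index) {
    auto it = cache_.find(index);
    if (it == cache_.end()) {
      ++loads_;
      std::string table = "chunk" + std::to_string(index) + " where " +
                          predicate_ + " cols=" +
                          std::to_string(columns_.size());
      it = cache_.emplace(index, std::move(table)).first;
    }
    return it->second;
  }

  std::size_t loads() const { return loads_; }

 private:
  std::string predicate_ = "true";
  std::vector<std::string> columns_;
  std::unordered_map<long, std::string> cache_;
  std::size_t loads_ = 0;
};

inline void FilteredReaderSketchExample() {
  FilteredReaderSketch reader;
  reader.GetChunk(0);
  reader.GetChunk(0);           // served from the cache
  assert(reader.loads() == 1);
  reader.Filter("age > 30");    // invalidates the cache
  assert(reader.GetChunk(0) == "chunk0 where age > 30 cols=0");
  assert(reader.loads() == 2);  // the chunk had to be loaded again
}
```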

IdType vertex_num_;
std::shared_ptr<arrow::Schema> schema_;
std::shared_ptr<arrow::Table> chunk_table_;
LRUCache<IdType, std::shared_ptr<arrow::Table>> chunk_cache_{4};
Contributor

Would it be better to size chunk_cache based on available memory, or to let the user control it?

Contributor Author

> Would it be better to size chunk_cache based on available memory, or to let the user control it?

Actually, I am not sure where this parameter (the cache size) should live. Do you have any suggestions?

Contributor

Putting it in the options and having a default value is a good temporary solution. Maybe we can open a new issue to track it until we can come up with a solution that balances memory, user experience, and efficiency.
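
One way the "options with a default value" idea could look is sketched below. The names ChunkReaderOptionsSketch, chunk_cache_capacity, and ConfigurableReaderSketch are hypothetical and not part of the GraphAr API; the default of 4 mirrors the hard-coded `chunk_cache_{4}` shown in the review snippets.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical options struct carrying the cache capacity.
struct ChunkReaderOptionsSketch {
  static constexpr std::size_t kDefaultChunkCacheCapacity = 4;
  std::size_t chunk_cache_capacity = kDefaultChunkCacheCapacity;
};

// Hypothetical reader that takes its cache capacity from the options.
class ConfigurableReaderSketch {
 public:
  explicit ConfigurableReaderSketch(ChunkReaderOptionsSketch options = {})
      : cache_capacity_(options.chunk_cache_capacity) {}

  std::size_t cache_capacity() const { return cache_capacity_; }

 private:
  std::size_t cache_capacity_;
};

inline void CacheCapacityExample() {
  ConfigurableReaderSketch with_default;  // callers who do not care get 4
  assert(with_default.cache_capacity() == 4);

  ChunkReaderOptionsSketch options;
  options.chunk_cache_capacity = 16;      // callers with spare memory opt in
  ConfigurableReaderSketch tuned(options);
  assert(tuned.cache_capacity() == 16);
}
```

Existing call sites keep working unchanged under this shape, while a capacity of 0 could be given the meaning "caching disabled" if that is wanted later.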

@SYaoJun SYaoJun force-pushed the 0214_lru branch 3 times, most recently from f719ebe to 306679d Compare March 2, 2026 13:16
Contributor Author

SYaoJun commented Mar 2, 2026

Added LRU Cache

benchmark

Summary

After adding the cache, initialization and reader creation times have increased; reader creation roughly doubles (for example, CreateVertexPropertyArrowChunkReader goes from 6,965 ns to 13,676 ns of CPU time). Because filter and select operations change what a cached chunk should contain, I currently invalidate the whole cache whenever either is applied. This has caused performance regressions in filter operations.
I believe this needs a more careful design; there is still significant room for improvement.

before

Run ./graph_info_benchmark
2026-02-26T17:27:12+00:00
Running ./graph_info_benchmark
Run on (4 X 3493.21 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 1280 KiB (x2)
  L3 Unified 49152 KiB (x1)
Load Average: 3.38, 2.01, 0.90
----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BenchmarkFixture/InitialGraphInfo     202838 ns       202816 ns         3188
2026-02-26T17:27:13+00:00
Running ./arrow_chunk_reader_benchmark
Run on (4 X 3490.97 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 1280 KiB (x2)
  L3 Unified 49152 KiB (x1)
Load Average: 3.38, 2.01, 0.90
-----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                   Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------------
BenchmarkFixture/CreateVertexPropertyArrowChunkReader                                    6966 ns         6965 ns       100206
BenchmarkFixture/CreateAdjListArrowChunkReader                                           3578 ns         3577 ns       196996
BenchmarkFixture/CreateAdjListOffsetArrowChunkReader                                     3551 ns         3550 ns       197172
BenchmarkFixture/AdjListPropertyArrowChunkReaderReadChunk                              325547 ns       166224 ns         3893
BenchmarkFixture/AdjListArrowChunkReaderReadChunk                                      271021 ns       152848 ns         4235
BenchmarkFixture/AdjListOffsetArrowChunkReaderReadChunk                                228655 ns       136690 ns         5239
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1      357058 ns       189591 ns         3335
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1      361736 ns       201011 ns         3760
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1      308693 ns       179371 ns         4042
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2      152241 ns       133497 ns         5182
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2      125877 ns       108899 ns         6275
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2       97200 ns        82066 ns         8745
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1    1234745 ns       396302 ns         1797
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1     821340 ns       301181 ns         2272
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1     438876 ns       191486 ns         3756
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2     805788 ns       775914 ns          890
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2     502274 ns       462362 ns         1506
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2     218477 ns       188513 ns         3677
2026-02-26T17:27:40+00:00
Running ./label_filter_benchmark
Run on (4 X 2820.27 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 1280 KiB (x2)
  L3 Unified 49152 KiB (x1)
Load Average: 2.52, 1.93, 0.91
--------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations
--------------------------------------------------------------------------------------------------
BenchmarkFixture/SingleLabelFilter/iterations:10            111665 ns       111611 ns           10
BenchmarkFixture/SingleLabelFilterbyAcero/iterations:10    1237938 ns       658114 ns           10
BenchmarkFixture/MultiLabelFilter/iterations:10              92731 ns        92712 ns           10
BenchmarkFixture/MultiLabelFilterbyAcero/iterations:10      719147 ns       465075 ns           10
BenchmarkFixture/LabelFilterFromSet/iterations:10            49065 ns        48985 ns           10

after

Run ./graph_info_benchmark
2026-03-03T00:29:12+00:00
Running ./graph_info_benchmark
Run on (4 X 3249.55 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 5.10, 3.09, 1.38
----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BenchmarkFixture/InitialGraphInfo     224050 ns       224045 ns         3099
2026-03-03T00:29:13+00:00
Running ./arrow_chunk_reader_benchmark
Run on (4 X 3223.87 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 5.10, 3.09, 1.38
-----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                   Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------------
BenchmarkFixture/CreateVertexPropertyArrowChunkReader                                   13676 ns        13676 ns        51040
BenchmarkFixture/CreateAdjListArrowChunkReader                                           7093 ns         7093 ns        98756
BenchmarkFixture/CreateAdjListOffsetArrowChunkReader                                     7097 ns         7095 ns        98065
BenchmarkFixture/AdjListPropertyArrowChunkReaderReadChunk                                 275 ns          274 ns      2608699
BenchmarkFixture/AdjListArrowChunkReaderReadChunk                                         431 ns          431 ns      1608175
BenchmarkFixture/AdjListOffsetArrowChunkReaderReadChunk                                   259 ns          259 ns      2745289
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1         867 ns          867 ns       802223
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1         607 ns          607 ns      1149839
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1         444 ns          443 ns      1561272
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2         866 ns          866 ns       782204
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2         699 ns          699 ns      1142953
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2         426 ns          426 ns      1647265
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1        899 ns          897 ns       766886
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1        612 ns          611 ns      1117359
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1        436 ns          435 ns      1636031
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2        855 ns          855 ns       815147
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2        603 ns          603 ns      1157214
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2        417 ns          417 ns      1660660
2026-03-03T00:29:29+00:00
Running ./label_filter_benchmark
Run on (4 X 3237.75 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 4.26, 3.00, 1.38
--------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations
--------------------------------------------------------------------------------------------------
BenchmarkFixture/SingleLabelFilter/iterations:10            143143 ns       143156 ns           10
BenchmarkFixture/SingleLabelFilterbyAcero/iterations:10    1237766 ns       730827 ns           10
BenchmarkFixture/MultiLabelFilter/iterations:10             111126 ns       111068 ns           10
BenchmarkFixture/MultiLabelFilterbyAcero/iterations:10      809287 ns       528257 ns           10
BenchmarkFixture/LabelFilterFromSet/iterations:10            66292 ns        66094 ns           10

Summary

Performance Improvements

| Benchmark | Before (ns) | After (ns) | Improvement |
| --- | --- | --- | --- |
| AdjListPropertyArrowChunkReaderReadChunk | 166224 | 274 | 606.66x |
| AdjListArrowChunkReaderReadChunk | 152848 | 431 | 354.64x |
| AdjListOffsetArrowChunkReaderReadChunk | 136690 | 259 | 527.76x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1 | 189591 | 867 | 218.67x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1 | 201011 | 607 | 331.15x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1 | 179371 | 443 | 404.90x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2 | 133497 | 866 | 154.15x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2 | 108899 | 699 | 155.79x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2 | 82066 | 426 | 192.64x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1 | 396302 | 897 | 441.81x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1 | 301181 | 611 | 493.00x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1 | 191486 | 435 | 440.20x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2 | 775914 | 855 | 907.50x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2 | 462362 | 603 | 766.77x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2 | 188513 | 417 | 452.07x |

Minor Performance Changes

| Benchmark | Before (ns) | After (ns) | Ratio |
| --- | --- | --- | --- |
| CreateVertexPropertyArrowChunkReader | 6965 | 13676 | 0.51x |
| CreateAdjListArrowChunkReader | 3577 | 7093 | 0.50x |
| CreateAdjListOffsetArrowChunkReader | 3550 | 7095 | 0.50x |

Performance Regressions

| Benchmark | Before (ns) | After (ns) | Regression |
| --- | --- | --- | --- |
| InitialGraphInfo | 202816 | 224045 | 0.91x |
| SingleLabelFilter | 111611 | 143156 | 0.78x |
| SingleLabelFilterbyAcero | 658114 | 730827 | 0.90x |
| MultiLabelFilter | 92712 | 111068 | 0.83x |
| MultiLabelFilterbyAcero | 465075 | 528257 | 0.88x |
| LabelFilterFromSet | 48985 | 66094 | 0.74x |

@SYaoJun
Contributor Author

SYaoJun commented Mar 3, 2026

I have updated the benchmark above. The filter performance has degraded slightly, but I think it is acceptable for the first version. I will continue optimizing it over the next few days.

@SYaoJun SYaoJun requested a review from yangxk1 March 3, 2026 00:45
@yangxk1
Contributor

yangxk1 commented Mar 4, 2026

I don't think the existing benchmarks can be used to reflect performance changes. For example, AdjListPropertyArrowChunkReaderReadChunk creates a reader and uses this reader to read chunk0 and chunk1 multiple times, which will cause all data to be hit in the cache. A better way is to create a separate benchmark file and test the chunksize as 0/1/n respectively.

@SYaoJun
Contributor Author

SYaoJun commented Mar 4, 2026

> I don't think the existing benchmarks can be used to reflect performance changes. For example, AdjListPropertyArrowChunkReaderReadChunk creates a reader and uses this reader to read chunk0 and chunk1 multiple times, which will cause all data to be hit in the cache. A better way is to create a separate benchmark file and test the chunksize as 0/1/n respectively.

OK, I also want to add more intensive benchmarks. The LRU cache affects the whole project, so there is no rush. I will add the benchmarks first and then move on to the next step.

@SYaoJun SYaoJun marked this pull request as draft March 4, 2026 07:32
