[Enhancement] Page index for parquet reader #30435

Merged
merged 1 commit into from
Jan 3, 2024

Conversation

@zombee0 zombee0 commented Sep 5, 2023

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels to which the PR will be auto-backported
    • 3.1
    • 3.0
    • 2.5
    • 2.4

@mergify mergify bot assigned zombee0 Sep 5, 2023
@zombee0 zombee0 force-pushed the page_index branch 5 times, most recently from 5b19278 to 32a27b8 Compare September 8, 2023 11:45
@zombee0 zombee0 force-pushed the page_index branch 2 times, most recently from 5447bc6 to c515c6f Compare September 14, 2023 10:37
@github-actions github-actions bot deleted a comment from wanpengfei-git Nov 23, 2023
@github-actions github-actions bot deleted a comment from wanpengfei-git Nov 23, 2023
@zombee0 zombee0 force-pushed the page_index branch 2 times, most recently from fe284e4 to 4291b62 Compare November 23, 2023 08:51
@zombee0 zombee0 force-pushed the page_index branch 2 times, most recently from 800e5d7 to e9db536 Compare December 5, 2023 12:56
dirtysalt previously approved these changes Dec 25, 2023
dirtysalt previously approved these changes Dec 28, 2023
Signed-off-by: zombee0 <ewang2027@gmail.com>
[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

[BE Incremental Coverage Report]

pass : 295 / 355 (83.10%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 src/formats/parquet/column_reader.h | 0 | 4 | 00.00% | [138, 140, 142, 143] |
| 🔵 src/formats/parquet/stored_column_reader_with_index.h | 9 | 16 | 56.25% | [32, 33, 36, 37, 47, 49, 51] |
| 🔵 src/formats/parquet/stored_column_reader.h | 9 | 14 | 64.29% | [66, 68, 69, 72, 87] |
| 🔵 src/formats/parquet/page_index_reader.cpp | 107 | 134 | 79.85% | [48, 51, 103, 110, 183, 184, 185, 186, 188, 195, 196, 197, 209, 212, 213, 214, 215, 216, 219, 220, 221, 222, 224, 226, 228, 229, 233] |
| 🔵 src/formats/parquet/stored_column_reader.cpp | 25 | 30 | 83.33% | [590, 610, 657, 683, 693] |
| 🔵 src/formats/parquet/column_reader.cpp | 83 | 95 | 87.37% | [368, 372, 400, 822, 823, 824, 826, 827, 1128, 1129, 1130, 1131] |
| 🔵 src/formats/parquet/page_index_reader.h | 3 | 3 | 100.00% | [] |
| 🔵 src/formats/parquet/column_chunk_reader.h | 8 | 8 | 100.00% | [] |
| 🔵 src/formats/parquet/column_chunk_reader.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/formats/parquet/file_reader.cpp | 10 | 10 | 100.00% | [] |
| 🔵 src/formats/parquet/stored_column_reader_with_index.cpp | 22 | 22 | 100.00% | [] |
| 🔵 src/formats/parquet/group_reader.cpp | 13 | 13 | 100.00% | [] |
| 🔵 src/formats/parquet/page_reader.cpp | 2 | 2 | 100.00% | [] |
| 🔵 src/formats/parquet/page_reader.h | 3 | 3 | 100.00% | [] |

  std::vector<io::SharedBufferedInputStream::IORange> ranges;
  int64_t end_offset = 0;
- r->collect_io_ranges(&ranges, &end_offset);
+ r->collect_io_ranges(&ranges, &end_offset, ColumnIOType::PAGES);
Should this be guarded by config::parquet_coalesce_read_enable?

_row_group_metadata, _param.min_max_conjunct_ctxs);
ASSIGN_OR_RETURN(bool flag, page_index_reader->generate_read_range(_range));
if (flag && !_is_group_filtered) {
page_index_reader->select_column_offset_index();
Please add a comment explaining this step.
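
For context on what this hunk is doing, page-index pruning in general can be sketched as below. This is not the code from this PR: `PageStats`, `RowRange`, and `select_row_ranges` are hypothetical names, and the sketch assumes integer min/max statistics per page (as in Parquet's ColumnIndex) plus each page's first row index (as in the OffsetIndex). It shows only the idea of skipping pages whose min/max cannot satisfy a range predicate and merging the surviving pages into row ranges.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-page statistics, modeled on Parquet's ColumnIndex
// (min/max per page) and OffsetIndex (first row index per page).
struct PageStats {
    int64_t min_value;
    int64_t max_value;
    int64_t first_row;  // index of the page's first row within the row group
};

// Half-open row range [begin, end) within a row group.
struct RowRange {
    int64_t begin;
    int64_t end;
};

// Returns the row ranges whose pages may contain a value in [lo, hi].
// Pages whose [min, max] cannot overlap the predicate are skipped, and
// adjacent surviving pages are merged into a single range.
std::vector<RowRange> select_row_ranges(const std::vector<PageStats>& pages,
                                        int64_t num_rows, int64_t lo, int64_t hi) {
    std::vector<RowRange> ranges;
    for (std::size_t i = 0; i < pages.size(); ++i) {
        const PageStats& p = pages[i];
        if (p.max_value < lo || p.min_value > hi) {
            continue;  // this page cannot match the predicate
        }
        const int64_t begin = p.first_row;
        // A page ends where the next one starts; the last page ends at num_rows.
        const int64_t end = (i + 1 < pages.size()) ? pages[i + 1].first_row : num_rows;
        if (!ranges.empty() && ranges.back().end == begin) {
            ranges.back().end = end;  // extend the previous range
        } else {
            ranges.push_back({begin, end});
        }
    }
    return ranges;
}
```

If every page survives, the result is a single range covering the whole row group, in which case a reader gains nothing from the page index and can fall back to the plain column-chunk read path.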

@dirtysalt dirtysalt merged commit c613a91 into StarRocks:main Jan 3, 2024
45 checks passed

3 participants