[opt](parquet) change parquet init footer read size to 48KB (#46904)
### What problem does this PR solve?

Change the initial footer read size from 128KB to 48KB to reduce the amount of data read per file.
This matches Presto/Trino: a typical 1GB Parquet file has a footer of about 30~40KB, so 48KB is usually still large enough to cover the whole footer.
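For context, a Parquet file ends with an 8-byte trailer: the 4-byte little-endian length of the Thrift-encoded `FileMetaData`, followed by the magic bytes `PAR1` (matching `PARQUET_FOOTER_SIZE = 8` in the diff). The following is a minimal C++ sketch, assuming the reader first fetches the last `min(file_size, INIT_META_SIZE)` bytes and issues a second read only when the metadata plus trailer does not fit in that tail. It is not the actual `parse_thrift_footer` implementation; the helpers `initial_tail_read_size`, `footer_metadata_length` and `needs_second_read` are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Values mirror PARQUET_FOOTER_SIZE and the new INIT_META_SIZE from the diff.
constexpr std::size_t kFooterSize = 8;            // 4-byte metadata length + "PAR1"
constexpr std::size_t kInitMetaSize = 48 * 1024;  // initial tail read (48KB)

// Hypothetical: bytes to read from the end of the file on the first attempt.
std::size_t initial_tail_read_size(std::size_t file_size) {
    return std::min(file_size, kInitMetaSize);
}

// Hypothetical: decode the FileMetaData length from the last 8 bytes of a tail
// buffer. Returns std::nullopt if the buffer is too short or the magic is wrong.
std::optional<uint32_t> footer_metadata_length(const std::vector<uint8_t>& tail) {
    if (tail.size() < kFooterSize) return std::nullopt;
    const uint8_t* trailer = tail.data() + tail.size() - kFooterSize;
    if (std::memcmp(trailer + 4, "PAR1", 4) != 0) return std::nullopt;
    // The length is little-endian; assemble it byte by byte to stay endian-neutral.
    return static_cast<uint32_t>(trailer[0]) |
           (static_cast<uint32_t>(trailer[1]) << 8) |
           (static_cast<uint32_t>(trailer[2]) << 16) |
           (static_cast<uint32_t>(trailer[3]) << 24);
}

// Hypothetical: true if metadata + trailer did not fit in the first tail read,
// so another read from the end of the file would be needed.
bool needs_second_read(uint32_t metadata_length, std::size_t tail_size) {
    return static_cast<std::size_t>(metadata_length) + kFooterSize > tail_size;
}
```

Under these assumptions, a 40KB footer (the upper end of the range above) needs 40KB + 8 bytes, which fits in a 48KB tail, so `needs_second_read(40 * 1024, 48 * 1024)` is false; only footers larger than roughly 48KB would cost an extra read.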

In one user case with 30,000 Parquet files, the footer parse time (`ParseFooterTime`) dropped from:

```
ParseFooterTime:  avg  2s28ms,  max  3s707ms,  min  905.866ms
```
to
```
ParseFooterTime:  avg  886.364ms,  max  1s734ms,  min  391.846ms
```
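As a rough illustration of the I/O involved, the snippet below computes the bytes fetched by the initial footer read alone for the 30,000-file case. It assumes exactly one initial read per file and that every file is at least 128KB long, so the read is never clamped to the file size; `total_initial_read_bytes` is a hypothetical helper, and the timing gain above is a measured result rather than something this arithmetic predicts.

```cpp
#include <cstdint>

// Hypothetical helper: total bytes fetched by the initial footer read, assuming
// one read of init_size bytes per file and no file shorter than init_size.
constexpr std::uint64_t total_initial_read_bytes(std::uint64_t file_count,
                                                 std::uint64_t init_size) {
    return file_count * init_size;
}

constexpr std::uint64_t kFiles = 30000;
constexpr std::uint64_t kOldBytes = total_initial_read_bytes(kFiles, 128 * 1024);  // 3,932,160,000
constexpr std::uint64_t kNewBytes = total_initial_read_bytes(kFiles, 48 * 1024);   // 1,474,560,000

// 48KB / 128KB = 3/8, checked at compile time.
static_assert(kNewBytes * 8 == kOldBytes * 3, "new read size is 3/8 of the old one");
```

Under those assumptions the initial reads shrink from about 3.66 GiB to about 1.37 GiB, i.e. 62.5% less data.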
morningman authored Jan 16, 2025
1 parent a956a52 commit c16567e
Showing 1 changed file with 1 addition and 1 deletion: `be/src/vec/exec/format/parquet/parquet_thrift_util.h`
```diff
@@ -34,7 +34,7 @@ namespace doris::vectorized {

 constexpr uint8_t PARQUET_VERSION_NUMBER[4] = {'P', 'A', 'R', '1'};
 constexpr uint32_t PARQUET_FOOTER_SIZE = 8;
-constexpr size_t INIT_META_SIZE = 128 * 1024; // 128k
+constexpr size_t INIT_META_SIZE = 48 * 1024; // 48k

 static Status parse_thrift_footer(io::FileReaderSPtr file, FileMetaData** file_metadata,
                                   size_t* meta_size, io::IOContext* io_ctx) {
```
