GH-39704: [C++][Parquet] Benchmark levels decoding #39705
Conversation
@pitrou @emkornfield @wgtmac Would you mind taking a look? Also cc @Hattonuri
Benchmark results on my macOS machine with a Release build (-O2):
This benchmark shows that, when the data is not highly repeated, RLE without bit-packing is slow 😅 After changing RLE to BIT_PACKED, decoding gets a bit faster when the repeat count is low:
We should mention that our native unpack to int16 is slow =_=, which makes decoding a bit slower.
Is this use case relevant here? #34510 Reading a non-nullable fixed-size list is missing the fast path; it'd be nice to see it in the benchmark (even if it's not improving yet). With all the AI workloads nowadays, I assume tensor storage will become more and more common.
Yeah, I think it's related. I think I can optimize unpack later, but I may need some help optimizing RLE.
Good idea @mapleFU . Please see my comments below.
@emkornfield @pitrou Updated, sorry for the delay
Results on my macOS machine with a Release build (-O2):
Force-pushed from b402c93 to fc0c1a5.
Ping @pitrou @emkornfield for help
Strange phenomenon: we get results like:
Oh, it seems Google Benchmark has a weird behavior here. Unrelated to this PR, though.
Posted google/benchmark#1749 for the Google benchmark oddity. |
FTR, benchmark numbers here:
@pitrou I think the benchmark results show that "batch_size" should be taken into consideration: as the batch size grows, BIT_PACKED doesn't improve, while RLE is well optimized.
My bad, changed the vector for output to
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 0c88d13. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
### Rationale for this change

This patch adds a level-decoding benchmark. It tests:

1. Different max levels (for a flat type the maximum level is 1; for nested types it grows)
2. Different repeat counts (repeated null / non-null data behaves differently from non-repeated data)
3. Different read-batch sizes (this part of the logic is a bit tricky in the original code)

### What changes are included in this PR?

Adds a level-decoding benchmark.

### Are these changes tested?

No need.

### Are there any user-facing changes?

No

* Closes: apache#39704

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>