vortex-btrblocks: write listviews when elements expand > 50%#6393
vortex-btrblocks: write listviews when elements expand > 50%#6393robert3005 merged 1 commit intodevelopfrom
Conversation
|
I haven't given much though to the heuristic and just picked 50% so open to discussion here. We could also introduce this slowly via a write option maybe? The motivation for this is that we're starting to do the work of moving our lists to ListViews and would like to keep the "dict-like encoding" + zero copy back to list view. |
36651d7 to
b44dd33
Compare
Merging this PR will not alter performance
Comparing Footnotes
|
b44dd33 to
25cd045
Compare
Deploying vortex-bench with
|
| Latest commit: |
dd0c393
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://ef08e8ed.vortex-93b.pages.dev |
| Branch Preview URL: | https://asubiotto-listview.vortex-93b.pages.dev |
|
lets runn benchmarks and we can merge if its all good |
connortsui20
left a comment
There was a problem hiding this comment.
Generally looks good to me! Left a few nits
Benchmarks: Random AccessSummary
Detailed Results Table
|
Benchmarks: PolarSignals ProfilingSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
|
Is the TPC-H regression expected? It seems like quite a lot of queries slow down on the file-compressed benchmark. To be honest, I don't really believe it at all |
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
|
No rush to merge this (I would actually probably prefer to not have this in the release in case there are unexpected regressions). I can look into the benchmark issues. |
|
@asubiotto I think the benchmark issues were just bad runs, so no need to worry |
dd0c393 to
f717fb9
Compare
|
Updated and addressed the latest comments. Rerunning the benchmarks |
f717fb9 to
91292c0
Compare
91292c0 to
c8034df
Compare
Benchmarks: CompressionSummary
Detailed Results Table
|
|
Not sure what happened to the bench comments |
|
the comments get updated, last edit seems to be from 1h ago which aligns with the time when labels were added last? |
|
Ah, you're right. For some reason I assumed they were new comments every time. |
Previously, listview arrays were always written as lists. This involves materializing the views (i.e. expanding the elements and reorganizing the offsets). This commit chooses to write listviews on compression if the elements would expand > 50% when converting to lists to save on CPU cost and storage space. ZCTL listviews and listviews with low duplication are still encoded as lists, which saves writing the sizes. Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
c8034df to
fce02b8
Compare
Previously, listview arrays were always written as lists. This involves materializing the views (i.e. expanding the elements and reorganizing the offsets).
This commit chooses to write listviews on compression if the elements would expand > 50% when converting to lists to save on CPU cost and storage space. ZCTL listviews and listviews with low duplication are still encoded as lists, which saves writing the sizes.