-
Notifications
You must be signed in to change notification settings - Fork 130
Zip Fragmentation on take
#4747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vortex-array/src/compute/zip.rs
Outdated
| metadata: EmptyMetadata | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) | ||
| buffer (align=1): 28 B (0.63%) | ||
| buffer (align=1): 29 B (0.65%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@onursatici this shouldn't be the case, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember you worked on something similar in #4054
Codecov Report✅ All modified and coverable lines are covered by tests. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // TODO(os): add invert_mask opt and check if if_false has a kernel like: | ||
| // kernel.invoke(Args(if_false, if_true, mask, invert_mask = true)) | ||
|
|
||
| if !if_true.is_canonical() || !if_false.is_canonical() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the thing that is actually changing here? Doesn't this indicate that our default impl is bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes us use impl ZipKernel for VarBinViewVTable. Default impl is indeed doesn't handle VarBinViews properly, that's why Onur did #4054
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, but we should just make zip_impl, the default implementation when no kernels are found, to be the same as the varbinview kernel. We can then remove the varbinview kernel as well, because it will be the default for two already canonicalised arrays anyway.
we might need to branch in creating the builder to create the deduplicating builder for varbinviews, and the regular canonical builders for others
vortex-array/src/compute/zip.rs
Outdated
| #[test] | ||
| fn test_varbinview_zip() { | ||
| let if_true = { | ||
| let mut builder = VarBinViewBuilder::new( | ||
| DType::Utf8(Nullability::NonNullable), | ||
| 10, | ||
| Default::default(), | ||
| BufferGrowthStrategy::fixed(64 * 1024), | ||
| ); | ||
| for _ in 0..100 { | ||
| builder.append_value("Hello"); | ||
| builder.append_value("Hello this is a long string that won't be inlined."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just moved this test
|
|
||
| impl Default for CompletedBuffers { | ||
| fn default() -> Self { | ||
| Self::Default(Vec::new()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@onursatici I didn't change builder_with_capacity because IMO it would be non-intuitive if it returned something rather than the default builder. Changed it here instead - feel like we can afford Deduplicated to be the default right?
|
|
||
| pub enum CompletedBuffers { | ||
| Default(Vec<ByteBuffer>), | ||
| Plain(Vec<ByteBuffer>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed otherwise it's strange that it is not chosen by default
CodSpeed Performance ReportMerging #4747 will not alter performanceComparing Summary
|
onursatici
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to turn on deduplication by default but we use it in some many places to make the switch easy: #4695
I think the default should remain the old default for now because of that
Deploying vortex-bench with
|
| Latest commit: |
172b6a6
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://638614e4.vortex-93b.pages.dev |
| Branch Preview URL: | https://db-zip-fragmentation.vortex-93b.pages.dev |
Sure, in that case, let's handle this case specifically for |
f166845 to
888f13e
Compare
Signed-off-by: blaginin <dima@spiraldb.com>
172b6a6 to
c58ac57
Compare
AdamGS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, IDK if anyone else wants to weigh in.
|
Will discuss with @onursatici once he is back and do changes of top as needed :) |
follow up on #4747 --------- Signed-off-by: blaginin <dima@spiraldb.com>
No description provided.