Skip to content

Conversation

@blaginin
Copy link
Member

No description provided.

@blaginin blaginin self-assigned this Sep 24, 2025
@blaginin blaginin added the changelog/fix A bug fix label Sep 24, 2025
Comment on lines 292 to 313
metadata: EmptyMetadata
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
buffer (align=1): 28 B (0.63%)
buffer (align=1): 29 B (0.65%)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onursatici this shouldn't be the case, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you worked on something similar in #4054

@codecov
Copy link

codecov bot commented Sep 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.56%. Comparing base (c7045ec) to head (c58ac57).
⚠️ Report is 2 commits behind head on develop.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blaginin blaginin marked this pull request as ready for review September 25, 2025 10:28
@blaginin blaginin requested a review from onursatici September 25, 2025 10:37
@blaginin blaginin enabled auto-merge (squash) September 25, 2025 10:45
// TODO(os): add invert_mask opt and check if if_false has a kernel like:
// kernel.invoke(Args(if_false, if_true, mask, invert_mask = true))

if !if_true.is_canonical() || !if_false.is_canonical() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the thing that is actually changing here? Doesn't this indicate that our default impl is bad?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes us use impl ZipKernel for VarBinViewVTable. Default impl is indeed doesn't handle VarBinViews properly, that's why Onur did #4054

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the default behavior different, but i think needs to still be here, because it follows the pattern from other functions (1, 2, 3)

Copy link
Contributor

@onursatici onursatici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense, but we should just make zip_impl, the default implementation when no kernels are found, to be the same as the varbinview kernel. We can then remove the varbinview kernel as well, because it will be the default for two already canonicalised arrays anyway.

we might need to branch in creating the builder to create the deduplicating builder for varbinviews, and the regular canonical builders for others

Comment on lines 311 to 322
#[test]
fn test_varbinview_zip() {
let if_true = {
let mut builder = VarBinViewBuilder::new(
DType::Utf8(Nullability::NonNullable),
10,
Default::default(),
BufferGrowthStrategy::fixed(64 * 1024),
);
for _ in 0..100 {
builder.append_value("Hello");
builder.append_value("Hello this is a long string that won't be inlined.");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just moved this test


impl Default for CompletedBuffers {
fn default() -> Self {
Self::Default(Vec::new())
Copy link
Member Author

@blaginin blaginin Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onursatici I didn't change builder_with_capacity because IMO it would be non-intuitive if it returned something rather than the default builder. Changed it here instead - feel like we can afford Deduplicated to be the default right?


pub enum CompletedBuffers {
Default(Vec<ByteBuffer>),
Plain(Vec<ByteBuffer>),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed otherwise it's strange that it is not chosen by default

@blaginin blaginin requested a review from onursatici September 25, 2025 12:11
@codspeed-hq
Copy link

codspeed-hq bot commented Sep 25, 2025

CodSpeed Performance Report

Merging #4747 will not alter performance

Comparing db/zip-fragmentation (c58ac57) with develop (c7045ec)

Summary

✅ 1156 untouched

Copy link
Contributor

@onursatici onursatici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to turn on deduplication by default but we use it in some many places to make the switch easy: #4695
I think the default should remain the old default for now because of that

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Sep 29, 2025

Deploying vortex-bench with  Cloudflare Pages  Cloudflare Pages

Latest commit: 172b6a6
Status: ✅  Deploy successful!
Preview URL: https://638614e4.vortex-93b.pages.dev
Branch Preview URL: https://db-zip-fragmentation.vortex-93b.pages.dev

View logs

@blaginin
Copy link
Member Author

I think the default should remain the old default for now because of that

Sure, in that case, let's handle this case specifically for VarBinViewVTable (like we do with other compute functions)? Later, once #4695 is merge, we'll handle it properly for all arrays

@blaginin blaginin force-pushed the db/zip-fragmentation branch from f166845 to 888f13e Compare September 29, 2025 14:13
Signed-off-by: blaginin <dima@spiraldb.com>
@blaginin blaginin force-pushed the db/zip-fragmentation branch from 172b6a6 to c58ac57 Compare September 29, 2025 14:29
@blaginin blaginin requested a review from AdamGS September 29, 2025 15:12
Copy link
Contributor

@AdamGS AdamGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, IDK if anyone else wants to weigh in.

@blaginin blaginin merged commit 96d3d95 into develop Sep 29, 2025
39 checks passed
@blaginin blaginin deleted the db/zip-fragmentation branch September 29, 2025 16:22
@blaginin
Copy link
Member Author

Will discuss with @onursatici once he is back and do changes of top as needed :)

blaginin added a commit that referenced this pull request Oct 1, 2025
follow up on #4747

---------

Signed-off-by: blaginin <dima@spiraldb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants