Conversation

@a10y
Contributor

@a10y a10y commented Nov 5, 2025

No description provided.

@a10y a10y requested review from connortsui20 and gatesn and removed request for connortsui20 November 5, 2025 21:01
@a10y a10y force-pushed the aduffy/bind-vbview branch from c7d7a4c to 2cfdd8c Compare November 5, 2025 21:03
@a10y a10y added the changelog/feature A new feature label Nov 5, 2025
@codspeed-hq

codspeed-hq bot commented Nov 5, 2025

CodSpeed Performance Report

Merging #5212 will improve performances by ×3.2

Comparing aduffy/bind-vbview (a817fdb) with develop (28f5b3d) [1]

Summary

⚡ 36 improvements
✅ 1282 untouched
🆕 7 new
⏩ 155 skipped [2]

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| decompress_alp[f32, (1000, 0.0, 0.25)] | 12.1 µs | 8.6 µs | +40.98% |
| decompress_alp[f32, (1000, 0.0, 0.95)] | 12.1 µs | 8.6 µs | +40.5% |
| decompress_alp[f32, (1000, 0.0, 1.0)] | 11.5 µs | 8.5 µs | +34.19% |
| decompress_alp[f32, (1000, 0.01, 0.25)] | 26.4 µs | 11.8 µs | ×2.2 |
| decompress_alp[f32, (1000, 0.01, 0.95)] | 26 µs | 11.5 µs | ×2.3 |
| decompress_alp[f32, (1000, 0.01, 1.0)] | 15.1 µs | 11.3 µs | +34.04% |
| decompress_alp[f32, (1000, 0.1, 0.25)] | 26.2 µs | 11.6 µs | ×2.2 |
| decompress_alp[f32, (1000, 0.1, 0.95)] | 28.2 µs | 13.2 µs | ×2.1 |
| decompress_alp[f32, (1000, 0.1, 1.0)] | 16.2 µs | 12.4 µs | +29.9% |
| decompress_alp[f32, (10000, 0.0, 0.25)] | 81.2 µs | 27.3 µs | ×3 |
| decompress_alp[f32, (10000, 0.0, 0.95)] | 81.1 µs | 27.3 µs | ×3 |
| decompress_alp[f32, (10000, 0.0, 1.0)] | 81 µs | 27.2 µs | ×3 |
| decompress_alp[f32, (10000, 0.01, 0.25)] | 97.2 µs | 30.7 µs | ×3.2 |
| decompress_alp[f32, (10000, 0.01, 0.95)] | 98.6 µs | 31.3 µs | ×3.1 |
| decompress_alp[f32, (10000, 0.01, 1.0)] | 85.7 µs | 32.1 µs | ×2.7 |
| decompress_alp[f32, (10000, 0.1, 0.25)] | 102.3 µs | 34 µs | ×3 |
| decompress_alp[f32, (10000, 0.1, 0.95)] | 119 µs | 40.5 µs | ×2.9 |
| decompress_alp[f32, (10000, 0.1, 1.0)] | 95.4 µs | 40.7 µs | ×2.3 |
| decompress_alp[f64, (1000, 0.0, 0.25)] | 16.8 µs | 11.3 µs | +48.84% |
| decompress_alp[f64, (1000, 0.0, 0.95)] | 16.7 µs | 11.5 µs | +45.48% |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Footnotes

  1. No successful run was found on develop (84b2f96) during the generation of this report, so 28f5b3d was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 155 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov

codecov bot commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 75.65789% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.94%. Comparing base (28f5b3d) to head (a817fdb).
⚠️ Report is 27 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| vortex-array/src/arrays/varbin/vtable/operator.rs | 70.64% | 32 Missing ⚠️ |
| ...tex-array/src/arrays/varbinview/vtable/operator.rs | 84.00% | 4 Missing ⚠️ |
| ...rtex-array/src/arrays/constant/vtable/canonical.rs | 0.00% | 1 Missing ⚠️ |

Contributor

@connortsui20 connortsui20 left a comment

I haven't really been able to look at this module yet, but now that I am, it seems somewhat strange that we store immutable ByteBuffers inside the BinaryViewVectorMut. This means we will only ever support append operations and never interior mutation, even though this mutable vector is supposed to be "owned".

This probably shows my ignorance of how our varbin view arrays currently work, but it's also not clear to me how we correctly garbage collect these buffers. We store the views inline in a Buffer, but I don't see any logic to free this memory, since BinaryView does not implement reference counting. Am I misunderstanding something?

@a10y
Contributor Author

a10y commented Nov 5, 2025

The current vector is structured for append-only construction. Once a buffer is closed, it is never written to again, so an immutable ByteBuffer is the right choice here.
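A minimal sketch of that append-only layout, assuming nothing about the real BinaryViewVectorMut internals: `AppendOnlyBuffers`, its fields, and the `capacity` threshold are hypothetical names, and `Arc<[u8]>` stands in for an immutable ByteBuffer.

```rust
use std::sync::Arc;

/// Hypothetical append-only byte storage: one open buffer that accepts
/// writes, plus a list of closed buffers that are frozen forever.
struct AppendOnlyBuffers {
    closed: Vec<Arc<[u8]>>, // immutable once pushed here
    open: Vec<u8>,          // the only buffer that is ever written to
    capacity: usize,        // assumed size threshold for closing a buffer
}

impl AppendOnlyBuffers {
    fn new(capacity: usize) -> Self {
        Self { closed: Vec::new(), open: Vec::new(), capacity }
    }

    /// Append `bytes` and return `(buffer index, offset)` of where they landed.
    fn push(&mut self, bytes: &[u8]) -> (usize, usize) {
        // Close the open buffer if the new value would overflow it.
        if !self.open.is_empty() && self.open.len() + bytes.len() > self.capacity {
            let full = std::mem::take(&mut self.open);
            self.closed.push(Arc::from(full));
        }
        let position = (self.closed.len(), self.open.len());
        self.open.extend_from_slice(bytes);
        position
    }
}
```

Because closed buffers are never touched again, they can be shared cheaply; the cost is that interior mutation of an existing value is not possible without rebuilding.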

As for GC: we currently don't do anything intelligent to size the buffers on construction, or to GC them after operations complete. You can look at the existing VarBinViewArray::compact_buffers impl for an example of how to scan a vector and compact unused space out of it by building a new one.
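The scan-and-rebuild idea can be sketched roughly as follows. This is not the VarBinViewArray::compact_buffers implementation; `View` and `compact` are hypothetical, simplified stand-ins (the real BinaryView also inlines short strings, which is omitted here).

```rust
/// Simplified stand-in for a view: which buffer, where, and how long.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    buffer: usize,
    offset: usize,
    len: usize,
}

/// Rebuild views and buffers so that only bytes referenced by some view survive.
/// Everything is copied into a single fresh buffer, in view order.
fn compact(views: &[View], buffers: &[Vec<u8>]) -> (Vec<View>, Vec<Vec<u8>>) {
    let mut data = Vec::new();
    let mut new_views = Vec::with_capacity(views.len());
    for v in views {
        let bytes = &buffers[v.buffer][v.offset..v.offset + v.len];
        // The new view points at the position the bytes are about to occupy.
        new_views.push(View { buffer: 0, offset: data.len(), len: v.len });
        data.extend_from_slice(bytes);
    }
    (new_views, vec![data])
}
```

Bytes that no view references are simply never copied, so the old buffers can be dropped once the new vector replaces the old one.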

views.push(view);
}

let selection = self.selection.execute()?;
Contributor

Feels like we should do this first, then use something like selection.threshold_iter() to iterate over indices and slices when constructing the views?

Contributor Author

I think we just want to iterate over indices: you need to touch every string to build its view, so there's no way to make use of slices. But you're right, it's better to avoid building all the views if we don't need to.
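Roughly, evaluating the selection first and then building views only for the selected rows could look like the sketch below. The function name, the `&[bool]` mask, and the `(offset, len)` tuple are illustrative assumptions, not the types used in this PR.

```rust
/// Build `(offset, len)` views only for rows where `mask` is true.
/// `offsets` is a VarBin-style offsets slice with one more entry than rows.
fn views_for_selected(offsets: &[usize], mask: &[bool]) -> Vec<(usize, usize)> {
    mask.iter()
        .enumerate()
        // Skip unselected rows before doing any per-string work.
        .filter_map(|(i, &keep)| keep.then(|| (offsets[i], offsets[i + 1] - offsets[i])))
        .collect()
}
```

Each selected row is still visited individually, which matches the point that contiguous slices of the selection give no shortcut here.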

let views = Buffer::<BinaryView>::from_byte_buffer(views.into_byte_buffer());

match dtype {
// SAFETY: the incoming array has the same validation as the vector
Contributor

I couldn't decide whether building vectors from arrays should be unchecked. I would have thought this is the place where we do run validation, since in theory the buffers arrive from disk unchecked.

But I guess we check during array construction now?

Contributor Author

Maybe things will be different once we're done with all of this, but currently yes, all arrays are checked on deserialization from file.
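The trade-off being discussed is the usual checked versus unchecked constructor split. A generic sketch, using a hypothetical `Utf8Bytes` type rather than any Vortex type:

```rust
/// Hypothetical wrapper around bytes that are expected to be valid UTF-8.
struct Utf8Bytes {
    bytes: Vec<u8>,
}

impl Utf8Bytes {
    /// Checked path: validate once at the trust boundary (e.g. when reading a file).
    fn try_new(bytes: Vec<u8>) -> Result<Self, std::str::Utf8Error> {
        std::str::from_utf8(&bytes)?;
        Ok(Self { bytes })
    }

    /// Unchecked path: skip validation because an upstream step already did it.
    ///
    /// # Safety
    /// The caller must guarantee that `bytes` is valid UTF-8.
    unsafe fn new_unchecked(bytes: Vec<u8>) -> Self {
        Self { bytes }
    }

    fn as_str(&self) -> &str {
        // SAFETY: both constructors uphold the UTF-8 invariant.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }
}
```

If validation already happens when the array is deserialized, repeating it when building the vector would pay the cost twice, which is the argument for the unchecked path at that point.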

@a10y a10y force-pushed the aduffy/bind-vbview branch 2 times, most recently from 59f4272 to 3715cb9 Compare November 6, 2025 12:59
a10y added 4 commits November 6, 2025 07:59
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/bind-vbview branch from 3715cb9 to a817fdb Compare November 6, 2025 13:04
@a10y a10y requested a review from gatesn November 6, 2025 14:27
@gatesn gatesn merged commit 82e9ace into develop Nov 6, 2025
38 of 39 checks passed
@gatesn gatesn deleted the aduffy/bind-vbview branch November 6, 2025 20:38