-
Notifications
You must be signed in to change notification settings - Fork 130
feat: OperatorVTable::bind for VarBin/VarBinView #5212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c7d7a4c to
2cfdd8c
Compare
CodSpeed Performance ReportMerging #5212 will improve performances by ×3.2Comparing Summary
Benchmarks breakdown
Footnotes
|
Codecov Report❌ Patch coverage is ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
connortsui20
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been able to really look at this module yet, but now that I'm looking at it it seems somewhat strange that we store immutable ByteBuffers inside the BinaryViewVectorMut? Basically this means that we will only ever support append operations, and never support interior mutation, even though this mutable vector is supposed to be "owned".
This is probably showing my ignorance with how our varbin view arrays currently work but its also not clear to me how we correctly garbage collect these buffers? We are storing these views inline in a Buffer, but it isn't clear to me that there is any logic to free this memory since BinaryView does not implement reference counting. I might be misunderstanding something?
|
The current vector is structured for append construction. Once a buffer is closed, it never gets written into again, so using As for GC: we currently don't do anything intelligent to size the buffers on construction, or to GC them after operations complete. You can look inside of the existing |
| views.push(view); | ||
| } | ||
|
|
||
| let selection = self.selection.execute()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we should do this first? Then do like a selection.threshold_iter() to iterate over indices and slices when constructing the views?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we just want iterating indices, since you need to touch every string to build the view so there's no way to make use of slices, but you're right better to avoid building all the views if we don't need to
| let views = Buffer::<BinaryView>::from_byte_buffer(views.into_byte_buffer()); | ||
|
|
||
| match dtype { | ||
| // SAFETY: the incoming array has the same validation as the vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't decide whether building vectors from arrays should be unchecked? I would have thought this would be the place we do run validation since in theory the buffers arrive from the disk unchecked.
But I guess we check during array construction now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe things will be different once we're done with all of this, but currently yea all arrays are checked on deserialization from file
59f4272 to
3715cb9
Compare
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
3715cb9 to
a817fdb
Compare
No description provided.