-
Notifications
You must be signed in to change notification settings - Fork 749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISSUE-4264: simd selection #4271
Conversation
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/databend/databend/D3GjDedyAJFZxaNkcigR8m9bUV8t [Deployment for bd08a7c canceled] |
Thanks for the contribution! Please review the labels and make any necessary changes. |
6aed4f4
to
e599767
Compare
common/datavalues/src/lib.rs
Outdated
@@ -18,6 +18,7 @@ | |||
|
|||
#![feature(generic_associated_types)] | |||
#![feature(trusted_len)] | |||
#![feature(core_intrinsics)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Almost Looks good to me. I just posted a pr in arrow2, you can have a look, but you don't have to follow that pr's behavior. |
} else { | ||
while mask != 0 { | ||
let n = std::intrinsics::cttz(mask) as usize; | ||
res.push(values[offset + n]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe better to use ptr.write
to have better performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe better to use
ptr.write
to have better performance.
I'd like to have a try
let values = self.values(); | ||
|
||
const MASK_BITS: usize = 64; | ||
for mut mask in filter.values().chunks::<u64>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should care about the remaining in chunks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked that Bitmap::chunks::<u64>()
won't yield chunk less than 64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my unittest, I set N = 1000 to make Column not aligned to 64.
.filter(|(_, f)| *f) | ||
.map(|(v, _)| *v); | ||
{ | ||
res.push(v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Processing the remaining in chunks is here, maybe not elegant :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not elegant. skip
may cause extra iterating.
|
||
const CHUNK_SIZE: usize = 64; | ||
let mut chunks = self.values().chunks_exact(CHUNK_SIZE); | ||
let mut mask_chunks = filter.values().chunks::<u64>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a faster path for mask_chunks
, if the filter has no offsets, it could use chunks_exact
to have better performance. See: https://github.com/jorgecarleitao/arrow2/blob/main/src/compute/filter.rs#L142-L147
There are two approaches:
-
Use generic dispatch like
BitChunkIterExact
-
if offset > 0, use an extra loop to consume the offset, this is called
Header Loops
, then we can continue theMain Loops
&&Tail Loops
.
The second one may have better performance because chunks.iter()
must merge two u64 values to generate one merged u64, see: merge_reversed in arrow2
Seems this is faster by 25%: Main:
Now after merge main (#4263):
|
@mergify update |
✅ Branch has been successfully updatedHey, I reacted but my real name is @Mergifyio |
@platoneko LGTM now. Would you want this pr to be merged? Or maybe you want to continue to add this into |
@mergify update |
✅ Branch has been successfully updatedHey, I reacted but my real name is @Mergifyio |
I want this pr to be merged :) |
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
Summary about this PR
Changelog
Related Issues
Fixes #4264
Test Plan
Unit Tests
Stateless Tests