Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-4264: simd selection #4271

Merged
merged 6 commits into from
Mar 7, 2022
Merged

Conversation

platoneko
Copy link
Contributor

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Summary about this PR

Changelog

  • New Feature
  • Performance Improvement

Related Issues

Fixes #4264

Test Plan

Unit Tests

Stateless Tests

@vercel
Copy link

vercel bot commented Feb 28, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/databend/databend/D3GjDedyAJFZxaNkcigR8m9bUV8t
✅ Preview: https://databend-git-fork-platoneko-simd-select-databend.vercel.app

[Deployment for bd08a7c canceled]

@mergify
Copy link
Contributor

mergify bot commented Feb 28, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added pr-feature this PR introduces a new feature to the codebase pr-performance labels Feb 28, 2022
@platoneko platoneko changed the title simd selection ISSUE-4264: simd selection Feb 28, 2022
@@ -18,6 +18,7 @@

#![feature(generic_associated_types)]
#![feature(trusted_len)]
#![feature(core_intrinsics)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sundy-li
Copy link
Member

Almost Looks good to me.

I just posted a pr in arrow2, you can have a look, but you don't have to follow that pr's behavior.

} else {
while mask != 0 {
let n = std::intrinsics::cttz(mask) as usize;
res.push(values[offset + n]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better to use ptr.write to have better performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better to use ptr.write to have better performance.

I'd like to have a try

let values = self.values();

const MASK_BITS: usize = 64;
for mut mask in filter.values().chunks::<u64>() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should care about the remaining in chunks.

Copy link
Contributor Author

@platoneko platoneko Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked that Bitmap::chunks::<u64>() won't yield chunk less than 64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my unittest, I set N = 1000 to make Column not aligned to 64.

.filter(|(_, f)| *f)
.map(|(v, _)| *v);
{
res.push(v);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Processing the remaining in chunks is here, maybe not elegant :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not elegant. skip may cause extra iterating.


const CHUNK_SIZE: usize = 64;
let mut chunks = self.values().chunks_exact(CHUNK_SIZE);
let mut mask_chunks = filter.values().chunks::<u64>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a faster path for mask_chunks, if the filter has no offsets, it could use chunks_exact to have better performance. See: https://github.com/jorgecarleitao/arrow2/blob/main/src/compute/filter.rs#L142-L147

There are two approaches:

  1. Use generic dispatch like BitChunkIterExact

  2. if offset > 0, use an extra loop to consume the offset, this is called Header Loops, then we can continue the Main Loops && Tail Loops.

The second one may have better performance because chunks.iter() must merge two u64 values to generate one merged u64, see: merge_reversed in arrow2

@sundy-li
Copy link
Member

sundy-li commented Mar 7, 2022

Seems this is faster by 25%:

Main:


MySQL [(none)]> select count() from numbers_mt(10000000000) where rand()  > 0.90;
+------------+
| count()    |
+------------+
| 1000011915 |
+------------+
1 row in set (1.942 sec)

Now after merge main (#4263):

MySQL [(none)]> select count() from numbers_mt(10000000000) where rand()  > 0.90;
+------------+
| count()    |
+------------+
| 1000013059 |
+------------+
1 row in set (1.486 sec)

@sundy-li
Copy link
Member

sundy-li commented Mar 7, 2022

@mergify update

@mergify
Copy link
Contributor

mergify bot commented Mar 7, 2022

update

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

@sundy-li
Copy link
Member

sundy-li commented Mar 7, 2022

@platoneko LGTM now. Would you want this pr to be merged? Or maybe you want to continue to add this into BooleanColumn && StringColumn in this pr?

@sundy-li
Copy link
Member

sundy-li commented Mar 7, 2022

@mergify update

@mergify
Copy link
Contributor

mergify bot commented Mar 7, 2022

update

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

@platoneko
Copy link
Contributor Author

I want this pr to be merged :)

@platoneko platoneko marked this pull request as ready for review March 7, 2022 17:04
@sundy-li sundy-li merged commit 3a66496 into databendlabs:main Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-review pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: Simd Selection of column filter
3 participants