Skip to content

Conversation

@paddyhoran
Copy link
Contributor

This PR adds explicit SIMD for boolean ops, and, or and not.

I moved array_ops into the new compute module. From the outside this module serves the same purpose as the previous array_ops module (all kernels will be accessible from this namespace) and the remaining array_ops implementations are exposed via the compute module currently. As I add explicit SIMD for more kernels they will migrate from array_ops into their own modules under compute. I am keeping sub-modules under compute (as apposed to compute.rs) as SIMD can get rather verbose and it seems that compute may expand in the future.

I have included benchmarks where I re-create the old default implementations for others to take a look at the speed improvement. It's not clear whether we need the non-SIMD versions in the benchmarks long term but I left them in for now to make the non-SIMD/SIMD comparison.

There are likely more optimizations possible (processing the values and null bit buffers in a single loop for instance) but I wanted to get the cleanest impl first and add further optimizations later if needed.

@paddyhoran
Copy link
Contributor Author

pub mod bitmap;
pub mod buffer;
pub mod builder;
pub mod compute;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chao mentioned that this isn't necessary as we have lib.rs, I've removed it in #3624

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wasn't sure about this as I remembered @sunchao's previous comment. We have a lib.rs and a mod.rs what's the use case for each? For instance, why is it a good idea to have csv in both and not compute (not pushing for compute to be added but rather wondering what the thinking should be here)?

I tried searching for answers to this but couldn't find anything definitive...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lib is required, and I think mod is optional if you prefer declaring top-level modules separately to lib. As we declare pub mod array and others in lib, mod becomes redundant.

I think modules that are only declared in lib were an oversight by their author(s).

Copy link
Contributor Author

@paddyhoran paddyhoran Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it looks like mod.rs can be removed. It's not clear to me that it actually gets compiled because no other files reference it, lib.rs references the other modules directly. datafusion and parquet do not have a mod.rs.

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good @paddyhoran

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic! Nice improvements to error handling as well. Thanks @paddyhoran

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants