Skip to content

Potentially improve join performance by implementing a version of the take kernel that accepts an iterator of indices #13620

Open
@alihan-synnada

Description

@alihan-synnada

Is your feature request related to a problem or challenge?

Collecting the filtered indices to a PrimitiveArray takes a lot of memory and time. Using an Iterator based design instead would save a lot of intermediate memory and potentially speed up the join operation due to fewer cache misses and less copying.

Describe the solution you'd like

Create a version of arrow's compute::take kernel that accepts an iterator of indices. Benchmark and figure out where it's worth using Iterators over PrimitiveArrays.

Describe alternatives you've considered

No response

Additional context

I have tried to create a POC but I seem to get different results whenever I benchmark it and I couldn't figure out what's wrong. I also had lots of trouble with taking the ownership of the values array of PrimitiveArrays which are guaranteed to not have nulls (namely the indices cache and the mask produced by the filter).

Furthermore, it's possible to use the mask's .values().set_indices() iterator to generate indices to be used in the join, because the left-right index pairs are a function of the index of the mask in the form of (index % left_batch.num_rows(), index / left_batch.num_rows()) and it vectorizes nicely.

I plan to share the entirety of my POC (here's a potential implementation of take_with_iter) and benchmarks once I can tame them and generate the same results deterministically.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions