Improve performance of unnest even more #6961

@alamb

Basically, the Unnest exec plan could be made faster if we reduced some copies. Here is the basic idea in case anyone wants to do that:

    // Create an array with the unnested values of the list array, given the list
    // array:
    //
    //   [1], null, [2, 3, 4], null, [5, 6]
    //
    // the result array is:
    //
    //   1, null, 2, 3, 4, null, 5, 6
    //
    let unnested_array = unnest_array(list_array)?;

This looks very similar to me to calling list_array.values() to get access to the underlying values: https://docs.rs/arrow/latest/arrow/array/struct.GenericListArray.html#method.values

In this case the values array would be more like

[1, 2, 3, 4, 5, 6]

And the offsets of the list array would be like (I think):

[0, 1, 1, 4, 4, 6]

With a null mask showing that the second and fourth elements are null.

So I was thinking you could calculate the take indices directly from the offsets / nulls without having to copy all the values out of the underlying array
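To make the idea concrete, here is a minimal sketch of deriving the take indices from the offsets and null mask for the example above. It uses plain slices as stand-ins for the Arrow offset and validity buffers, and the function name `take_indices_from_offsets` is hypothetical (it is not part of arrow or DataFusion). The offsets `[0, 1, 1, 4, 4, 6]` follow from the list lengths 1, 0, 3, 0, 2. It also assumes that a null list yields a single null output row and that an empty, non-null list yields no rows.

```rust
/// Hypothetical helper: build take indices into the values array from
/// list offsets and a per-row validity mask (true = row is non-null).
/// `None` marks an output slot that should become null.
fn take_indices_from_offsets(offsets: &[i32], validity: &[bool]) -> Vec<Option<i32>> {
    // A list array with n rows has n + 1 offsets.
    assert_eq!(offsets.len(), validity.len() + 1);

    let mut indices = Vec::new();
    for (row, window) in offsets.windows(2).enumerate() {
        if !validity[row] {
            // Assumption: a null list produces exactly one null output row.
            // Check validity first, since a null row's offset range is not
            // guaranteed to be empty.
            indices.push(None);
            continue;
        }
        // A valid list contributes one index per element in its range.
        indices.extend((window[0]..window[1]).map(Some));
    }
    indices
}

fn main() {
    // [1], null, [2, 3, 4], null, [5, 6]
    let offsets = [0, 1, 1, 4, 4, 6];
    let validity = [true, false, true, false, true];

    let indices = take_indices_from_offsets(&offsets, &validity);
    // prints [Some(0), None, Some(1), Some(2), Some(3), None, Some(4), Some(5)]
    println!("{:?}", indices);
}
```

In a real implementation these indices would presumably be built as a nullable Arrow integer array and passed to the `take` kernel together with `list_array.values()`, so the values are gathered once rather than copied out element by element.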

Originally posted by @alamb in #6903 (comment)

Labels: performance (Make DataFusion faster)