Closed
Labels
performance (Make DataFusion faster)
Description
Basically, the Unnest exec plan could be made faster if we reduced some copies. Here is the basic idea in case anyone wants to do that:
// Create an array with the unnested values of the list array, given the list
// array:
//
// [1], null, [2, 3, 4], null, [5, 6]
//
// the result array is:
//
// 1, null, 2, 3, 4, null, 5, 6
//
let unnested_array = unnest_array(list_array)?;
This looks very much the same to me as calling list_array.values() to get access to the underlying values: https://docs.rs/arrow/latest/arrow/array/struct.GenericListArray.html#method.values
In this case the values array would be more like
[1, 2, 3, 4, 5, 6]
And the offsets of the list array would be (I think):
[0, 1, 1, 4, 4, 6]
With a null mask showing that the second and fourth elements are null
So I was thinking you could calculate the take indices directly from the offsets / nulls without having to copy all the values out of the underlying array
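A minimal sketch of that idea, not the actual DataFusion implementation. It assumes the offsets and the null mask have already been pulled out of the list array and uses plain slices as stand-ins for Arrow's offset and null buffers; the function name `unnest_take_indices` is hypothetical. For the five-row example above, the third list has three elements, so the offsets work out to [0, 1, 1, 4, 4, 6].

```rust
/// Hypothetical helper: build "take" indices into the child values array
/// from a list array's offsets and validity mask.
///
/// `offsets` has one more entry than there are list rows; row `i` covers
/// values[offsets[i]..offsets[i + 1]]. `valid[i]` is false when row `i` is null.
/// A `None` index stands for a null slot in the unnested output.
fn unnest_take_indices(offsets: &[i32], valid: &[bool]) -> Vec<Option<u32>> {
    assert_eq!(offsets.len(), valid.len() + 1, "need one offset per row plus one");

    let mut indices = Vec::new();
    for (row, window) in offsets.windows(2).enumerate() {
        if valid[row] {
            // Valid row: emit one index per element it contains.
            // A valid but empty list contributes nothing in this sketch.
            for i in window[0]..window[1] {
                indices.push(Some(i as u32));
            }
        } else {
            // Null row: emit a single null, whatever range its offsets cover.
            indices.push(None);
        }
    }
    indices
}
```

Calling it with offsets `[0, 1, 1, 4, 4, 6]` and validity `[true, false, true, false, true]` gives the indices 0, null, 1, 2, 3, null, 4, 5. Taking those from the values array [1, 2, 3, 4, 5, 6] yields 1, null, 2, 3, 4, null, 5, 6, which matches the expected unnested result. The same indices could then presumably be reused for the take on the other columns, so the list values are never copied into an intermediate array.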
Originally posted by @alamb in #6903 (comment)