Optimize Unnest and implement skip_nulls=true if specified#7371
Optimize Unnest and implement skip_nulls=true if specified#7371alamb merged 7 commits intoapache:mainfrom
Unnest and implement skip_nulls=true if specified#7371Conversation
alamb
left a comment
There was a problem hiding this comment.
This is really nicely done -- thank you @smiklos 🏆
cc @vincev and @izveigor and @jayzhan211 who I think where also interested in this API
| ) -> Result<Arc<dyn Array + 'static>> { | ||
| let values = list_array.values(); | ||
| if list_array.null_count() == 0 || !options.preserve_nulls { | ||
| Ok(values.clone()) |
| let take_index = | ||
| <Int32Type as ArrowPrimitiveType>::Native::from_usize( | ||
| take_offset + i, | ||
| ) | ||
| .unwrap(); | ||
| builder.append_value(take_index); |
There was a problem hiding this comment.
This might be easier to understand
| let take_index = | |
| <Int32Type as ArrowPrimitiveType>::Native::from_usize( | |
| take_offset + i, | |
| ) | |
| .unwrap(); | |
| builder.append_value(take_index); | |
| let take_index = take_offset + i; | |
| builder.append_value(take_index as i32); |
| "+------+----+", | ||
| "| 1 | A |", | ||
| "| 2 | A |", | ||
| "| | B |", // this row should not be here |
Unest and implement skip_nulls=true if specified
|
@alamb Thanks for the review! Do we want to set |
Setting |
I recommend that we merge this PR in without any changes to the default behavior and then we can propose a follow on PR with just the default behavior change (for SQL) by itself for discussion. That will make it easier to follow what exactly the proposed change in behavior is. |
Unest and implement skip_nulls=true if specifiedUnnest and implement skip_nulls=true if specified
Which issue does this PR close?
related to #7087.
Closes #6961
Rationale for this change
This implements the new preserve_nulls option in unnest exec.
I've also adjusted the code to avoid copying and rely only on the take kernel as needed.
Todo
What changes are included in this PR?
The code will now respect the flag
preserve_nullsinUnnestOptions.This way we can also make the sql api consistent with Clickhouse/duckdb while allowing dataframe users to choose between implementations.
Are these changes tested?
Yes, added some new tests as well.
Are there any user-facing changes?
No
No