Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support array_resize #8744

Merged
merged 14 commits into from
Jan 11, 2024
Merged

feat: support array_resize #8744

merged 14 commits into from
Jan 11, 2024

Conversation

Weijun-H
Copy link
Member

@Weijun-H Weijun-H commented Jan 4, 2024

Which issue does this PR close?

Closes #7194

Rationale for this change

  • Support array_resize
  • Support column for this function
  • Support LargeList

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Jan 4, 2024
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I have a different idea about array_size.
We can apply the combination of existing array_length, array_slice, array_repeat and array_concat to resize the array. But I'm not sure is it faster or slower, is it easier to maintain or not. How do you think about this approach @Weijun-H ?

array_concat(array_slice(array, 0, min(array_length, count)), array_repeat(default_array, max(count-array_length), 0))


for (index, arr) in array.iter().enumerate() {
let count = count_array.value(index).to_usize().ok_or_else(|| {
DataFusionError::Execution(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: exec_err

array: &GenericListArray<O>,
count_array: &Int64Array,
field: &FieldRef,
new_element: Option<ArrayRef>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_element might be more concise?

package.json Outdated Show resolved Hide resolved
@Weijun-H
Copy link
Member Author

Weijun-H commented Jan 5, 2024

array_concat(array_slice(array, 0, min(array_length, count)), array_repeat(default_array, max(count-array_length), 0))

I think this approach is not a better idea, as supporting the column for max(count-array_length) in an argument is hard. However, the MutableArray could be another direction to consider, just like you did for other functions. I will try it as a follow-up PR. @jayzhan211

@comphead
Copy link
Contributor

comphead commented Jan 7, 2024

thanks @Weijun-H I'll review it tomorrow more closely if no one else beats me.

let new_rows = vec![&default_value; count - remain_count];
rows.extend(new_rows);

let last_offset = offsets.last().copied().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please handle the potential error on .unwrap

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Weijun-H tests are awesome.
Im just wondering could we just concat 2 arrays(old+new one with default values) to do a resize?

I have slight concern that performance on current implementation, maybe I'm wrong

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Weijun-H and @comphead -- this is looking good

datafusion/common/src/utils.rs Outdated Show resolved Hide resolved
datafusion/physical-expr/src/array_expressions.rs Outdated Show resolved Hide resolved
@jayzhan211

This comment was marked as outdated.


for (row_index, offset_window) in array.offsets().windows(2).enumerate() {
let count = count_array.value(row_index).to_usize().ok_or_else(|| {
exec_datafusion_err!("array_resize: failed to convert size to usize")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does type conversion more like internal_err?

wasn't expected/anticipated by the implementation and that is most likely a bug (the error message even encourages users to open a bug report)

unlike array.len() != number, which is error from user.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better with MutableArray! LGTM

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks for using MutableBuffer @Weijun-H it looks nicer now
Will also wait for @alamb as he had some comments as well

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thanks @Weijun-H and @comphead

@alamb alamb merged commit 8a0b447 into apache:main Jan 11, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement array_resize function
4 participants