Skip to content

Conversation

@Groennbeck
Copy link

Which issue does this PR close?

Closes #.

Rationale for this change

To add support for expression: array_size

What changes are included in this PR?

How are these changes tested?

Tested by adding spark plan that calculated array_size

}
}
let sizes_array = Int32Array::from(builder.finish());
Ok(ColumnarValue::Array(Arc::new(sizes_array)))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more efficient way to do this?

@viirya viirya changed the title Fix: add expression array_size feat: add expression array_size Nov 26, 2024
@SemyonSinchenko
Copy link
Member

SemyonSinchenko commented Nov 27, 2024

Why not to use an array_length from the Datafusion instead?

array_length(array, dimension)

Returns the length of the array dimension. array_length([1, 2, 3, 4, 5]) -> 5

It looks like it does the same.

It may be done in a similar way, like for the array_append:
#1072

@Groennbeck
Copy link
Author

Why not to use an array_length from the Datafusion instead?

array_length(array, dimension)

Returns the length of the array dimension. array_length([1, 2, 3, 4, 5]) -> 5

It looks like it does the same.

It may be done in a similar way, like for the array_append: #1072

Good input! will look into it!
Thank you

@viirya
Copy link
Member

viirya commented Nov 27, 2024

Note that DataFusion array_length's return type is UInt64 but Spark's array_size is Int32.

expr.children(2))
None
}
case Size(child, _) if child.dataType.isInstanceOf[ArrayType] =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This either needs to handle when legacySizeOfNull is true or pattern match to only false

@Groennbeck
Copy link
Author

apache/datafusion#13600

Have to wait for this to get into the next version

@andygrove
Copy link
Member

apache/datafusion#13600

Have to wait for this to get into the next version

Thanks @Groennbeck. The next DataFusion release will likely be released in the next couple of weeks.

@andygrove
Copy link
Member

@Groennbeck DataFusion 44.0.0 is now released

@Groennbeck
Copy link
Author

@Groennbeck DataFusion 44.0.0 is now released

Nice! will update this later today.

@andygrove
Copy link
Member

Hi @Groennbeck are you still planning on update this PR?

let list_array = as_list_array(&array_value)?;
let mut builder = Int32Array::builder(list_array.len());
for i in 0..list_array.len() {
if list_array.is_null(i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to handle legacySizeOfNull here

if list_array.is_null(i) {
builder.append_null();
} else {
builder.append_value(list_array.value_length(i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will value_length(i) include nulls in the array at index i?

@Groennbeck
Copy link
Author

Hi @Groennbeck are you still planning on update this PR?

Hey! sorry has been busy. I can have a look this week.

@Groennbeck Groennbeck requested a review from viirya February 23, 2025 20:13
@andygrove
Copy link
Member

I am closing this PR since it is no longer active. Feel free to re-open if needed.

@andygrove andygrove closed this Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants