Closed
Labels: bug (Something isn't working)
Description
Describe the bug
Returning any dense union array from a ScalarUDF currently fails.
To Reproduce
use std::any::Any;
use std::sync::{Arc, OnceLock};

use arrow::array::UnionBuilder;
use arrow::datatypes::{Float64Type, Int32Type};
use arrow_array::Array;
use arrow_schema::{DataType, Field, UnionFields, UnionMode};
use datafusion::logical_expr::{
    ColumnarValue, Documentation, ScalarUDFImpl, Signature, Volatility,
};

#[derive(Debug)]
pub(super) struct UnionExample {
    signature: Signature,
}

impl UnionExample {
    pub fn new() -> Self {
        Self {
            signature: Signature::any(0, Volatility::Immutable),
        }
    }
}

static DOC: OnceLock<Documentation> = OnceLock::new();

impl ScalarUDFImpl for UnionExample {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn name(&self) -> &str {
        "example_union"
    }

    fn signature(&self) -> &Signature {
        &self.signature
    }

    fn return_type(&self, _arg_types: &[DataType]) -> datafusion::error::Result<DataType> {
        let fields = UnionFields::new(
            vec![0, 1],
            vec![
                Arc::new(Field::new("a", DataType::Int32, false)),
                Arc::new(Field::new("b", DataType::Float64, false)),
            ],
        );
        Ok(DataType::Union(fields, UnionMode::Dense))
    }

    fn invoke(&self, args: &[ColumnarValue]) -> datafusion::error::Result<ColumnarValue> {
        todo!()
    }

    fn invoke_no_args(&self, _number_rows: usize) -> datafusion::error::Result<ColumnarValue> {
        let mut builder = UnionBuilder::new_dense();
        builder.append::<Int32Type>("a", 1).unwrap();
        builder.append::<Float64Type>("b", 3.0).unwrap();
        builder.append::<Int32Type>("a", 4).unwrap();
        let arr = builder.build().unwrap();

        assert_eq!(arr.type_id(0), 0);
        assert_eq!(arr.type_id(1), 1);
        assert_eq!(arr.type_id(2), 0);

        assert_eq!(arr.value_offset(0), 0);
        assert_eq!(arr.value_offset(1), 0);
        assert_eq!(arr.value_offset(2), 1);

        let arr = arr.slice(0, 1);

        assert!(matches!(
            arr.data_type(),
            DataType::Union(_, UnionMode::Dense)
        ));

        Ok(ColumnarValue::Array(Arc::new(arr)))
    }

    fn documentation(&self) -> Option<&Documentation> {
        Some(DOC.get_or_init(|| Documentation::builder().build().unwrap()))
    }
}

#[cfg(test)]
mod test {
    use super::*;
    use datafusion::prelude::*;

    #[tokio::test]
    async fn test() {
        let ctx = SessionContext::new();
        ctx.register_udf(UnionExample::new().into());

        let out = ctx.sql("SELECT example_union();").await.unwrap();
        out.show().await.unwrap();
    }
}
}
Gives:
called `Result::unwrap()` on an `Err` value:
ArrowError(InvalidArgumentError("column types must match schema types, expected
Union([(0, Field { name: \"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"b\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Dense)
but found
Union([(0, Field { name: \"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"b\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Sparse) at column index 0"), None)
The only difference between the two types is the union mode: "expected" is Dense while "found" is Sparse. I'm returning a dense union array from invoke_no_args, and return_type() also returns a dense union, so it seems the union array is being converted from dense to sparse somewhere internally.
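To make the mismatch concrete, here is a minimal std-only sketch of why two union types with identical fields still fail a schema check when their modes differ. Everything in it (`UnionType`, `check_column`, `example_fields`) is a hypothetical stand-in written for illustration, not the actual Arrow or DataFusion types or validation code; it only assumes that the check compares the declared and actual types for exact equality, mode included.

```rust
// Simplified, hypothetical model of the check that produced the error above.
// None of these types are the real Arrow or DataFusion types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnionMode {
    Sparse,
    Dense,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct UnionType {
    // (type id, field name) pairs; a real union type also carries field data types.
    fields: Vec<(i8, String)>,
    mode: UnionMode,
}

/// Mimics a "column types must match schema types" check: the column's type
/// must equal the declared type exactly, and that comparison includes the mode.
fn check_column(declared: &UnionType, actual: &UnionType, index: usize) -> Result<(), String> {
    if declared == actual {
        Ok(())
    } else {
        Err(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            declared, actual, index
        ))
    }
}

/// The two fields used in the reproduction: "a" with type id 0, "b" with type id 1.
fn example_fields() -> Vec<(i8, String)> {
    vec![(0, "a".to_string()), (1, "b".to_string())]
}

fn main() {
    let declared = UnionType { fields: example_fields(), mode: UnionMode::Dense };
    let dense = UnionType { fields: example_fields(), mode: UnionMode::Dense };
    let sparse = UnionType { fields: example_fields(), mode: UnionMode::Sparse };

    // Same fields and same mode: the check passes.
    assert!(check_column(&declared, &dense, 0).is_ok());

    // Same fields but a different mode: the check fails, as in the report.
    assert!(check_column(&declared, &sparse, 0).is_err());
}
```

Under this model, the error in the report means that some step between invoke_no_args and the final record batch produced a column whose mode no longer matched the declared return type.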
Expected behavior
Returning a dense union from a ScalarUDF should not error.
Additional context
I need to use a dense union to represent geospatial vector data of unknown geometry type and coordinate dimension. geoarrow/geoarrow#43