Change UnionArray constructors#5623
Conversation
|
In light of #5368 I think I would like to sit on this for a bit, and we can look to get this bundled in when we next make a breaking release |
|
I took the liberty of merging up main, making some minor tweaks to avoid hashmaps, and adding some additional tests. I think if you're happy this is then ready to go in (the next release is going to be breaking) |
Why would we want to avoid hashmaps/sets here? It seems the negative |
They're slower, given there are only 128 possibilities and they'll almost always be clustered towards the lower end, it is faster just to use a vec
Would you care to expand on this? |
Is there a benchmark? I'm wondering how much slower.
7cd47de#diff-7e118aa8aabb6fbbb2b5dcd508dd5b3219300bcb829c6cbef6cf35146320806aR191-R199 |
|
The check got moved to here - a69cbac#diff-7e118aa8aabb6fbbb2b5dcd508dd5b3219300bcb829c6cbef6cf35146320806aR209
No, but I would expect it to be quite significant. You are comparing the performance of a pointer indirection to a hashmap lookup. Given this check is done for every type in the array, I would expect it to be quite significant |
Which issue does this PR close?
Closes #5613.
Rationale for this change
It came up in #5585 that the
UnionArrayconstructors are a bit odd when compared to the other array constructors.What changes are included in this PR?
UnionArray::try_newandUnionArray::new_uncheckedChanged arguments:
From:
field_type_ids: &[i8], type_ids: Buffer, value_offsets: Option<Buffer>, child_arrays: Vec<(Field, ArrayRef)>To:
fields: UnionFields, type_ids: ScalarBuffer<i8>, offsets: Option<ScalarBuffer<i32>>, children: Vec<ArrayRef>.The order of
childrenarrays must match the order of fields in theUnionFields.The validation in
try_newis a bit more strict.UnionBuilderIt now uses a
BTreeMapinstead of aHashMapinternally to get sorted iteration.Are there any user-facing changes?
This is a breaking change that changes the
UnionArrayconstructors.