Skip to content

Change UnionArray constructors#5623

Merged
tustvold merged 5 commits into
apache:masterfrom
mbrobbel:array/union/constructor
May 8, 2024
Merged

Change UnionArray constructors#5623
tustvold merged 5 commits into
apache:masterfrom
mbrobbel:array/union/constructor

Conversation

@mbrobbel
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #5613.

Rationale for this change

It came up in #5585 that the UnionArray constructors are a bit odd when compared to the other array constructors.

What changes are included in this PR?

UnionArray::try_new and UnionArray::new_unchecked

Changed arguments:
From: field_type_ids: &[i8], type_ids: Buffer, value_offsets: Option<Buffer>, child_arrays: Vec<(Field, ArrayRef)>
To: fields: UnionFields, type_ids: ScalarBuffer<i8>, offsets: Option<ScalarBuffer<i32>>, children: Vec<ArrayRef>.

The order of children arrays must match the order of fields in the UnionFields.

The validation in try_new is a bit more strict.

UnionBuilder

It now uses a BTreeMap instead of a HashMap internally to get sorted iteration.

Are there any user-facing changes?

This is a breaking change that changes the UnionArray constructors.

@github-actions github-actions Bot added arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels Apr 10, 2024
@tustvold tustvold added the api-change Changes to the arrow API label Apr 11, 2024
@tustvold
Copy link
Copy Markdown
Contributor

In light of #5368 I think I would like to sit on this for a bit, and we can look to get this bundled in when we next make a breaking release

@tustvold
Copy link
Copy Markdown
Contributor

tustvold commented May 8, 2024

I took the liberty of merging up main, making some minor tweaks to avoid hashmaps, and adding some additional tests. I think if you're happy this is then ready to go in (the next release is going to be breaking)

@tustvold tustvold merged commit b25c441 into apache:master May 8, 2024
@mbrobbel
Copy link
Copy Markdown
Member Author

I took the liberty of merging up main, making some minor tweaks to avoid hashmaps, and adding some additional tests. I think if you're happy this is then ready to go in (the next release is going to be breaking)

Why would we want to avoid hashmaps/sets here?

It seems the negative type_id check was dropped.

@mbrobbel mbrobbel deleted the array/union/constructor branch May 13, 2024 09:35
@tustvold
Copy link
Copy Markdown
Contributor

Why would we want to avoid hashmaps/sets here?

They're slower, given there are only 128 possibilities and they'll almost always be clustered towards the lower end, it is faster just to use a vec

It seems the negative type_id check was dropped.

Would you care to expand on this?

@mbrobbel
Copy link
Copy Markdown
Member Author

Why would we want to avoid hashmaps/sets here?

They're slower, given there are only 128 possibilities and they'll almost always be clustered towards the lower end, it is faster just to use a vec

Is there a benchmark? I'm wondering how much slower.

It seems the negative type_id check was dropped.

Would you care to expand on this?

7cd47de#diff-7e118aa8aabb6fbbb2b5dcd508dd5b3219300bcb829c6cbef6cf35146320806aR191-R199

@tustvold
Copy link
Copy Markdown
Contributor

The check got moved to here - a69cbac#diff-7e118aa8aabb6fbbb2b5dcd508dd5b3219300bcb829c6cbef6cf35146320806aR209

Is there a benchmark? I'm wondering how much slower.

No, but I would expect it to be quite significant. You are comparing the performance of a pointer indirection to a hashmap lookup. Given this check is done for every type in the array, I would expect it to be quite significant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-change Changes to the arrow API arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cleanup UnionArray Constructors

2 participants