Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jul 31, 2023

Which issue does this PR close?

N/A

Rationale for this change

I personally find ListArray confusing both for myself and to explain to others. Recently I made some pictures while reviewing and discussing with @izveigor in DataFusion apache/datafusion#6936 (comment) and want to add them to the docs so I can keep referring to them

I also think it would be nice to be able to use ListArray without having to refer to the Arrow spec

What changes are included in this PR?

Add more docs and examples for ListArray and OffsetsBuffer

With these changes I think someone has a plausible chance of understanding how to use ListArray without having to refer to the Arrow spec

Screenshot 2023-07-31 at 3 15 47 PM Screenshot 2023-07-31 at 3 15 50 PM

Are there any user-facing changes?

Better docs

No API changes

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 31, 2023
Copy link
Contributor

@izveigor izveigor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think it is essential information for new arrow-rs users.

alamb and others added 6 commits July 31, 2023 15:53
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
alamb and others added 4 commits August 1, 2023 06:36
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
@alamb
Copy link
Contributor Author

alamb commented Aug 1, 2023

I'll merge this later in the week if there are no more comments

cc @jayzhan211 @smiklos and @vincev who may also be interested

@smiklos
Copy link
Contributor

smiklos commented Aug 1, 2023

Would it make sense to also document FixedSizeListArray ? It seems to store values differently than GenericListArray

@jayzhan211
Copy link
Contributor

Would it make sense to also document FixedSizeListArray ? It seems to store values differently than GenericListArray

FixedSizeListArray stores value the same as ListArray but each row has the same size of array.

@tustvold
Copy link
Contributor

tustvold commented Aug 1, 2023

FixedSizeListArray stores value the same as ListArray but each row has the same size of array.

This isn't entirely accurate, there is no set of offsets which means that all slots have the same width. In particular this includes nulls, which can cause some interesting challenges.

/// ┌─────────────┐
/// │ [A,B,C] │
/// ├─────────────┤
/// │ [] (empty) │
Copy link
Contributor

@jayzhan211 jayzhan211 Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little confusing to me.
(empty) is just the explanation to []? Showing that [] is empty array?

Copy link
Contributor

@jayzhan211 jayzhan211 Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just leaving [] might be better. Maybe an explanation in the comment!

Copy link
Contributor Author

@alamb alamb Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea -- I will make that change.

Edit: Done in 8e6a56d

@smiklos
Copy link
Contributor

smiklos commented Aug 1, 2023

FixedSizeListArray stores value the same as ListArray but each row has the same size of array.

This isn't entirely accurate, there is no set of offsets which means that all slots have the same width. In particular this includes nulls, which can cause some interesting challenges.

Based on the following array

[[1, 2], Null, [3,4]]

GenericListArray::values will look like this

[1, 2, 3]

But FixedSizeArray::values would look like this

[1, 2, Null, Null, 3, 4]

@alamb
Copy link
Contributor Author

alamb commented Aug 1, 2023

Would it make sense to also document FixedSizeListArray ? It seems to store values differently than GenericListArray

Yes it absolutely would make sense. If you have time that would be great. Otherwise I'll try and make a diagram tomorrow

/// let mut builder = ListBuilder::new(values_builder);
///
/// // [A, B, C]
/// builder.values().append_value("A");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that perhaps the example doc of using the builder pattern should go to GenericListBuilder and here we could use the from or try_new_from_array_data to construct the array directly from values array and nulls.

FixedSizeListArray ist documented like this, so I'm in between adding builder based code there or keep the two separated. We could just link to the builder based approach from Generic/Fixed list array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was torn on this, as keeping the example (that constructs the data in the diagram) I thought added value. However I think your idea of adding a link to the approach would be good.

I implemented this change in 151564d

@smiklos
Copy link
Contributor

smiklos commented Aug 2, 2023

Would it make sense to also document FixedSizeListArray ? It seems to store values differently than GenericListArray

Yes it absolutely would make sense. If you have time that would be great. Otherwise I'll try and make a diagram tomorrow

I added this pr, largely based on the art in this pr.

alamb and others added 2 commits August 2, 2023 12:47
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
@tustvold tustvold merged commit 08e4692 into apache:master Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants