-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add more docs and examples for ListArray and OffsetsBuffer #4607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
izveigor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I think it is essential information for new arrow-rs users.
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
…o alamb/list_array_docs
|
I'll merge this later in the week if there are no more comments cc @jayzhan211 @smiklos and @vincev who may also be interested |
|
Would it make sense to also document |
FixedSizeListArray stores value the same as ListArray but each row has the same size of array. |
This isn't entirely accurate, there is no set of offsets which means that all slots have the same width. In particular this includes nulls, which can cause some interesting challenges. |
arrow-array/src/array/list_array.rs
Outdated
| /// ┌─────────────┐ | ||
| /// │ [A,B,C] │ | ||
| /// ├─────────────┤ | ||
| /// │ [] (empty) │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little confusing to me.
(empty) is just the explanation to []? Showing that [] is empty array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just leaving [] might be better. Maybe an explanation in the comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea -- I will make that change.
Edit: Done in 8e6a56d
Based on the following array GenericListArray::values will look like this But FixedSizeArray::values would look like this |
Yes it absolutely would make sense. If you have time that would be great. Otherwise I'll try and make a diagram tomorrow |
arrow-array/src/array/list_array.rs
Outdated
| /// let mut builder = ListBuilder::new(values_builder); | ||
| /// | ||
| /// // [A, B, C] | ||
| /// builder.values().append_value("A"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that perhaps the example doc of using the builder pattern should go to GenericListBuilder and here we could use the from or try_new_from_array_data to construct the array directly from values array and nulls.
FixedSizeListArray ist documented like this, so I'm in between adding builder based code there or keep the two separated. We could just link to the builder based approach from Generic/Fixed list array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was torn on this, as keeping the example (that constructs the data in the diagram) I thought added value. However I think your idea of adding a link to the approach would be good.
I implemented this change in 151564d
I added this pr, largely based on the art in this pr. |
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Which issue does this PR close?
N/A
Rationale for this change
I personally find
ListArrayconfusing both for myself and to explain to others. Recently I made some pictures while reviewing and discussing with @izveigor in DataFusion apache/datafusion#6936 (comment) and want to add them to the docs so I can keep referring to themI also think it would be nice to be able to use ListArray without having to refer to the Arrow spec
What changes are included in this PR?
Add more docs and examples for ListArray and OffsetsBuffer
With these changes I think someone has a plausible chance of understanding how to use
ListArraywithout having to refer to the Arrow specAre there any user-facing changes?
Better docs
No API changes