ARROW-271: Update Field structure to be more explicit #124
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -91,17 +91,31 @@ union Type { | |
JSONScalar | ||
} | ||
|
||
/// ---------------------------------------------------------------------- | ||
/// The possible types of a vector | ||
|
||
enum VectorType: short { | ||
/// used in List type Dense Union and variable length primitive types (String, Binary) | ||
/// used in List type, Dense Union and variable length primitive types (String, Binary) | ||
OFFSET, | ||
/// fixed length primitive values | ||
VALUES, | ||
/// Bit vector indicated if each value is null | ||
/// actual data, either fixed width primitive types in slots or variable width delimited by an OFFSET vector | ||
DATA, | ||
/// Bit vector indicating if each value is null | ||
VALIDITY, | ||
/// Type vector used in Union type | ||
TYPE | ||
} | ||
|
||
/// ---------------------------------------------------------------------- | ||
/// represents the physical layout of a buffer | ||
/// buffers have fixed width slots of a given type | ||
|
||
table VectorLayout { | ||
/// the width of a slot in the buffer (typically 1, 8, 16, 32 or 64) | ||
bit_width: short; | ||
/// the purpose of the vector | ||
type: VectorType; | ||
} | ||
|
||
/// ---------------------------------------------------------------------- | ||
/// A field represents a named column in a record / row batch or child of a | ||
/// nested type. | ||
|
@@ -120,10 +134,10 @@ table Field { | |
dictionary: long; | ||
// children apply only to Nested data types like Struct, List and Union | ||
children: [Field]; | ||
/// the buffers produced for this type (as derived from the Type) | ||
/// layout of buffers produced for this type (as derived from the Type) | ||
/// does not include children | ||
What does "does not include children" mean? I would expect this to list all buffers for a batch.
This is
|
||
/// each recordbatch will return instances of those Buffers. | ||
buffers: [ VectorType ]; | ||
layout: [ VectorLayout ]; | ||
} | ||
|
||
/// ---------------------------------------------------------------------- | ||
|
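For readers following the schema change, here is a minimal Python sketch of how a per-field `layout` could relate to the full buffer list of a batch. It mirrors the `VectorType`, `VectorLayout` and `Field` names from the diff, but everything else is an assumption made for illustration: the enum values, the helper names (`int32_field`, `list_field`, `flatten_layouts`), the validity-first ordering, and the 32-bit offset width are not taken from the PR.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class VectorType(Enum):
    # Numeric values are illustrative only, not taken from the schema.
    OFFSET = 0
    DATA = 1
    VALIDITY = 2
    TYPE = 3


@dataclass(frozen=True)
class VectorLayout:
    bit_width: int      # width of one slot in the buffer, in bits
    type: VectorType    # the purpose of the vector


@dataclass
class Field:
    name: str
    layout: List[VectorLayout]                      # this field's own buffers only
    children: List["Field"] = field(default_factory=list)


def int32_field(name: str) -> Field:
    # Assumed ordering: validity bitmap first, then 32-bit data slots.
    return Field(name, [VectorLayout(1, VectorType.VALIDITY),
                        VectorLayout(32, VectorType.DATA)])


def list_field(name: str, child: Field) -> Field:
    # The list describes only its validity and offset buffers;
    # the child's buffers are described by the child itself.
    # The 32-bit offset width is an assumption.
    return Field(name, [VectorLayout(1, VectorType.VALIDITY),
                        VectorLayout(32, VectorType.OFFSET)], [child])


def flatten_layouts(f: Field) -> List[VectorLayout]:
    # Depth-first walk: own buffers first, then each child's buffers.
    result = list(f.layout)
    for child in f.children:
        result.extend(flatten_layouts(child))
    return result


xs = list_field("xs", int32_field("item"))
all_buffers = flatten_layouts(xs)
```

Under these assumptions, `xs.layout` holds two entries while `all_buffers` holds four, which is one way to read "does not include children": the batch-wide list is recovered by walking the field tree rather than stored on the parent.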
What would the bit width be for a data vector for strings? I'm not entirely clear what this means in all cases (or how it would be used).
8 since that's how many bits you have in between offsets.
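The reply above can be illustrated with a small Python sketch: if the DATA vector of a string column has a `bit_width` of 8, each offset counts 8-bit slots, so consecutive offsets delimit the bytes of one value. The function name `decode_strings`, the UTF-8 encoding, and the sample values are assumptions for illustration, and the validity vector is ignored here.

```python
from typing import List


def decode_strings(offsets: List[int], data: bytes, bit_width: int = 8) -> List[str]:
    """Slice a DATA buffer into strings using an OFFSET vector.

    Offsets are expressed in slots of `bit_width` bits each.
    """
    if bit_width % 8 != 0:
        raise ValueError("this sketch only handles byte-aligned slots")
    bytes_per_slot = bit_width // 8
    values = []
    # Each pair of consecutive offsets delimits one value.
    for start, end in zip(offsets, offsets[1:]):
        chunk = data[start * bytes_per_slot:end * bytes_per_slot]
        values.append(chunk.decode("utf-8"))  # UTF-8 is an assumption
    return values


# Three values "ab", "" and "cde" packed back to back.
offsets = [0, 2, 2, 5]
data = b"abcde"
values = decode_strings(offsets, data)
```

With these inputs, `values` is `["ab", "", "cde"]`; the empty string shows up as two equal consecutive offsets.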
it is more useful in cases where the bit_width is less definitive: