
[Variant] Test and implement efficient building for "large" Arrays #7699


Description

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The Variant spec uses different numbers of bytes for encoding / writing small and large arrays. For example, for an array, the encoding looks like this (note that num_elements is either 1 or 4 bytes): https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-data-for-array-basic_type3

The size in bytes of num_elements is indicated by is_large in the value_header.

Likewise, the number of bytes used for each field_offset (field_offset_size) depends on the total size of the array's value data.

                   7                     0
                  +-----------------------+
array value_data  |                       |
                  :     num_elements      :  <-- unsigned little-endian, 1 or 4 bytes
                  |                       |
                  +-----------------------+
                  |                       |
                  :     field_offset      :  <-- unsigned little-endian, `field_offset_size` bytes
                  |                       |
                  +-----------------------+
                              :
                  +-----------------------+
                  |                       |
                  :     field_offset      :  <-- unsigned little-endian, `field_offset_size` bytes
                  |                       |      (`num_elements + 1` field_offsets)
                  +-----------------------+
                  |                       |
                  :         value         :
                  |                       |
                  +-----------------------+
                              :
                  +-----------------------+
                  |                       |
                  :         value         :  <-- (`num_elements` values)
                  |                       |
                  +-----------------------+
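To make the layout concrete, here is a minimal sketch (my own reading of the spec, not code from the parquet crate) of writing the array header byte and num_elements for the small and large cases; the bit positions and the write_array_header name are illustrative:

```rust
/// Sketch only: encode the array header byte and num_elements per the layout above.
/// Assumes basic_type = 3 (array) in the low 2 bits of the header byte, with
/// `is_large` and `field_offset_size_minus_one` packed into the value_header bits.
fn write_array_header(out: &mut Vec<u8>, num_elements: usize, field_offset_size: u8) {
    let is_large = num_elements > u8::MAX as usize;
    let value_header = ((is_large as u8) << 2) | (field_offset_size - 1);
    out.push((value_header << 2) | 0b11); // low 2 bits: basic_type = 3 (array)

    if is_large {
        // 4-byte, unsigned little-endian num_elements
        out.extend_from_slice(&(num_elements as u32).to_le_bytes());
    } else {
        // 1-byte num_elements
        out.push(num_elements as u8);
    }
}
```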

As described by @scovich on @PinkCrow007's PR: #7653 (comment)

The value offset and field id arrays require either knowing the number of elements/fields to be created in advance (and then worrying about what happens if the caller builds too many/few entries afterward), or building the arrays in separate storage and then moving an arbitrarily large number of buffered bytes to make room for them after the fact.

A similar issue exists for Objects. Hopefully, designing a pattern for Arrays will also give us a way to implement it for Objects.
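For reference, the second strategy that comment describes (element values written straight into the output while the offsets are buffered separately, then shifted to make room for the header and offsets) looks roughly like the sketch below. It reuses write_array_header from the sketch above, and all of the names are mine, not the crate's:

```rust
/// Sketch only: finish an array whose element values were appended directly to
/// `buf` starting at `values_start`, while per-element byte offsets were tracked
/// in `offsets` (num_elements + 1 entries). The header, num_elements, and offsets
/// must go *before* the values, so the value bytes have to be moved out of the
/// way, which is the arbitrarily large copy the comment refers to.
fn finish_array(buf: &mut Vec<u8>, values_start: usize, offsets: &[u32]) {
    let num_elements = offsets.len().saturating_sub(1);
    let max_offset = *offsets.last().unwrap_or(&0);
    let field_offset_size: u8 = match max_offset {
        0..=0xFF => 1,
        0x100..=0xFFFF => 2,
        0x1_0000..=0xFF_FFFF => 3,
        _ => 4,
    };

    // Move the already-written value bytes aside to make room for the prefix.
    let values = buf.split_off(values_start);
    write_array_header(buf, num_elements, field_offset_size);
    for &off in offsets {
        buf.extend_from_slice(&off.to_le_bytes()[..field_offset_size as usize]);
    }
    buf.extend_from_slice(&values); // copy the value bytes back after the offsets
}
```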

Describe the solution you'd like

I would like:

  1. Examples of creating Arrays with more than 256 values (the number of offsets that can be encoded in a u8); a rough sketch follows after this list
  2. APIs that allow efficient construction of such Array values
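
As a starting point for item 1, this is the shape of the example/test I have in mind; VariantBuilder, new_list, append_value, and finish are placeholder names for whatever builder API we end up with, not necessarily existing methods:

```rust
// Illustrative only: build a Variant array with more than 256 elements, which
// forces the 4-byte num_elements encoding (is_large = 1) and wider offsets.
let mut builder = VariantBuilder::new();
{
    let mut list = builder.new_list();
    for i in 0..1_000i64 {
        list.append_value(i);
    }
    list.finish();
}
let (metadata, value) = builder.finish();
// A test would then read the value back and assert it contains 1_000 elements.
```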

Describe alternatives you've considered

Maybe the builder can leave room for the list length, append the values, and then go back and update the length when the list is finished. This gets tricky for building "large" lists, as the number of bytes needed for the length field may not be known upfront.
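A minimal sketch of that alternative, under the assumption that the builder optimistically reserves a single byte for the length and patches it when the list is finished; the tricky part shows up when the final count no longer fits in that byte and the buffered bytes have to be shifted anyway (all names here are illustrative):

```rust
/// Sketch only: patch a previously reserved 1-byte num_elements slot at
/// `num_elements_pos` once the list is finished. If the count overflows a byte,
/// the slot must grow to 4 bytes, shifting everything written after it; the
/// header's is_large bit (and possibly the offset widths) would also need to be
/// rewritten, which is what makes this approach awkward.
fn patch_num_elements(buf: &mut Vec<u8>, num_elements_pos: usize, num_elements: usize) {
    if num_elements <= u8::MAX as usize {
        // Small case: the reserved byte is enough.
        buf[num_elements_pos] = num_elements as u8;
    } else {
        // Large case: widen the slot to 4 bytes, moving the tail to make room.
        let tail = buf.split_off(num_elements_pos + 1);
        buf.truncate(num_elements_pos);
        buf.extend_from_slice(&(num_elements as u32).to_le_bytes());
        buf.extend_from_slice(&tail);
    }
}
```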

Specialized Functions

We could also introduce a function like new_large_object() so callers can hint up front that their object has many fields; if they use new_object but push too many values, fall back to copying.

I think many clients would know the number of fields up front and could then choose the appropriate API.
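
A hypothetical sketch of how such a hint could look from the caller's side (new_large_object is the suggested, not-yet-existing function; the other builder names are placeholders):

```rust
// Hypothetical API: the caller knows the field count up front, so the builder
// can pick the large encoding immediately instead of falling back to copying.
let mut builder = VariantBuilder::new();
{
    let mut obj = builder.new_large_object(); // or e.g. new_object_with_capacity(1_000)
    for i in 0..1_000 {
        obj.insert(&format!("field_{i}"), i as i64);
    }
    obj.finish();
}
let (metadata, value) = builder.finish();
```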

Additional context

Metadata

Labels

enhancement (Any new improvement worthy of an entry in the changelog), parquet (Changes to the parquet crate)
