Parquet: derive boundary order when writing columns #5074

@Jefffrey

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

See the existing TODO in the parquet crate:

// TODO: calc the order for all pages in this column
boundary_order: BoundaryOrder,

From parquet thrift:

https://github.com/apache/parquet-format/blob/46cc3a0647d301bb9579ca8dd2cc356caf2a72d2/src/main/thrift/parquet.thrift#L982-L988

  /**
   * Stores whether both min_values and max_values are ordered and if so, in
   * which direction. This allows readers to perform binary searches in both
   * lists. Readers cannot assume that max_values[i] <= min_values[i+1], even
   * if the lists are ordered.
   */
  4: required BoundaryOrder boundary_order

Describe the solution you'd like

When writing Parquet files, derive the boundary order from each column's page min/max values and set it in the column index, instead of leaving it as a TODO.
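A minimal sketch of how the derivation could work, under stated assumptions. The `BoundaryOrder` enum here is a local stand-in for the crate's type, and `derive_boundary_order` is a hypothetical helper, not an existing API. It assumes null pages have already been filtered out and that the caller supplies a comparator matching the column's sort order. Reporting ascending when every page compares equal (or there is at most one page) is a choice made for this sketch.

```rust
use std::cmp::Ordering;

/// Local stand-in for the parquet crate's boundary order type (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BoundaryOrder {
    Unordered,
    Ascending,
    Descending,
}

/// Hypothetical helper: derive the boundary order from per-page min and max
/// values. Both lists must be ordered in the same direction to count as
/// ascending or descending; anything else is unordered.
fn derive_boundary_order<T, F>(mins: &[T], maxes: &[T], mut cmp: F) -> BoundaryOrder
where
    F: FnMut(&T, &T) -> Ordering,
{
    assert_eq!(mins.len(), maxes.len(), "one min and one max per page");

    let mut ascending = true;
    let mut descending = true;

    for i in 1..mins.len() {
        let min_ord = cmp(&mins[i - 1], &mins[i]);
        let max_ord = cmp(&maxes[i - 1], &maxes[i]);

        // A decrease in either list rules out ascending order.
        if min_ord == Ordering::Greater || max_ord == Ordering::Greater {
            ascending = false;
        }
        // An increase in either list rules out descending order.
        if min_ord == Ordering::Less || max_ord == Ordering::Less {
            descending = false;
        }
    }

    if ascending {
        BoundaryOrder::Ascending
    } else if descending {
        BoundaryOrder::Descending
    } else {
        BoundaryOrder::Unordered
    }
}
```

Note that the check compares `min_values[i]` with `min_values[i+1]` and `max_values[i]` with `max_values[i+1]` only. As the thrift comment says, pages may still overlap, so `max_values[i] <= min_values[i+1]` is not implied.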

Describe alternatives you've considered

Additional context

Some additional reading and discussion:

https://github.com/apache/parquet-format/blob/master/PageIndex.md

#5003 (comment)

  • For special types such as Float16, whose sort order differs from that of the physical type representation, the derived order must account for this difference and be covered by tests
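To illustrate the Float16 concern, here is a hedged sketch that assumes Float16 values are stored as 2-byte little-endian IEEE 754 half-precision values in a fixed-length byte array. The helpers `f16_bits_to_f32` and `compare_f16_le` are hypothetical names written for this example and are not part of the parquet crate. The sketch decodes the bytes before comparing, because a plain byte-wise comparison of the stored representation gives a different order.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: decode IEEE 754 half-precision bits into an f32.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let frac = (bits & 0x03ff) as f32;

    let magnitude = match exp {
        // Zero and subnormals: frac / 1024 * 2^-14.
        0 => frac * 2f32.powi(-24),
        // All-ones exponent: infinity or NaN.
        0x1f => {
            if frac == 0.0 {
                f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal numbers: implicit leading 1, exponent bias of 15.
        _ => (1.0 + frac / 1024.0) * 2f32.powi(exp - 15),
    };
    sign * magnitude
}

/// Hypothetical helper: compare two little-endian Float16 values numerically.
/// Returns None if either value is NaN.
fn compare_f16_le(a: [u8; 2], b: [u8; 2]) -> Option<Ordering> {
    let a = f16_bits_to_f32(u16::from_le_bytes(a));
    let b = f16_bits_to_f32(u16::from_le_bytes(b));
    a.partial_cmp(&b)
}
```

For example, 1.0 is stored as `[0x00, 0x3C]` and -2.0 as `[0x00, 0xC0]`. Comparing the raw bytes puts 1.0 before -2.0, while `compare_f16_le` reports 1.0 as the greater value. A boundary order derived from byte-wise comparison could therefore be wrong for such columns.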

Metadata

    Labels

    enhancement: Any new improvement worthy of an entry in the changelog
    parquet: Changes to the parquet crate