Variant Data Type Support #10392

@sfc-gh-aixu

Proposed Change

We propose adding a Variant type to the Iceberg data types.

Variant data types allow efficient binary encoding of dynamic semi-structured data such as JSON, Avro, Parquet, etc. By encoding semi-structured data as a Variant column, we retain the flexibility of the source data while allowing query engines to operate on it more efficiently.

With Variant support, such data can be stored internally in an efficient binary representation for better performance. Without it, the data has to be parsed from its original format each time it is read, which is inefficient.
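
The parsing cost described above can be illustrated with a small Python sketch. It is only an analogy: the already-parsed Python dictionaries stand in for a binary Variant encoding, and the sample rows, `parse`, `query_text_column`, `encode_once`, and `query_encoded_column` are hypothetical names invented for this illustration, not part of Iceberg or the proposal.

```python
import json

# Hypothetical sample rows, as they might arrive from a source system.
RAW_ROWS = [
    '{"dealer": {"ship": "Valley View Auto Sales"}, "price": 18000}',
    '{"dealer": {"ship": "Car Zone"}, "price": 24500}',
]

parse_calls = 0


def parse(text):
    """Parse JSON text and count how often parsing happens."""
    global parse_calls
    parse_calls += 1
    return json.loads(text)


def query_text_column(rows, field):
    # Text storage: every query has to re-parse every row.
    return [parse(row)[field] for row in rows]


def encode_once(rows):
    # Variant-like storage: parse once at write time, keep a structured form.
    return [parse(row) for row in rows]


def query_encoded_column(encoded, field):
    # No parsing at read time; values are navigated directly.
    return [row[field] for row in encoded]


# Two queries against text storage: 2 rows x 2 queries = 4 parses.
query_text_column(RAW_ROWS, "price")
query_text_column(RAW_ROWS, "dealer")
text_parses = parse_calls

# Two queries against pre-encoded storage: 2 parses, all at write time.
parse_calls = 0
encoded = encode_once(RAW_ROWS)
query_encoded_column(encoded, "price")
query_encoded_column(encoded, "dealer")
encoded_parses = parse_calls
```

In this toy setup the text column is parsed once per row per query, while the pre-encoded column is parsed once per row in total. A real Variant encoding would additionally be a compact binary layout rather than in-memory objects.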

This will allow the following use cases:

  • Create an Iceberg table with a Variant column
    CREATE OR REPLACE TABLE car_sales(record Variant);
  • Insert semi-structured data into the Variant column
    INSERT INTO car_sales SELECT PARSE_JSON(<json_string>);
  • Query the semi-structured data
    SELECT VARIANT_GET(record, '$.dealer.ship', 'string') FROM car_sales;
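
To make the semantics of the three statements above concrete, here is a rough Python emulation. It is a sketch under stated assumptions, not how any engine implements Variant: `parse_json` and `variant_get` are hypothetical Python stand-ins for the SQL functions, only simple `$.a.b` paths are handled, the supported type names are an illustrative subset, and returning `None` for a missing path is an assumption modeled on SQL NULL.

```python
import json


def parse_json(text):
    # Stand-in for PARSE_JSON: a real engine would produce a binary Variant
    # value; here the parsed Python object plays that role.
    return json.loads(text)


def variant_get(value, path, type_name):
    """Follow a '$.a.b' style path and cast the result to the named type.

    Returns None when the path is missing or the value is JSON null
    (assumed behavior, modeled on SQL NULL).
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = value
    for key in path[1:].split("."):
        if key == "":
            continue  # skip the empty segment right after '$'
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    if current is None:
        return None
    # Illustrative subset of target types (assumption).
    casts = {"string": str, "int": int, "double": float}
    return casts[type_name](current)


# "Table" with one Variant column, filled from a hypothetical JSON document.
car_sales = [parse_json('{"dealer": {"ship": "Car Zone"}, "price": 24500}')]

# Equivalent of: SELECT VARIANT_GET(record, '$.dealer.ship', 'string') FROM car_sales
dealerships = [variant_get(r, "$.dealer.ship", "string") for r in car_sales]
```

The same helper can extract other fields with a different target type, for example `variant_get(car_sales[0], "$.price", "double")`.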

Proposal document

https://docs.google.com/document/d/1sq70XDiWJ2DemWyA5dVB80gKzwi0CWoM0LOWM7VJVd8/edit?tab=t.0

Specifications

  • Table
  • View
  • REST
  • Puffin
  • Encryption
  • Other

Labels

proposal: Iceberg Improvement Proposal (spec/major changes/etc)
