Open
Labels: proposal - Iceberg Improvement Proposal (spec/major changes/etc)
Description
Proposed Change
We propose adding a Variant type to the Iceberg data types.
Variant data types allow efficient binary encoding of dynamic semi-structured data such as JSON, Avro, Parquet, etc. Encoding semi-structured data as a Variant column retains the flexibility of the source data while allowing query engines to operate on it more efficiently.
With Variant support, such data is stored internally in an efficient binary representation, which improves performance. Without it, the data must be parsed from its source format, which is inefficient.
This will allow the following use cases:
- Create an Iceberg table with a Variant column
  CREATE OR REPLACE TABLE car_sales(record Variant);
- Insert semi-structured data into the Variant column
  INSERT INTO car_sales SELECT PARSE_JSON(<json_string>)
- Query against the semi-structured data
  SELECT VARIANT_GET(record, '$.dealer.ship', 'string') FROM car_sales
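To make the intended semantics of the three statements above concrete, here is a minimal Python sketch of the same flow: parse once on write, then extract a field by path on read. It is an illustration only, not the Iceberg API and not the Variant binary encoding. The helpers `parse_json` and `variant_get` are hypothetical stand-ins for the SQL functions, a plain Python dict stands in for the encoded value, and the sample row is invented for the example.

```python
import json
from typing import Any, Optional


def parse_json(text: str) -> Any:
    """Parse a JSON string once, at write time (hypothetical stand-in for PARSE_JSON)."""
    return json.loads(text)


def variant_get(value: Any, path: str) -> Optional[Any]:
    """Follow a simple '$.a.b' path through nested dicts (hypothetical stand-in for VARIANT_GET).

    Only dotted object keys are handled; returns None when the path is missing.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = value
    # Drop the leading '$' and skip the empty segment before the first dot.
    for key in filter(None, path[1:].split(".")):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Invented sample row; the field names mirror the '$.dealer.ship' path above.
car_sales = [parse_json('{"dealer": {"ship": "Valley View Auto Sales"}, "price": 20000}')]

# Reads work on the already-parsed value, so the JSON text is not parsed again.
results = [variant_get(record, "$.dealer.ship") for record in car_sales]
print(results)
```

The sketch only shows the division of work: parsing happens once on insert, and queries navigate the stored structure. The proposal itself targets a binary representation, which this example does not attempt to model.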
Proposal document
https://docs.google.com/document/d/1sq70XDiWJ2DemWyA5dVB80gKzwi0CWoM0LOWM7VJVd8/edit?tab=t.0
Specifications
- Table
- View
- REST
- Puffin
- Encryption
- Other