Iceberg Import and Export #449

Open
paulcichonski opened this issue Sep 30, 2024 · 3 comments

Comments

@paulcichonski
Contributor

Apache Iceberg is an Open Table Format (OTF) for storing data in object storage with a well-defined schema.

The Iceberg spec defines its own data schema representation (see Table Spec). The spec also defines a JSON serialization of the schema that can be used for schema management in Iceberg tables. Using the Iceberg JSON schema definition, it is possible to call Iceberg Catalog APIs (popular implementations include Glue, Hive, REST, Dynamo, SQL) to create tables or to evolve the schema of existing tables.
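For reference, here is a rough sketch of what such a JSON schema document looks like, built with plain Python. The table, field names, and field ids are made up for illustration; only the overall shape (a struct with a `schema-id` and a list of fields carrying `id`, `name`, `required`, and `type`) follows the spec's JSON serialization.

```python
import json

# Hypothetical "orders" table; the field names and ids are illustrative only.
orders_schema = {
    "type": "struct",
    "schema-id": 0,
    "fields": [
        {"id": 1, "name": "order_id", "required": True, "type": "string"},
        {"id": 2, "name": "order_timestamp", "required": True, "type": "timestamptz"},
        {"id": 3, "name": "order_total", "required": False, "type": "long"},
    ],
}

# Serialize to a JSON document of the kind a catalog client could consume.
orders_json = json.dumps(orders_schema, indent=2)
print(orders_json)
```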

This issue is to request that the datacontract tool support an Iceberg schema export that adheres to the above JSON encoding. This way, models defined in a datacontract YAML file could be used for managing Iceberg tables. For example, the CLI command could look like:

datacontract export --model orders --format iceberg https://datacontract.com/examples/orders-latest/datacontract.yaml > orders.json

Ideally it could support exporting all models at once, but the current Iceberg JSON definition seems to support only one model per JSON document, so the above could be a good start.
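To make the idea concrete, here is a minimal sketch of what a single-model export could do. This is not the datacontract tool's code: the function name `model_to_iceberg_schema`, the dict layout of the model, and the type mapping are all assumptions made for illustration, and nested types (struct, list, map) are left out.

```python
import json

# Assumed mapping from data contract field types to Iceberg primitive types.
# It is deliberately partial; unknown types raise instead of guessing.
TYPE_MAP = {
    "string": "string", "text": "string", "varchar": "string",
    "int": "int", "integer": "int",
    "long": "long", "bigint": "long",
    "float": "float", "double": "double",
    "boolean": "boolean",
    "date": "date",
    "timestamp": "timestamptz", "timestamp_tz": "timestamptz",
    "timestamp_ntz": "timestamp",
    "bytes": "binary",
}


def model_to_iceberg_schema(model: dict, schema_id: int = 0) -> dict:
    """Convert one model (hypothetical dict form) to an Iceberg schema dict."""
    fields = []
    # Iceberg field ids must be unique within a schema; here they are
    # simply assigned in declaration order, starting at 1.
    for field_id, (name, spec) in enumerate(model["fields"].items(), start=1):
        contract_type = spec["type"]
        if contract_type not in TYPE_MAP:
            raise ValueError(f"unsupported type {contract_type!r} for field {name!r}")
        fields.append({
            "id": field_id,
            "name": name,
            "required": bool(spec.get("required", False)),
            "type": TYPE_MAP[contract_type],
        })
    return {"type": "struct", "schema-id": schema_id, "fields": fields}


# Hypothetical model, roughly as it might look after parsing the YAML.
orders_model = {
    "fields": {
        "order_id": {"type": "text", "required": True},
        "order_timestamp": {"type": "timestamp", "required": True},
        "order_total": {"type": "long"},
    }
}

print(json.dumps(model_to_iceberg_schema(orders_model), indent=2))
```

Since one JSON document holds one schema, exporting all models at once would probably mean one output file per model rather than a single combined document.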

@simonharrer
Contributor

I second this. Feel free to create a pull request for this! :-)

@paulcichonski
Contributor Author

Thanks, I'll try to get one put together either this week or next week.

@simonharrer
Contributor

It might be a good idea to also build an import along the way because then you can have roundtrip tests (import - export - assert-unchanged, export - import - assert-unchanged) that cover a lot of functionality with only a few tests.
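A rough sketch of what those two roundtrip tests could look like follows. The `export_iceberg` and `import_iceberg` functions are hypothetical stand-ins, not the tool's real importer or exporter, and the type mapping is an assumed one-to-one subset chosen so that the conversion can be inverted.

```python
# Assumed one-to-one type mapping, so that export and import are inverses.
CONTRACT_TO_ICEBERG = {
    "string": "string",
    "long": "long",
    "double": "double",
    "boolean": "boolean",
    "date": "date",
    "timestamp_tz": "timestamptz",
    "timestamp_ntz": "timestamp",
}
ICEBERG_TO_CONTRACT = {v: k for k, v in CONTRACT_TO_ICEBERG.items()}


def export_iceberg(model: dict) -> dict:
    """Hypothetical exporter: model dict -> Iceberg schema dict."""
    fields = [
        {
            "id": field_id,  # ids assigned in declaration order
            "name": name,
            "required": spec["required"],
            "type": CONTRACT_TO_ICEBERG[spec["type"]],
        }
        for field_id, (name, spec) in enumerate(model["fields"].items(), start=1)
    ]
    return {"type": "struct", "schema-id": 0, "fields": fields}


def import_iceberg(schema: dict) -> dict:
    """Hypothetical importer: Iceberg schema dict -> model dict."""
    return {
        "fields": {
            f["name"]: {"type": ICEBERG_TO_CONTRACT[f["type"]], "required": f["required"]}
            for f in schema["fields"]
        }
    }


def test_export_import_roundtrip():
    model = {
        "fields": {
            "order_id": {"type": "string", "required": True},
            "order_total": {"type": "long", "required": False},
        }
    }
    assert import_iceberg(export_iceberg(model)) == model


def test_import_export_roundtrip():
    schema = {
        "type": "struct",
        "schema-id": 0,
        "fields": [
            {"id": 1, "name": "order_id", "required": True, "type": "string"},
            {"id": 2, "name": "created_at", "required": False, "type": "timestamptz"},
        ],
    }
    assert export_iceberg(import_iceberg(schema)) == schema


test_export_import_roundtrip()
test_import_export_roundtrip()
```

One caveat with this sketch: the roundtrip only holds where the mapping is lossless. If several contract types collapse into one Iceberg type, or if the Iceberg schema uses non-sequential field ids, the assert-unchanged check would fail unless that extra information is carried through the import.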

@paulcichonski changed the title from "Iceberg Export" to "Iceberg Import and Export" on Oct 6, 2024