Copilot AI commented Nov 18, 2025

Motivation

Adds Apache Iceberg support analogous to the existing Delta Lake implementation, using polars' native write_iceberg and scan_iceberg methods.

Changes

Storage Backend

  • Implemented IcebergStorageBackend in dataframely/_storage/iceberg.py
  • Stores schema metadata in Iceberg table properties (analogous to Delta's commit metadata)
  • Gracefully degrades when metadata operations fail
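The graceful-degradation behaviour listed above could look roughly like the sketch below. This is an illustration, not the code from this PR: the property key `dataframely_schema`, the helper `read_schema_metadata`, and the `FakeTable` stand-in are all hypothetical. The only assumption about Iceberg is that a table object exposes a `properties` mapping of strings, as pyiceberg tables do.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical property key; the key actually used by the PR is not stated here.
SCHEMA_PROPERTY_KEY = "dataframely_schema"


def read_schema_metadata(table: Any) -> dict | None:
    """Return the stored schema metadata, or None if it cannot be retrieved."""
    try:
        raw = table.properties[SCHEMA_PROPERTY_KEY]
        return json.loads(raw)
    except Exception:
        # Degrade gracefully: a missing key, an unreachable catalog, or
        # malformed JSON all result in "no metadata" instead of an error.
        return None


class FakeTable:
    """Stand-in for an Iceberg table; only mimics the `properties` mapping."""

    def __init__(self, properties: dict[str, str]) -> None:
        self.properties = properties


with_metadata = FakeTable({SCHEMA_PROPERTY_KEY: json.dumps({"columns": ["col"]})})
without_metadata = FakeTable({})
malformed = FakeTable({SCHEMA_PROPERTY_KEY: "not json"})

print(read_schema_metadata(with_metadata))
print(read_schema_metadata(without_metadata))
print(read_schema_metadata(malformed))
```

Returning `None` rather than raising leaves it to the caller to decide whether missing metadata warrants re-validation, a warning, or an error.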

Public API

  • Added write_iceberg(), scan_iceberg(), read_iceberg() to Schema, Collection, and FailureInfo classes
  • Signature matches existing *_delta() methods

Testing Infrastructure

  • Added IcebergSchemaStorageTester, IcebergCollectionStorageTester, IcebergFailureInfoStorageTester
  • Parametrized existing storage tests to include Iceberg alongside Parquet and Delta
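The tester pattern described above can be sketched as follows. The class and function names (`StorageTester`, `InMemoryTester`, `check_roundtrip`) are hypothetical stand-ins, and the in-memory store replaces real Parquet, Delta, and Iceberg IO so the sketch stays self-contained. In a real test suite the loop at the end would more likely be a `pytest.mark.parametrize` over the tester classes.

```python
from abc import ABC, abstractmethod


class StorageTester(ABC):
    """Hypothetical common interface; each backend supplies its own write/read."""

    @abstractmethod
    def write(self, rows, location): ...

    @abstractmethod
    def read(self, location): ...


class InMemoryTester(StorageTester):
    """Stand-in backend that keeps rows in a dict instead of writing files."""

    def __init__(self, name):
        self.name = name
        self._store = {}

    def write(self, rows, location):
        self._store[location] = list(rows)

    def read(self, location):
        return list(self._store[location])


def check_roundtrip(tester):
    """A single test body that is reused unchanged for every backend."""
    rows = [{"col": "a"}, {"col": "b"}]
    tester.write(rows, "path/to/table")
    return tester.read("path/to/table") == rows


# Adding a backend means adding one entry here; the test body stays the same.
TESTERS = [InMemoryTester("parquet"), InMemoryTester("delta"), InMemoryTester("iceberg")]
results = {tester.name: check_roundtrip(tester) for tester in TESTERS}
print(results)
```

Because the test body depends only on the shared interface, supporting a new backend requires a new tester class and no new tests.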

Dependencies

  • Added pyiceberg to optional dependencies in pixi.toml

Usage Example

import dataframely as dy

class MySchema(dy.Schema):
    col = dy.String(nullable=False)

# Write to Iceberg
df = MySchema.sample(10)
MySchema.write_iceberg(df, "path/to/table")

# Read with validation
validated = MySchema.read_iceberg("path/to/table", validation="warn")

Limitations

  • Lazy writes not supported (consistent with Delta Lake)
  • Metadata retrieval requires pyiceberg catalog configuration
Original prompt

This section details the original issue you should resolve

<issue_title>Feature Request: Support Iceberg</issue_title>
<issue_description>Since deltalake looks to be well supported in dataframely, it might be nice to also have iceberg support.</issue_description>

<agent_instructions>Implement support for iceberg analogously to the existing support for deltalake (dataframely/_storage/delta.py). Wherever possible, IO operations should go through polars, which already has support for iceberg. For testing, implement an IcebergStorageTester in analogy to the existing ParquetStorageTester and DeltaStorageTester. You should not need to write new tests, just add the new tester to the parametrization of the existing tests. </agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits November 18, 2025 11:36
Co-authored-by: AndreasAlbertQC <103571926+AndreasAlbertQC@users.noreply.github.com>
Co-authored-by: AndreasAlbertQC <103571926+AndreasAlbertQC@users.noreply.github.com>
Co-authored-by: AndreasAlbertQC <103571926+AndreasAlbertQC@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for Iceberg in dataframely" to "Add Iceberg storage backend support" Nov 18, 2025
Copilot finished work on behalf of AndreasAlbertQC November 18, 2025 11:46

Development

Successfully merging this pull request may close these issues.

Feature Request: Support Iceberg