add iceberg datafusion integration #2075

Merged
kevinjqliu merged 4 commits into apache:main from kevinjqliu/datafusion-python
Jul 6, 2025

kevinjqliu
Contributor

@kevinjqliu kevinjqliu commented Jun 8, 2025

Rationale for this change

  • Added a pyiceberg table integration so that a pyiceberg Table can be passed directly to datafusion's register_table_provider
  • Added datafusion as an optional dependency
  • Added docs for the integration:
[Image: Screenshot 2025-07-06 at 10 59 44 AM]
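
For readers skimming the thread, the integration described above could be exercised roughly as follows. This is a minimal sketch, not code from the PR: `register_iceberg_table` is a hypothetical helper name, `ctx` is assumed to be a DataFusion session context (anything exposing `register_table_provider`), and `table` is assumed to be a pyiceberg `Table` carrying the `__datafusion_table_provider__` hook added in this change.

```python
from typing import Any


def register_iceberg_table(ctx: Any, name: str, table: Any) -> None:
    """Register `table` under `name` on a DataFusion-style session context.

    Hypothetical helper for illustration only; it is not part of
    pyiceberg or datafusion.
    """
    # This PR adds the hook below to pyiceberg's Table. Fail early with a
    # readable error if the object passed in does not expose it.
    if not hasattr(table, "__datafusion_table_provider__"):
        raise TypeError(
            f"{type(table).__name__} does not define __datafusion_table_provider__"
        )
    # Pass the table object itself, as the PR description states;
    # the context is expected to call the hook on its own.
    ctx.register_table_provider(name, table)
```

With the real packages installed, `ctx` would presumably be a `datafusion.SessionContext` and `table` a table loaded from a pyiceberg catalog; after registration the table should be queryable by `name` through the context.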

Are these changes tested?

Yes

Are there any user-facing changes?

Contributor

@Fokko Fokko left a comment

Exciting stuff 🙌 should we also add a section to the docs?

@@ -1428,6 +1428,51 @@ def to_polars(self) -> pl.LazyFrame:

        return pl.scan_iceberg(self)

    def __datafusion_table_provider__(self) -> Any:
Contributor

Suggested change
-    def __datafusion_table_provider__(self) -> Any:
+    def __datafusion_table_provider__(self) -> IcebergDataFusionTable:

@kevinjqliu kevinjqliu force-pushed the kevinjqliu/datafusion-python branch from 6e1bce9 to fd7e87e Compare July 6, 2025 18:10
@kevinjqliu kevinjqliu merged commit ecc5218 into apache:main Jul 6, 2025
11 checks passed
@kevinjqliu kevinjqliu deleted the kevinjqliu/datafusion-python branch July 6, 2025 19:37
amitgilad3 pushed a commit to amitgilad3/iceberg-python that referenced this pull request Jul 7, 2025
2 participants