Commit b7d3519

docs: add apache iceberg as datafusion data source (apache#1240)
* add iceberg as data source
* fix warning
1 parent d54dc4a commit b7d3519
1 file changed: +34 -3 lines changed

docs/source/user-guide/data-sources.rst

Lines changed: 34 additions & 3 deletions
@@ -172,10 +172,41 @@ which can lead to a significant performance difference.
     df = ctx.table("my_delta_table")
     df.show()
 
-Iceberg
--------
+Apache Iceberg
+--------------
 
-Coming soon!
+DataFusion 45.0.0 and later support registering Apache Iceberg tables as table providers through the Custom Table Provider interface.
+
+This requires either the `pyiceberg <https://pypi.org/project/pyiceberg/>`__ library (>=0.10.0) or the `pyiceberg-core <https://pypi.org/project/pyiceberg-core/>`__ library (>=0.5.0).
+
+* The ``pyiceberg-core`` library exposes Iceberg Rust's implementation of the Custom Table Provider interface as Python bindings.
+* The ``pyiceberg`` library uses the ``pyiceberg-core`` Python bindings under the hood and provides a native way for Python users to interact with DataFusion.
+
+.. code-block:: python
+
+    from datafusion import SessionContext
+    from pyiceberg.catalog import load_catalog
+    import pyarrow as pa
+
+    # Load catalog and create/load a table
+    catalog = load_catalog("catalog", type="in-memory")
+    catalog.create_namespace_if_not_exists("default")
+
+    # Create some sample data
+    data = pa.table({"x": [1, 2, 3], "y": [4, 5, 6]})
+    iceberg_table = catalog.create_table("default.test", schema=data.schema)
+    iceberg_table.append(data)
+
+    # Register the table with DataFusion
+    ctx = SessionContext()
+    ctx.register_table_provider("test", iceberg_table)
+
+    # Query the table using DataFusion
+    ctx.table("test").show()
+
+
+Note that the DataFusion integration relies on features from the `Iceberg Rust <https://github.com/apache/iceberg-rust/>`_ implementation instead of the `PyIceberg <https://github.com/apache/iceberg-python/>`_ implementation.
+Features that are available in PyIceberg but not yet in Iceberg Rust will not be available when using DataFusion.
 
 Custom Table Provider
 ---------------------
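
The added documentation states three version floors: DataFusion 45.0.0, and either ``pyiceberg`` 0.10.0 or ``pyiceberg-core`` 0.5.0. The following is a minimal preflight sketch, not part of this commit, that checks those floors with the standard library only. It assumes the installed distribution names are ``datafusion``, ``pyiceberg``, and ``pyiceberg-core``; the helper names (``release_tuple``, ``at_least``, ``meets_minimum``, ``iceberg_ready``) are hypothetical, and the comparison deliberately ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Minimum versions quoted from the documentation added in this commit.
# The distribution names are assumed to match the PyPI package names.
MINIMUMS = {
    "datafusion": "45.0.0",
    "pyiceberg": "0.10.0",
    "pyiceberg-core": "0.5.0",
}


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '0.10.0rc1' -> (0, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def at_least(installed: str, minimum: str) -> bool:
    """Compare release segments, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def meets_minimum(dist_name: str, minimum: str) -> bool:
    """True if dist_name is installed at or above minimum; False if it is missing."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return at_least(installed, minimum)


def iceberg_ready() -> bool:
    """DataFusion must qualify, plus at least one of the two Iceberg packages."""
    return meets_minimum("datafusion", MINIMUMS["datafusion"]) and (
        meets_minimum("pyiceberg", MINIMUMS["pyiceberg"])
        or meets_minimum("pyiceberg-core", MINIMUMS["pyiceberg-core"])
    )


if __name__ == "__main__":
    for name, minimum in MINIMUMS.items():
        print(f"{name} >= {minimum}: {meets_minimum(name, minimum)}")
    print("Iceberg registration prerequisites met:", iceberg_ready())
```

Running the script before the documented example gives a quick signal about whether ``ctx.register_table_provider`` is likely to accept an Iceberg table in the current environment. It does not check which Iceberg features the Rust implementation supports.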
