Closed
Description
Feature Request / Improvement
We use the ORC file format to store our Iceberg tables on Azure storage.
PyIceberg currently supports the Parquet format but not ORC.
This is a request to add ORC file format support to PyIceberg.
Scanning an ORC-backed table currently fails with the following error:
tbl.location()
---> 18 tbl.scan().to_pandas()
File C:\Projects\incubator-iceberg\python\pyiceberg\table\__init__.py:409, in DataScan.to_pandas(self, **kwargs)
408 def to_pandas(self, **kwargs: Any) -> pd.DataFrame:
--> 409 return self.to_arrow().to_pandas(**kwargs)
File C:\Projects\incubator-iceberg\python\pyiceberg\table\__init__.py:404, in DataScan.to_arrow(self)
401 def to_arrow(self) -> pa.Table:
402 from pyiceberg.io.pyarrow import project_table
--> 404 return project_table(
405 self.plan_files(), self.table, self.row_filter, self.projection(), case_sensitive=self.case_sensitive
406 )
File C:\Projects\incubator-iceberg\python\pyiceberg\io\pyarrow.py:558, in project_table(tasks, table, row_filter, projected_schema, case_sensitive)
551 projected_field_ids = {
552 id for id in projected_schema.field_ids if not isinstance(projected_schema.find_type(id), (MapType, ListType))
553 }.union(extract_field_ids(bound_row_filter))
555 with ThreadPool() as pool:
556 tables = [
557 table
...
File c:\Python\Python39\lib\site-packages\pyarrow\_parquet.pyx:1227, in pyarrow._parquet.ParquetReader.open()
File c:\Python\Python39\lib\site-packages\pyarrow\error.pxi:100, in pyarrow.lib.check_status()
ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
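The error above is what the Parquet reader reports when it is handed a file that does not end with the Parquet magic bytes. As a rough illustration of why an ORC data file triggers it, here is a minimal sketch that tells the two formats apart by their magic bytes: Parquet files begin and end with `PAR1`, while ORC files begin with `ORC`. The helper name `sniff_file_format` is hypothetical and is not part of PyIceberg or PyArrow; it only inspects local files and is meant as a diagnostic aid, not as the way ORC support would be implemented.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes
ORC_MAGIC = b"ORC"       # ORC files start with these 3 bytes


def sniff_file_format(path: str) -> str:
    """Return 'parquet', 'orc', or 'unknown' based on the file's magic bytes.

    Hypothetical helper for illustration only; it works on local paths.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        tail = b""
        if size >= 4:
            # Read the last 4 bytes, where Parquet repeats its magic.
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head[:3] == ORC_MAGIC:
        return "orc"
    return "unknown"


if __name__ == "__main__":
    # Build two tiny stand-in files that carry only the magic bytes.
    # They are not valid data files; they just exercise the check.
    with tempfile.TemporaryDirectory() as tmp:
        fake_parquet = os.path.join(tmp, "data.parquet")
        fake_orc = os.path.join(tmp, "data.orc")
        with open(fake_parquet, "wb") as f:
            f.write(PARQUET_MAGIC + b"\x00" * 16 + PARQUET_MAGIC)
        with open(fake_orc, "wb") as f:
            f.write(ORC_MAGIC + b"\x00" * 16)
        print(sniff_file_format(fake_parquet))
        print(sniff_file_format(fake_orc))
```

Running a check like this against one of the table's data files (after downloading it locally) would confirm that the files are ORC and that the failure comes from the reader assuming Parquet rather than from file corruption.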
Query engine
None