Reading delta table as pyarrow dataset does not work #131

Closed
@dudzicp

Describe the bug
I am unable to display the contents of Delta tables stored locally.

To Reproduce

[tool.poetry.dependencies]
python = "^3.10"
datafusion = "^0.7.0"
deltalake = "^0.6.4"

Then run the following code:

import pyarrow as pa
import pyarrow.dataset as ds

from deltalake import DeltaTable
import datafusion

ctx = datafusion.SessionContext()

delta_table = DeltaTable("/local_delta_path/")
pa_dataset = delta_table.to_pyarrow_dataset()

ctx.register_dataset("pa_dataset", pa_dataset)

tmp = ctx.sql("SELECT * FROM pa_dataset limit 10")
tmp.show()

When executed in a notebook in VS Code, this script can run for more than 20 minutes, and I am unable to interrupt the execution.
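Since the hang cannot be interrupted from the notebook, one way to reproduce it safely is to run the script in a separate interpreter process with a time limit. The following is a minimal sketch using only the standard library. The helper name `run_script_with_timeout` and the constant `REPRO_SCRIPT` are hypothetical names introduced for illustration, and the script body mirrors the code above with `delta_table` used consistently.

```python
import subprocess
import sys
from typing import Optional

# The reproduction script from this issue, kept as a string so it can be
# executed in a child interpreter. The table path is the placeholder used above.
REPRO_SCRIPT = """
from deltalake import DeltaTable
import datafusion

ctx = datafusion.SessionContext()
delta_table = DeltaTable("/local_delta_path/")
ctx.register_dataset("pa_dataset", delta_table.to_pyarrow_dataset())
ctx.sql("SELECT * FROM pa_dataset limit 10").show()
"""


def run_script_with_timeout(script: str, timeout_s: float) -> Optional[str]:
    """Run `script` in a fresh interpreter.

    Returns the combined stdout and stderr text, or None if the script
    did not finish within `timeout_s` seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return result.stdout + result.stderr
```

Calling `run_script_with_timeout(REPRO_SCRIPT, 60)` would then return either the printed rows (or an error traceback) or `None` after 60 seconds, without blocking the notebook kernel indefinitely.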

Expected behavior
The top 10 rows of the table are displayed.

Labels: bug (Something isn't working)
