This repository was archived by the owner on Feb 22, 2023. It is now read-only.

Return PyArrow Scalars instead of Python objects #219

@samgd

We have a use case where there are large (TiB) tables on disk. These tables are loaded via a memory map to avoid reading them all into memory.

Individual elements in certain columns of these tables can be non-trivially large (MiB to GiB or more), e.g. long lists of values. Typically only small slices of these lists are actually needed at runtime. Currently fletcher converts fetched elements to Python objects, so the entire memory-mapped list is read into memory and then returned to the user for slicing.

Is it possible for the ExtensionArray implementation to instead return the PyArrow [List]Scalar object, thereby enabling efficient slicing without forcing the entire object into memory?
