[Python] pyarrow missing py.typed marker file #33113

@asfimport

I understand that pyarrow does not, in general, support type hints. However, I think it still makes sense to add a py.typed marker file to the library. Here is a demonstration:

$ pip install mypy pyarrow 
# test.py
import pyarrow as pa
 
table = pa.Table()
 
reveal_type(table) 
$ mypy test.py
test.py:1: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
test.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
test.py:5: note: Revealed type is "Any"
Found 1 error in 1 file (checked 1 source file) 
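
The error above refers to a py.typed marker, which is an empty file placed inside the package directory (PEP 561). As a rough sketch of how one might check whether an installed package ships that marker, the helper below looks for the file in the package's search locations. The function name find_py_typed is made up for illustration and is not part of pyarrow or mypy.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_py_typed(package_name: str) -> Optional[Path]:
    """Return the path to the package's py.typed marker, or None if absent.

    Hypothetical helper: it only checks for the marker file in the package
    directory and does not inspect separately installed stub packages.
    """
    spec = importlib.util.find_spec(package_name)
    # Plain modules (not packages) have no search locations and cannot
    # carry a py.typed marker of their own.
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        marker = Path(location) / "py.typed"
        if marker.is_file():
            return marker
    return None
```

Calling find_py_typed("pyarrow") in the environment from the reproduction should return None if the marker really is missing, which would be consistent with the mypy message.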

Note that mypy infers table as Any, when it is obviously a Table. If pyarrow shipped a py.typed file, mypy would be able to make these trivial inferences. The motivating example is this:

@overload
def from_arrow(a: pa.Table) -> DataFrame:
    ...

@overload
def from_arrow(a: pa.Array | pa.ChunkedArray) -> Series:
    ...

def from_arrow(a: pa.Table | pa.Array | pa.ChunkedArray) -> DataFrame | Series:
    pass 

The problem is that, since pa.Table, pa.Array, and pa.ChunkedArray are all resolved to Any, the overloads effectively become

@overload
def from_arrow(a: Any) -> DataFrame:
    ...

@overload
def from_arrow(a: Any) -> Series:
    ... 

and mypy complains that overload 2 is covered entirely by overload 1.
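
For contrast, here is a minimal sketch of the same overload pattern when the argument types are visible to the type checker. Table, Array, ChunkedArray, DataFrame, and Series below are hypothetical stand-in classes defined locally, not the real pyarrow or dataframe types. Since they are distinct, known classes, the two overloads should no longer collapse into the same signature.

```python
from typing import Union, overload


# Hypothetical stand-ins for pa.Table, pa.Array and pa.ChunkedArray.
class Table: ...
class Array: ...
class ChunkedArray: ...


# Hypothetical stand-ins for the DataFrame and Series return types.
class DataFrame:
    def __init__(self, source: Table) -> None:
        self.source = source


class Series:
    def __init__(self, source: Union[Array, ChunkedArray]) -> None:
        self.source = source


@overload
def from_arrow(a: Table) -> DataFrame: ...
@overload
def from_arrow(a: Union[Array, ChunkedArray]) -> Series: ...


def from_arrow(
    a: Union[Table, Array, ChunkedArray],
) -> Union[DataFrame, Series]:
    # Dispatch at runtime on the concrete type, mirroring the overloads.
    if isinstance(a, Table):
        return DataFrame(a)
    if isinstance(a, (Array, ChunkedArray)):
        return Series(a)
    raise TypeError(f"unsupported input type: {type(a).__name__}")
```

With concrete types like these, a checker can pick the DataFrame overload for a Table argument and the Series overload for the array types, which is the behaviour the issue is hoping to get for the real pyarrow classes.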

 

I tried to test what adding a py.typed file would do, but I ran into compilation issues. I was hoping someone with a little more experience here could quickly test this out for me :)

Reporter: Matteo Santamaria

Note: This issue was originally created as ARROW-17901. Please see the migration documentation for further details.
