
[Python] Type checking support #32609

@asfimport

Description


mypy and static type checking

Since Python 3.5 (PEP 484), with variable annotations added in Python 3.6 (PEP 526), it has been possible to add type information to Python code. This became immensely popular in a short period of time, and mypy has become the industry standard for static type checking in Python. It checks for invalid types quickly enough to run as a pre-commit hook. It has surfaced many bugs that I would not have spotted myself and has been a very valuable tool.

Now what does this mean for PyArrow?

Running mypy on code that uses PyArrow produces error messages such as the following:

some_util_using_pyarrow/hdfs_utils.py:5: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
some_util_using_pyarrow/hdfs_utils.py:9: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
some_util_using_pyarrow/hdfs_utils.py:11: error: Skipping analyzing "pyarrow.fs": module is installed, but missing library stubs or py.typed marker

More information is available here: https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-library-stubs-or-py-typed-marker
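The error means mypy found the installed `pyarrow` package but neither a `py.typed` marker (PEP 561) nor library stubs for it. As a rough sketch, one can check whether an installed top-level package ships that marker. The helper name `ships_py_typed` is hypothetical, and the sketch deliberately ignores separately installed stub-only packages (`<name>-stubs`), which mypy also accepts.

```python
import importlib.util
from pathlib import Path


def ships_py_typed(package: str) -> bool:
    """Return True if the installed top-level package contains a PEP 561 py.typed marker.

    Hypothetical helper: stub-only packages (e.g. "<name>-stubs") are not considered.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.submodule_search_locations is None:
        # Not installed, or a plain module rather than a package directory.
        return False
    # A package may span several directories (namespace packages); check each one.
    return any(
        (Path(location) / "py.typed").is_file()
        for location in spec.submodule_search_locations
    )


if __name__ == "__main__":
    # Prints False when pyarrow is not installed or does not ship the marker.
    print(ships_py_typed("pyarrow"))
```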

You can solve this in three ways:

  1. Ignore the message. However, this treats every type coming from PyArrow as Any, so mypy can no longer find user errors involving the PyArrow library.

  2. Create Python stub files (.pyi). This used to be the standard approach, but it is no longer a popular option, because stubs are extra files kept next to the source code, whereas type hints can also be written inline. This brings me to the third option.

  3. Add a py.typed marker file and use inline type hints. This is the most popular option today because it requires no extra files (apart from the py.typed marker itself), keeps all type hints with the code (much like the types currently given in the documentation), and provides type hints not only to the library's users but also to its own developers (including highlighting of issues inside the IDE).
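A minimal sketch of the difference between options 2 and 3, using a hypothetical function `column_sizes` in a hypothetical package `mypkg` (neither is part of PyArrow). With option 3 the annotations live in the `.py` source and the package ships an empty `py.typed` file; with option 2 the same signature would have to be repeated by hand in a `.pyi` stub.

```python
# Option 3 sketch: inline type hints in a hypothetical module, e.g. mypkg/utils.py.
# The package would also ship an empty "py.typed" file next to its __init__.py (PEP 561).
from typing import Dict, List, Optional


def column_sizes(
    columns: Dict[str, List[int]], limit: Optional[int] = None
) -> Dict[str, int]:
    """Return the number of values per column, optionally capped at `limit`."""
    sizes = {name: len(values) for name, values in columns.items()}
    if limit is not None:
        sizes = {name: min(size, limit) for name, size in sizes.items()}
    return sizes


# Option 2 equivalent: the same signature in a separate stub file (mypkg/utils.pyi),
# which has to be kept in sync with the implementation by hand:
#
#     def column_sizes(
#         columns: Dict[str, List[int]], limit: Optional[int] = ...
#     ) -> Dict[str, int]: ...
```

With the inline annotations in place, mypy would reject a call such as `column_sizes(["a"])` without running the code; under option 1 the argument would be Any and the mistake would only surface at runtime.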

     

My personal preference already shines through the options: it is option 3, as this quickly became the industry standard after its introduction.

What should we do?

I'd very much like to work on this, but I don't want to waste effort. Therefore, I am raising this ticket to find out whether this has been considered before or whether we simply haven't gotten to it yet.

I'd like to open the discussion on the following points:

  1. Do you agree with option 3 (a py.typed marker and inline type hints)?

  2. Should we remove the type information from the docstrings, given that the types will be in the function signatures? Or should we keep it in both places, which would duplicate it?
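To make the duplication in the second question concrete, here is a hypothetical function (`read_text` is illustrative, not a PyArrow API) documented in the numpydoc style that PyArrow's docstrings largely follow. The types appear once in the signature and again in the `Parameters` and `Returns` sections.

```python
def read_text(path: str, encoding: str = "utf-8") -> str:
    """
    Read a text file and return its contents.

    Parameters
    ----------
    path : str
        Location of the file to read.
    encoding : str, default "utf-8"
        Text encoding used to decode the file.

    Returns
    -------
    str
        The decoded file contents.
    """
    # The "str" types above repeat what the signature already states.
    with open(path, encoding=encoding) as f:
        return f.read()
```

Keeping types only in the signature would shrink the docstring entries to names and descriptions; whether the documentation build can then render the types from the annotations is an assumption that would need checking.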

     

Reporter: Jorrick Sleijster

Note: This issue was originally created as ARROW-17335. Please see the migration documentation for further details.
