Add guide to type narrowing #1798

Merged
merged 8 commits into from
Jul 19, 2024
1 change: 1 addition & 0 deletions docs/source/guides.rst
@@ -10,4 +10,5 @@ Type System Guides
writing_stubs
modernizing
unreachable
type_narrowing
typing_anti_pitch
324 changes: 324 additions & 0 deletions docs/source/type_narrowing.rst
@@ -0,0 +1,324 @@
**************
Type Narrowing
**************

Python programs often contain values that can take on multiple types and
that are distinguished by a conditional check at runtime. For example, here
the variable *name* can be either a ``str`` or ``None``, and the
``if name is not None`` narrows it down to just ``str``::

    def maybe_greet(name: str | None) -> None:
        if name is not None:
            print("Hello, " + name)

This technique is called *type narrowing*.
In order to avoid false positives on such code, type checkers understand
various kinds of conditionals that are used to narrow types in Python code.
The exact set of type narrowing constructs that a type checker understands
is not specified and varies across type checkers. Commonly understood type
narrowing patterns include:

* ``if x is not None``
* ``if x``
* ``if isinstance(x, SomeType)``
* ``if callable(x)``
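
As a rough illustration of the patterns listed above, the following sketch uses
each of them once. The function names ``describe``, ``first_word`` and
``call_if_possible`` are hypothetical and exist only for this example, and the
comments describe the narrowing that type checkers are commonly expected to
perform; the details may differ between checkers.

```python
from __future__ import annotations  # keep annotations unevaluated at runtime


def describe(value: int | str | None) -> str:
    if value is None:
        return "nothing"
    # None has been excluded: value is int | str from here on.
    if isinstance(value, int):
        # Narrowed to int, so arithmetic is accepted.
        return f"int {value + 1}"
    # Only str remains.
    return "str " + value.upper()


def first_word(text: str | None) -> str:
    if text:
        # A truthiness check excludes None (and, at runtime, the empty string).
        return text.split()[0]
    return ""


def call_if_possible(x: object) -> object:
    if callable(x):
        # x is treated as some callable here, so calling it is accepted.
        return x()
    return x
```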

In addition to narrowing local variables, type checkers usually also support
narrowing instance attributes and sequence members, such as
``if x.some_attribute is not None`` or ``if x[0] is not None``.
Member

I would probably not mention this. I believe there are many situations where other type checkers will happily understand an instance attribute or sequence/dictionary lookup as being narrowed, but Pyre will not, because it cannot safely determine that the stored value has not been reset to a new value by some other function in between two accesses of that value. Moreover, I believe even the more limited narrowing that Pyre allows is not strictly type-safe if you have multithreaded code.

In the name of practicality, I personally prefer the approach mypy and pyright have taken here. But it's an area in which type checkers differ, and mypy's/pyright's behaviour isn't strictly safe. So I think it's perhaps better to leave it unsaid here.

Member Author

Pyre seems to do type narrowing here too: https://pyre-check.org/play?input=%23%20pyre-strict%0A%0Afrom%20typing%20import%20*%0A%0Afrom%20dataclasses%20import%20dataclass%0A%0A%40dataclass%0Aclass%20X%3A%0A%20%20%20%20a%3A%20int%20%7C%20None%0A%0Adef%20f(x%3A%20X)%20-%3E%20None%3A%0A%20%20%20%20if%20x.a%20is%20not%20None%3A%0A%20%20%20%20%20%20%20%20reveal_type(x.a)%0A%20%20%20%20%20%20%20%20print(x.a%20%2B%201)%0A%20%20%20%20print(x.a%20%2B%201)

I'm not sure why this shouldn't be mentioned. It's something users commonly need and ask for. While it is somewhat more obviously unsafe than some other narrowing patterns, there are few kinds of type narrowing that are 100% safe.

Member

> Pyre seems to do type narrowing here too:

Correct, but if you insert a single intermediate function call, it decides that it can no longer assume that the attribute wasn't modified in between the two accesses by the intermediate function call, and no longer does the type narrowing: https://pyre-check.org/play?input=%23%20pyre-strict%0A%0Afrom%20typing%20import%20*%0A%0Afrom%20dataclasses%20import%20dataclass%0A%0A%40dataclass%0Aclass%20X%3A%0A%20%20%20%20a%3A%20int%20%7C%20None%0A%0Adef%20do_something_else()%20-%3E%20None%3A%0A%20%20%20%20pass%0A%0Adef%20f(x%3A%20X)%20-%3E%20None%3A%0A%20%20%20%20if%20x.a%20is%20not%20None%3A%0A%20%20%20%20%20%20%20%20do_something_else()%0A%20%20%20%20%20%20%20%20reveal_type(x.a)%0A%20%20%20%20%20%20%20%20print(x.a%20%2B%201)%0A%20%20%20%20print(x.a%20%2B%201)

> I'm not sure why this shouldn't be mentioned.

Well, I just worry about documenting behaviour here that isn't standardised between all major type checkers. It might be pretty confusing for users of type checkers with differing behaviour.

Member Author

The text is already pretty clear that the exact type narrowing behaviors differ among type checkers, but I'll add another line to clarify that.

I think it's important to mention this behavior because it's a common pattern in practice, so people are likely to run into this.

Member

Thanks, I think the extra line definitely helps


Consult your type checker's documentation for more information on the type
narrowing constructs it supports.
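
The attribute and sequence-element narrowing mentioned above can be sketched as
follows. ``Config``, ``effective_timeout``, ``first_or_zero`` and
``effective_timeout_local`` are hypothetical names used only for illustration.
Whether a given type checker narrows ``config.timeout`` or ``items[0]``, and
whether it keeps the narrowed type after an intervening function call, varies,
so the comments describe common rather than guaranteed behavior.

```python
from __future__ import annotations  # keep annotations unevaluated at runtime

from dataclasses import dataclass


@dataclass
class Config:
    timeout: int | None = None


def effective_timeout(config: Config) -> int:
    if config.timeout is not None:
        # Many checkers treat config.timeout as int here.
        return config.timeout * 2
    return 30


def first_or_zero(items: list[int | None]) -> int:
    if items and items[0] is not None:
        # Many checkers treat items[0] as int here.
        return items[0] + 1
    return 0


def effective_timeout_local(config: Config) -> int:
    # Copying the attribute into a local variable relies only on
    # local-variable narrowing, which checkers support more uniformly.
    timeout = config.timeout
    if timeout is not None:
        return timeout * 2
    return 30
```

The last function shows a common way to sidestep the differences: once the
value is bound to a local name, later changes to the attribute no longer
affect the value being used.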

The type system also includes two ways to create *user-defined* type narrowing
functions: :py:data:`typing.TypeIs` and :py:data:`typing.TypeGuard`. These
are useful if you want to reuse a more complicated check in multiple places, or
you use a check that the type checker doesn't understand. In these cases, you
can define a ``TypeIs`` or ``TypeGuard`` function to perform the check and allow type checkers
to use it to narrow the type of a variable. Among the two, ``TypeIs`` usually
Member

nit: for only two items, I would usually prefer "Between" over "Among"

Suggested change
- to use it to narrow the type of a variable. Among the two, ``TypeIs`` usually
+ to use it to narrow the type of a variable. Between the two, ``TypeIs`` usually

has the more intuitive behavior, so we'll talk about it more; see
:ref:`below <guide-type-narrowing-typeis-typeguard>` for a comparison.

How to use ``TypeIs`` and ``TypeGuard``
---------------------------------------

A ``TypeIs`` function takes a single argument and is annotated as returning
``TypeIs[T]``, where ``T`` is the type that you want to narrow to. The function
must return ``True`` if the argument is of type ``T``, and ``False`` otherwise.
The function can then be used in ``if`` checks, just like you would use ``isinstance()``.
For example::

    from typing import Literal, TypeIs

    type Direction = Literal["N", "E", "S", "W"]

    def is_direction(x: str) -> TypeIs[Direction]:
        return x in {"N", "E", "S", "W"}

    def maybe_direction(x: str) -> None:
        if is_direction(x):
            print(f"{x} is a cardinal direction")
        else:
            print(f"{x} is not a cardinal direction")

A ``TypeGuard`` function looks similar and is used in the same way, but the
type narrowing behavior is different, as discussed in :ref:`the section below <guide-type-narrowing-typeis-typeguard>`.

Depending on the version of Python you are running, you will be able to
import ``TypeIs`` and ``TypeGuard`` either from the standard library :py:mod:`typing`
module or from the third-party ``typing_extensions`` module:

* ``TypeIs`` is in ``typing`` starting from Python 3.13 and in ``typing_extensions``
  starting from version 4.10.0.
* ``TypeGuard`` is in ``typing`` starting from Python 3.10 and in ``typing_extensions``
  starting from version 3.10.0.0.
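
One possible way to write an import that works across these versions is
sketched below. It is only one variant: it assumes the annotations are not
needed at runtime, so the import is placed under ``TYPE_CHECKING`` and
``typing_extensions`` only has to be available where the type checker runs.
The function ``is_str`` is a hypothetical example.

```python
from __future__ import annotations  # annotations are stored as strings, not evaluated

import sys
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only the type checker follows these imports.
    if sys.version_info >= (3, 13):
        from typing import TypeIs
    else:
        from typing_extensions import TypeIs


def is_str(x: object) -> TypeIs[str]:
    return isinstance(x, str)
```

If the annotations are needed at runtime (for example by
``typing.get_type_hints``), the ``if TYPE_CHECKING:`` guard would have to be
dropped and ``typing_extensions`` installed as a regular dependency on Python
versions before 3.13.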


Writing a correct ``TypeIs`` function
-------------------------------------

A ``TypeIs`` function allows you to override your type checker's type narrowing
behavior. This is a powerful tool, but it can be dangerous because an incorrectly
written ``TypeIs`` function can lead to unsound type checking, and type checkers
cannot detect such errors.

For a function returning ``TypeIs[T]`` to be correct, it must return ``True`` if and only if
the argument is compatible with type ``T``, and ``False`` otherwise. If this condition is
not met, the type checker may infer incorrect types.

Below are some examples of correct and incorrect ``TypeIs`` functions::

    from typing import TypeIs

    # Correct
    def good_typeis(x: object) -> TypeIs[int]:
        return isinstance(x, int)

    # Incorrect: does not return True for all ints
    def bad_typeis1(x: object) -> TypeIs[int]:
        return isinstance(x, int) and x > 0

    # Incorrect: returns True for some non-ints
    def bad_typeis2(x: object) -> TypeIs[int]:
        return isinstance(x, (int, float))

This function demonstrates some errors that can occur when using a poorly written
``TypeIs`` function. These errors are not detected by type checkers::

    def caller(x: int | str, y: int | float) -> None:
        if bad_typeis1(x):  # narrowed to int
            print(x + 1)
        else:  # narrowed to str (incorrectly)
            print("Hello " + x)  # runtime error if x is a negative int

        if bad_typeis2(y):  # narrowed to int
            # Because of the incorrect TypeIs, this branch is taken at runtime if
            # y is a float.
            print(y.bit_count())  # runtime error: this method exists only on int, not float
        else:  # narrowed to float (though never executed at runtime)
            pass

Here is an example of a correct ``TypeIs`` function for a more complicated type::

    from typing import TypedDict, TypeIs

    class Point(TypedDict):
        x: int
        y: int

    def is_point(x: object) -> TypeIs[Point]:
        return (
            isinstance(x, dict)
            and all(isinstance(key, str) for key in x)
            and "x" in x
            and "y" in x
            and isinstance(x["x"], int)
            and isinstance(x["y"], int)
        )

.. _`guide-type-narrowing-typeis-typeguard`:

``TypeIs`` and ``TypeGuard``
----------------------------

:py:data:`typing.TypeIs` and :py:data:`typing.TypeGuard` are both tools for narrowing the type of a variable
based on a user-defined function. Both can be used to annotate functions that take an
argument and return a boolean depending on whether the input argument is compatible with
the narrowed type. These functions can then be used in ``if`` checks to narrow the type
of a variable.

``TypeIs`` usually has the more intuitive behavior, but it
introduces more restrictions. ``TypeGuard`` is the right tool to use if:

* You want to narrow to a type that is not compatible with the input type, for example
  from ``list[object]`` to ``list[int]``. ``TypeIs`` only allows narrowing between
  compatible types.
* Your function does not return ``True`` for all input values that are compatible with
  the narrowed type. For example, you could have a ``TypeGuard[int]`` that returns ``True``
  only for positive integers.

``TypeIs`` and ``TypeGuard`` differ in the following ways:

* ``TypeIs`` requires the narrowed type to be a subtype of the input type, while
  ``TypeGuard`` does not.
* When a ``TypeGuard`` function returns ``True``, type checkers narrow the type of the
  variable to exactly the ``TypeGuard`` type. When a ``TypeIs`` function returns ``True``,
  type checkers can infer a more precise type combining the previously known type of the
  variable with the ``TypeIs`` type. (Technically, this is known as an intersection type.)
* When a ``TypeGuard`` function returns ``False``, type checkers cannot narrow the type of
  the variable at all. When a ``TypeIs`` function returns ``False``, type checkers can narrow
  the type of the variable to exclude the ``TypeIs`` type.

This behavior can be seen in the following example::

    from typing import TypeGuard, TypeIs, reveal_type, final

    class Base: ...
    class Child(Base): ...
    @final
    class Unrelated: ...

    def is_base_typeguard(x: object) -> TypeGuard[Base]:
        return isinstance(x, Base)

    def is_base_typeis(x: object) -> TypeIs[Base]:
        return isinstance(x, Base)

    def use_typeguard(x: Child | Unrelated) -> None:
        if is_base_typeguard(x):
            reveal_type(x)  # Base
        else:
            reveal_type(x)  # Child | Unrelated

    def use_typeis(x: Child | Unrelated) -> None:
        if is_base_typeis(x):
            reveal_type(x)  # Child
        else:
            reveal_type(x)  # Unrelated


Safety and soundness
--------------------

While type narrowing is important for typing real-world Python code, many
forms of type narrowing are unsafe in the presence of mutability. Type checkers
attempt to limit type narrowing in a way that minimizes unsafety while remaining
useful, but not all safety violations can be detected.

Incorrect ``TypeIs`` and ``TypeGuard`` functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Both ``TypeIs`` and ``TypeGuard`` rely on the user writing a function that
returns whether an object is of a particular type. However, the type checker
does not validate whether the function actually behaves as expected. If it
does not, the type checker's narrowing behavior will not match what happens
at runtime::

    from typing import TypeIs

    def is_str(x: object) -> TypeIs[str]:
        return True

    def takes_str_or_int(x: str | int) -> None:
        if is_str(x):
            print(x + " is a string")  # runtime error

To avoid this problem, every ``TypeIs`` and ``TypeGuard`` function should be
carefully reviewed and tested.
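
One way such a test could look is sketched below. It compares each candidate
function against ``isinstance`` over a small table of sample values; the helper
``disagreements`` and the ``SAMPLES`` table are hypothetical, and a real test
suite would choose samples that fit the type being checked. The ``TypeIs``
import is placed under ``TYPE_CHECKING`` only so that the sketch also runs on
interpreters older than Python 3.13.

```python
from __future__ import annotations  # keep annotations unevaluated at runtime

from typing import TYPE_CHECKING, Callable

if TYPE_CHECKING:
    from typing import TypeIs


def good_typeis(x: object) -> TypeIs[int]:
    return isinstance(x, int)


def bad_typeis1(x: object) -> TypeIs[int]:
    return isinstance(x, int) and x > 0


def bad_typeis2(x: object) -> TypeIs[int]:
    return isinstance(x, (int, float))


# Values on both sides of the boundary: ints (including bool), and non-ints.
SAMPLES: list[object] = [0, 1, -5, True, 1.5, "1", None, [1]]


def disagreements(check: Callable[[object], bool]) -> list[object]:
    """Return the samples for which ``check`` and ``isinstance(.., int)`` differ."""
    return [s for s in SAMPLES if check(s) != isinstance(s, int)]
```

A correct function yields an empty list, while ``bad_typeis1`` is flagged for
the non-positive ints and ``bad_typeis2`` for the float.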

Unsound ``TypeGuard`` narrowing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Unlike ``TypeIs``, ``TypeGuard`` can narrow to a type that is not a subtype of the
original type. This allows for unsafe behavior with invariant data structures::

    from typing import Any, TypeGuard

    def is_int_list(x: list[Any]) -> TypeGuard[list[int]]:
        return all(isinstance(i, int) for i in x)

    def maybe_mutate_list(x: list[Any]) -> None:
        if is_int_list(x):
            x.append(0)  # OK, x is narrowed to list[int]

    def takes_bool_list(x: list[bool]) -> None:
        maybe_mutate_list(x)
        reveal_type(x)  # list[bool]
        assert all(isinstance(i, bool) for i in x)  # fails at runtime

    takes_bool_list([True, False])

To avoid this problem, use ``TypeIs`` instead of ``TypeGuard`` where possible.
If you must use ``TypeGuard``, avoid narrowing across incompatible types.

Invalidated assumptions
~~~~~~~~~~~~~~~~~~~~~~~

One category of safety issues relates to the fact that type narrowing relies
on a condition that was established at one point in the code and is then relied
on later: we first check ``if x is not None``, then rely on ``x`` not being ``None``.
However, in the meantime other code may have run (for example, in another thread,
another coroutine, or simply some code that was invoked by a function call) and
invalidated the earlier condition.

Such problems are most likely when narrowing is performed on elements of mutable
objects, but it is possible to construct unsafe examples even using only narrowing
of local variables::

    def maybe_greet(name: str | None) -> None:
        def set_it_to_none():
            nonlocal name
            name = None

        if name is not None:
            set_it_to_none()
            # fails at runtime, no error in current type checkers
            print("Hello " + name)

    maybe_greet("Guido")

A more realistic example might involve multiple coroutines mutating a list::

    import asyncio
    from typing import Sequence, TypeIs

    def is_int_sequence(x: Sequence[object]) -> TypeIs[Sequence[int]]:
        return all(isinstance(i, int) for i in x)

    async def takes_seq(x: Sequence[int | None]):
        if is_int_sequence(x):
            await asyncio.sleep(2)
            print("The total is", sum(x))  # fails at runtime

    async def takes_list(x: list[int | None]):
        t = asyncio.create_task(takes_seq(x))
        await asyncio.sleep(1)
        x.append(None)
        await t

    if __name__ == "__main__":
        lst: list[int | None] = [1, 2, 3]
        asyncio.run(takes_list(lst))

These issues unfortunately cannot be fully detected by the current
Python type system. (An example of a different programming language that
does solve this problem is Rust, which uses a system called
`ownership <https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html>`__.)
To avoid such issues, avoid using type narrowing on objects that are mutated
from other parts of the code.
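
One way to follow this advice in the coroutine example is to narrow a private
snapshot instead of the shared object. The sketch below assumes that copying
the sequence into a tuple is affordable; ``total_of_snapshot`` and
``mutate_while_summing`` are hypothetical names, the sleep durations are
arbitrary, and the ``TypeIs`` import is placed under ``TYPE_CHECKING`` only so
that the sketch also runs on interpreters older than Python 3.13.

```python
from __future__ import annotations  # keep annotations unevaluated at runtime

import asyncio
from collections.abc import Sequence
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from typing import TypeIs


def is_int_sequence(x: Sequence[object]) -> TypeIs[Sequence[int]]:
    return all(isinstance(i, int) for i in x)


async def total_of_snapshot(x: Sequence[int | None]) -> int | None:
    snapshot = tuple(x)  # private, immutable copy taken before any await
    if is_int_sequence(snapshot):
        await asyncio.sleep(0.02)  # other coroutines may mutate x meanwhile
        return sum(snapshot)  # still safe: the snapshot cannot change
    return None


async def mutate_while_summing() -> tuple[int | None, list[int | None]]:
    shared: list[int | None] = [1, 2, 3]
    task = asyncio.create_task(total_of_snapshot(shared))
    await asyncio.sleep(0.01)
    shared.append(None)  # no longer affects the running task
    return await task, shared
```

The check and the later use now refer to the same immutable object, so the
mutation of the shared list cannot invalidate the narrowed type.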


See also
--------

* Type checker documentation on type narrowing

  * `Mypy <https://mypy.readthedocs.io/en/stable/type_narrowing.html>`__
  * `Pyright <https://microsoft.github.io/pyright/#/type-concepts-advanced?id=type-narrowing>`__

* PEPs related to type narrowing. These contain additional discussion
  and motivation for current type checker behaviors.

  * :pep:`647` (introduced ``TypeGuard``)
  * (*withdrawn*) :pep:`724` (proposed change to ``TypeGuard`` behavior)
  * :pep:`742` (introduced ``TypeIs``)