Description
It is proposed to add the special methods __subclass_base__ and __class_getitem__ to CPython. These will allow generics to be non-classes, which simplifies them and significantly improves their performance.
@gvanrossum, the main question now is: should this be a PEP?
Motivation:
There are three main points of motivation: performance of the typing module, metaclass conflicts, and the large number of hacks currently used in typing.
Performance:
The typing module is one of the heaviest and slowest modules in the stdlib, even with all the optimizations made. This is mainly because subscripted generics are classes. See also #432. Performance will improve in three main ways:
- Creation of generic classes is slow because GenericMeta.__new__ is very slow; we will not need it anymore.
- Very long MROs for generic classes will be half as long. They are present because we duplicate the collections.abc inheritance chain in typing.
- Instantiation of generic classes will be faster (this is minor, however).
Metaclass conflicts:
All generic types are instances of GenericMeta, so if a user uses a custom metaclass, it is hard to make a corresponding class generic. This is particularly hard for library classes that a user doesn't control. A workaround is to always mix in GenericMeta:
    class AdHocMeta(GenericMeta, LibraryMeta):
        pass

    class UserClass(LibraryBase, Generic[T], metaclass=AdHocMeta):
        ...
but this is not always practical or even possible.
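The conflict itself can be reproduced without typing at all. The following is a minimal sketch in which OtherMeta, LibraryMeta, LibraryBase and GenericLike are hypothetical stand-ins (OtherMeta plays the role of GenericMeta); it only illustrates why the mix-in metaclass is needed, not how typing is implemented.

```python
# Hypothetical stand-ins: OtherMeta plays the role of GenericMeta,
# LibraryMeta the role of a metaclass from a third-party library.
class LibraryMeta(type):
    pass


class OtherMeta(type):
    pass


class LibraryBase(metaclass=LibraryMeta):
    pass


class GenericLike(metaclass=OtherMeta):
    pass


# Combining bases whose metaclasses are unrelated fails at class creation.
try:
    class Broken(LibraryBase, GenericLike):
        pass
except TypeError as exc:
    conflict_message = str(exc)


# The workaround from the text: a metaclass that derives from both.
class AdHocMeta(OtherMeta, LibraryMeta):
    pass


class UserClass(LibraryBase, GenericLike, metaclass=AdHocMeta):
    pass
```

With generics that are not classes, no metaclass is involved in subscription, so the extra AdHocMeta step would not be needed.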
Hacks and bugs that will be removed by this proposal:
- The _generic_new hack, which exists because __init__ is not called on instances with a type differing from the type whose __new__ was called: C[int]().__class__ is C.
- The _next_in_mro speed hack will not be necessary, since subscription will not create new classes.
- The ugly sys._getframe hack. This one is particularly nasty, since it looks like we can't remove it without changes outside typing.
- Currently generics do "dangerous" things with private ABC caches to fix large memory consumption that grows at least as O(N**2), see Optimize ABC caches #383. This point is also important because I would like to re-implement ABCMeta in C. This will reduce Python start-up time and also start-up times for many programs that extensively use ABCs. My implementation passes all tests except test_typing, because I want to make _abc_cache etc. read-only, so that one can't do something like MyABC._abc_cache = "Surprise when updating caches!".
- Problems with sharing attributes between subscripted generics, see Subscripted generic classes should not have independent class variables #392. The current solution already uses __getattr__ and __setattr__, but it is still incomplete, and solving this without the current proposal will be hard and will need __getattribute__.
- The _no_slots_copy hack, where we clean up the class dictionary on every subscription, thus allowing generics with __slots__.
- General complexity of the typing module. The new proposal will not only allow us to remove the above-mentioned hacks/bugs, but also simplify the implementation, so that it will be easier to maintain.
Details of the proposal:
New methods API:
- The idea of __class_getitem__ is very simple: it is an exact analog of __getitem__, except that it is called on a class that defines it, not on its instances. This allows us to avoid GenericMeta.__getitem__.
- If an object that is not a class object appears in the bases of a class definition, __subclass_base__ is searched on it. If found, it is called with the original tuple of bases as an argument. If the result of the call is not None, then it is substituted for this object. Otherwise, the base is just removed. This is necessary to avoid inconsistent MRO errors, which are currently prevented by manipulations in GenericMeta.__new__. After creating the class, the original bases are saved in __orig_bases__ (now this is also done by the metaclass).
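The base substitution rule can be emulated in pure Python. Released Python versions do not recognise a method named __subclass_base__, so this sketch applies the rule by hand; resolve_bases, make_class and Alias are hypothetical helpers invented for illustration, and the attribute lookup on the instance is an assumption about how the search would work.

```python
def resolve_bases(bases):
    """Apply the proposed __subclass_base__ rule to a tuple of bases."""
    resolved = []
    for base in bases:
        if isinstance(base, type):
            resolved.append(base)  # real classes are kept unchanged
            continue
        hook = getattr(base, "__subclass_base__", None)
        if hook is None:
            resolved.append(base)  # no hook: left for type() to reject
            continue
        replacement = hook(bases)  # the hook sees the original bases
        if replacement is not None:
            resolved.append(replacement)
        # a None result means the base is simply dropped
    return tuple(resolved)


def make_class(name, bases):
    """Hypothetical helper: create a class the way the proposal describes."""
    cls = type(name, resolve_bases(bases), {})
    cls.__orig_bases__ = bases  # the original bases are saved afterwards
    return cls


class Alias:
    """Hypothetical non-class object standing in for a subscripted generic."""

    def __init__(self, origin):
        self.origin = origin

    def __subclass_base__(self, bases):
        # Drop this base if the origin is already listed explicitly,
        # which would otherwise produce a duplicate base error.
        if self.origin in bases:
            return None
        return self.origin


class Base:
    pass


alias = Alias(Base)
Child = make_class("Child", (alias,))      # alias is replaced by Base
Both = make_class("Both", (Base, alias))   # alias is removed
```

Returning None in the second case is what keeps type() from seeing Base twice, which is one instance of the MRO problems the rule is meant to avoid.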
Changes necessary in the typing module:
The key point is that instead of the GenericMeta metaclass, we will have a GenericAlias class.
Generic will have:
- a __class_getitem__ that will return instances of GenericAlias, which keep track of the original class and type arguments.
- __init_subclass__ that will properly initialize the subclasses and perform the necessary bookkeeping.
GenericAlias will have:
- a normal __getitem__, so that it can be further subscripted, thus preserving the current API.
- __call__, __getattr__, and __setattr__ that will simply pass everything to the original class object.
- __subclass_base__ that will return the original class (or None in some special cases).
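The division of labour between Generic and GenericAlias might look roughly like the sketch below. It is a simplified illustration, not the typing implementation: the attribute names _origin and _args and the Box class are assumptions, type-variable substitution and __init_subclass__ bookkeeping are omitted, and it relies on Python 3.7 or later, where the interpreter consults __class_getitem__ when a class is subscripted.

```python
class GenericAlias:
    """Hypothetical sketch of a non-class object produced by subscription."""

    def __init__(self, origin, args):
        # Bypass our own __setattr__, which forwards to the origin class.
        object.__setattr__(self, "_origin", origin)
        object.__setattr__(self, "_args", args)

    def __getitem__(self, params):
        # Further subscription yields another alias of the same origin.
        if not isinstance(params, tuple):
            params = (params,)
        return GenericAlias(self._origin, params)

    def __call__(self, *args, **kwargs):
        # Instantiation is passed straight to the original class.
        return self._origin(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached when normal lookup on the alias fails.
        return getattr(self._origin, name)

    def __setattr__(self, name, value):
        # Class variables are shared, so writes go to the original class.
        setattr(self._origin, name, value)

    def __subclass_base__(self, bases):
        # When used as a base, stand in for the original class.
        return self._origin


class Generic:
    def __class_getitem__(cls, params):
        if not isinstance(params, tuple):
            params = (params,)
        return GenericAlias(cls, params)


class Box(Generic):
    label = "box"

    def __init__(self, item):
        self.item = item


alias = Box[int]        # a GenericAlias instance, not a new class
box = alias(42)         # creates a plain Box; __init__ runs normally
alias.label = "shared"  # forwarded to Box, visible through every alias
```

Because subscription never creates a class, type(box) is Box and __init__ is called in the ordinary way, and attribute writes through an alias land on the single original class.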
The generic versions of collections.abc classes will be simple subclasses like this:
    class Sequence(collections.abc.Sequence, Generic[T_co]):
        pass
(typeshed of course will track that Sequence[T_co] inherits from Iterable[T_co] etc.)
Transition plan:
- Merge the changes into CPython (ideally before the end of September).
- Branch a separate version of typing for Python 3.7 and simplify it by removing backward compatibility hacks.
- Update the 3.7 version to use the dedicated CPython API (this might be done in a few separate PRs).
Backwards compatibility and impact on users who don't use typing:
This proposal will allow practically 100% backwards compatibility with the current public typing API. Actually, the whole idea of introducing two special methods arose from the desire to preserve backwards compatibility while solving the problems listed above.
The only two exceptions that I see now are the following. Currently issubclass(List[int], List) returns True; with this proposal it will raise TypeError. Also, issubclass(collections.abc.Iterable, typing.Iterable) will return False, which is actually good I think, since currently we have a (virtual) inheritance cycle between them.
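The TypeError follows directly from subscripted generics no longer being classes. A minimal sketch, assuming Python 3.7 or later for __class_getitem__ and using the hypothetical names Alias and Container rather than anything from typing:

```python
class Alias:
    """Hypothetical non-class result of subscripting a generic."""

    def __init__(self, origin, args):
        self.origin = origin
        self.args = args


class Container:
    def __class_getitem__(cls, item):
        return Alias(cls, item)


# issubclass() requires its first argument to be a class, and an
# Alias instance is not one, so the check fails with TypeError.
try:
    issubclass(Container[int], Container)
except TypeError:
    raised = True
else:
    raised = False
```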
With my implementation, see https://github.com/ilevkivskyi/cpython/pull/2/files, I measured negligible effects (under 1%) for regular (non-generic) classes.