Proposal: Make generic types non-classes. #468

Closed
@ilevkivskyi

This proposal adds two special methods, __subclass_base__ and __class_getitem__, to CPython. They will allow generics to stop being classes, which simplifies them and significantly improves their performance.
@gvanrossum, the main question now is: should this be a PEP?

Motivation:

There are three main points of motivation: the performance of the typing module, metaclass conflicts, and the large number of hacks currently used in typing.

Performance:

The typing module is one of the heaviest and slowest modules in the stdlib, even with all the optimizations made so far. This is mainly because subscripted generics are classes. See also #432. Performance will improve in three main ways:

  • Creation of generic classes is slow because GenericMeta.__new__ is very slow; it will no longer be needed.

  • The very long MROs of generic classes will be half as long; they are long because we duplicate the collections.abc inheritance chain in typing.

  • Instantiation of generic classes will be faster (this is minor, however).

Metaclass conflicts:

All generic types are instances of GenericMeta, so if a user uses a custom metaclass, it is hard to make the corresponding class generic. This is particularly hard for library classes that the user doesn't control. A workaround is to always mix in GenericMeta:

class AdHocMeta(GenericMeta, LibraryMeta):
    pass

class UserClass(LibraryBase, Generic[T], metaclass=AdHocMeta):
    ...

but this is not always practical or even possible.
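The conflict itself can be reproduced without typing at all. The following is a minimal sketch in which every name (LibraryMeta, GenericMetaLike, LibraryBase, GenericLike) is a hypothetical stand-in, with GenericMetaLike playing the role of GenericMeta; it assumes only standard Python metaclass rules.

```python
class LibraryMeta(type):
    """Hypothetical metaclass owned by a third-party library."""


class GenericMetaLike(type):
    """Hypothetical stand-in for GenericMeta."""


class LibraryBase(metaclass=LibraryMeta):
    pass


class GenericLike(metaclass=GenericMetaLike):
    pass


# Combining bases whose metaclasses are unrelated fails at class creation.
try:
    class Broken(LibraryBase, GenericLike):
        pass
except TypeError as exc:
    conflict = exc
else:
    conflict = None


# The workaround: a metaclass deriving from both resolves the conflict.
class AdHocMeta(GenericMetaLike, LibraryMeta):
    pass


class Works(LibraryBase, GenericLike, metaclass=AdHocMeta):
    pass
```

The workaround needs one ad hoc metaclass per combination of library metaclasses, which is what makes it impractical at scale.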

Hacks and bugs that will be removed by this proposal:

  • The _generic_new hack, which exists because __init__ is not called on instances whose type differs from the type whose __new__ was called: C[int]().__class__ is C.

  • The _next_in_mro speed hack will no longer be necessary, since subscription will not create new classes.

  • The ugly sys._getframe hack. This one is particularly nasty, since it looks like we can't remove it without changes outside typing.

  • Currently generics do "dangerous" things with private ABC caches to fix large memory consumption that grows at least as O(N**2); see Optimize ABC caches #383. This point is also important because I would like to re-implement ABCMeta in C, which would reduce Python start-up time and the start-up times of many programs that use ABCs extensively. My implementation passes all tests except test_typing, because I want to make _abc_cache etc. read-only, so that one can't do something like MyABC._abc_cache = "Surprise when updating caches!".

  • Problems with sharing attributes between subscripted generics; see Subscripted generic classes should not have independent class variables #392. The current solution already uses __getattr__ and __setattr__, but it is still incomplete. Solving this without the current proposal would be hard and would require __getattribute__.

  • The _no_slots_copy hack, where we clean up the class dictionary on every subscription, thus allowing generics with __slots__.

  • General complexity of the typing module. The new proposal will not only allow removing the hacks and bugs mentioned above, but will also simplify the implementation, making it easier to maintain.

Details of the proposal:

New methods API:

  • The idea of __class_getitem__ is very simple: it is an exact analog of __getitem__, except that it is called on the class that defines it, not on its instances. This allows us to avoid GenericMeta.__getitem__.

  • If an object that is not a class object appears in the bases of a class definition, __subclass_base__ is looked up on it. If found, it is called with the original tuple of bases as an argument. If the result of the call is not None, it replaces this object in the bases; otherwise, the base is simply removed. This is necessary to avoid inconsistent MRO errors, which are currently prevented by manipulations in GenericMeta.__new__. After the class is created, the original bases are saved in __orig_bases__ (currently this is also done by the metaclass).
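The two hooks can be sketched roughly as follows. This is an illustration under stated assumptions, not the CPython implementation: Tagged, Alias, resolve_bases, and build_class are hypothetical names, and the __subclass_base__ lookup is emulated in plain Python by resolve_bases rather than performed by the interpreter. The block relies on the interpreter supporting __class_getitem__ (Python 3.7 and later).

```python
class Tagged:
    """Hypothetical class; subscription is handled by the class itself."""

    def __class_getitem__(cls, item):
        # Called for Tagged[item]; no metaclass __getitem__ is involved.
        return Alias(cls, item)


class Alias:
    """Hypothetical non-class object that may appear in a bases tuple."""

    def __init__(self, origin, item):
        self.origin = origin
        self.item = item

    def __subclass_base__(self, bases):
        # Proposed semantics: return the replacement base, or None to drop it.
        return self.origin


def resolve_bases(bases):
    """Emulate the proposed base substitution (assumption: pure-Python model)."""
    resolved = []
    for base in bases:
        if isinstance(base, type):
            resolved.append(base)  # real classes are kept unchanged
            continue
        hook = getattr(base, "__subclass_base__", None)
        if hook is None:
            resolved.append(base)  # no hook: left as is, class creation may fail
            continue
        replacement = hook(bases)  # the hook receives the original bases tuple
        if replacement is not None:
            resolved.append(replacement)
        # A None result means the base is simply removed.
    return tuple(resolved)


def build_class(name, bases, namespace):
    """Create a class from possibly non-class bases and record the originals."""
    cls = type(name, resolve_bases(bases), dict(namespace))
    cls.__orig_bases__ = bases
    return cls
```

For example, build_class("Child", (Tagged[int],), {}) yields a class whose __bases__ is (Tagged,) while __orig_bases__ still holds the subscripted alias.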

Changes necessary in typing module:

The key point is that instead of the GenericMeta metaclass, we will have a GenericAlias class.

Generic will have:

  • a __class_getitem__ that will return instances of GenericAlias which keep track of the original class and type arguments.
  • __init_subclass__ that will properly initialize the subclasses, and perform necessary bookkeeping.

GenericAlias will have:

  • a normal __getitem__ so that it can be further subscripted thus preserving the current API.
  • __call__, __getattr__, and __setattr__ that will simply pass everything to the original class object.
  • __subclass_base__ that will return the original class (or None in some special cases).
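The division of labour described above could look something like the sketch below. It is a simplified model and not the actual typing code: Box is a hypothetical class that plays the role of a Generic subclass, argument validation and __init_subclass__ bookkeeping are omitted, and the guard against forwarding dunder names is an added assumption to keep attribute lookup well behaved.

```python
class GenericAlias:
    """Sketch of the proposed alias object; not the real typing implementation."""

    def __init__(self, origin, args):
        # Bypass our own __setattr__, which forwards to the original class.
        object.__setattr__(self, "__origin__", origin)
        object.__setattr__(self, "__args__", args)

    def __getitem__(self, args):
        # Further subscription yields another alias of the same class.
        if not isinstance(args, tuple):
            args = (args,)
        return GenericAlias(self.__origin__, args)

    def __call__(self, *args, **kwargs):
        # Instantiation is passed straight to the original class.
        return self.__origin__(*args, **kwargs)

    def __getattr__(self, name):
        # Reached only when normal lookup fails; dunder names are not forwarded.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        return getattr(self.__origin__, name)

    def __setattr__(self, name, value):
        # Attributes are shared: they live on the original class.
        setattr(self.__origin__, name, value)

    def __subclass_base__(self, bases):
        # Proposed hook: the alias is replaced by the original class in bases.
        return self.__origin__


class Box:
    """Hypothetical class standing in for a Generic subclass."""

    default = 0

    def __init__(self, value):
        self.value = value

    def __class_getitem__(cls, args):
        if not isinstance(args, tuple):
            args = (args,)
        return GenericAlias(cls, args)
```

With this shape, Box[int](5) produces a plain Box instance, and an attribute set through Box[int] is visible through Box[str], because both aliases forward to the same class object.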

The generic versions of collections.abc classes will be simple subclasses like this:

class Sequence(collections.abc.Sequence, Generic[T_co]):
    pass

(typeshed of course will track that Sequence[T_co] inherits from Iterable[T_co] etc.)

Transition plan:

  • Merge the changes into CPython (ideally before the end of September).
  • Branch a separate version of typing for Python 3.7 and simplify it by removing backward compatibility hacks.
  • Update the 3.7 version to use the dedicated CPython API (this might be done in a few separate PRs).

Backwards compatibility and impact on users who don't use typing:

This proposal allows practically 100% backwards compatibility with the current public typing API. In fact, the whole idea of introducing two special methods came from the desire to preserve backwards compatibility while solving the problems listed above.
The only two exceptions that I see now are these. Currently issubclass(List[int], List) returns True; with this proposal it will raise TypeError. Also, issubclass(collections.abc.Iterable, typing.Iterable) will return False, which I think is actually good, since currently we have a (virtual) inheritance cycle between them.
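The first exception follows directly from subscripted generics no longer being classes: issubclass rejects a first argument that is not a class. A minimal sketch, using hypothetical names (Container, _Alias, check) rather than the real typing objects:

```python
class _Alias:
    """Hypothetical plain object returned by subscription; not a class."""

    def __init__(self, origin, item):
        self.origin = origin
        self.item = item


class Container:
    """Hypothetical class whose subscription returns a non-class alias."""

    def __class_getitem__(cls, item):
        return _Alias(cls, item)


def check(candidate, parent):
    """Return the issubclass result, or the TypeError it raised."""
    try:
        return issubclass(candidate, parent)
    except TypeError as exc:
        return exc


plain = check(Container, Container)            # ordinary class check succeeds
subscripted = check(Container[int], Container)  # the alias is not a class
```

Here plain is True, while subscripted holds the TypeError raised because Container[int] is an ordinary object rather than a class.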

With my implementation, see https://github.com/ilevkivskyi/cpython/pull/2/files, I measured negligible effects (under 1%) for regular (non-generic) classes.
