Integer generics #3345

msegado · 2017-05-09T19:41:19Z

Dependent types have been brought up before in a few places: #366, this python-ideas thread, and most recently, #3062. The latter limits itself to one specific feature of a dependent type system, namely, support for simple literal types.

Another limited feature which could prove helpful is support for integer generics —that is, the ability to parametrize types by integers as well as other types. For example, the length of a homogeneous tuple could be parametrized:

# Currently supported:
Tuple[int, int, int] # A tuple of 3 ints
Tuple[int, ...] # A tuple of any number of ints

# Proposed:
Tuple[int, 256] # A tuple of 256 ints

One real-world use-case for this came up in python/typing#221:

# Currently supported:
SboxType = Tuple[
    Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
    Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
    Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
    Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
    Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
    Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
    Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
    Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
]

# Proposed:
SboxType = Tuple[Tuple[int, 16], 8] # A tuple of 8 tuples of 16 ints each

Integer genericity could also be used for things like statically checking vector or matrix dimensions:

N = TypeVar('N', numeric=True) # one idea for denoting type-level integers

class Vector(Generic[N]):
    def __init__(self, values: Iterable[float, N]) -> None:
        self.value = tuple(values)

    def __add__(self, other: Vector[N]) -> Vector[N]:
        return Vector(a + b for a, b in zip(self.values, other.values))

# ...more code...

def matmul(a: Matrix[N, M], b: Matrix[M, D]) -> Matrix[N, D]: ...

Note that the Rust community has been discussing this for some time (as well as the more general const generics) and has a lot of material collected on the motivation, challenges, and details of such a feature:

Pre-RFC: Integer Templating
Parameterize types over numerics / type level integers rust-lang/rfcs#1038: Parameterize types over numerics / type level integers
RFC: Constant generics (restricted Π types) for Rust, core RFC (v2) rust-lang/rfcs#1931: RFC: Constant generics (restricted Π types) for Rust, core RFC (v2)
[lang-team-minutes] Const generics

Whether or not it's a good fit for Python is an open question, of course, but I figured I should at least bring it up for discussion! (Especially since it could interact with #3062 in interesting ways.)

(Even nicer would be to support simple type-level expressions so we could do things like this...

# Vector concatenation (relevant: http://stackoverflow.com/q/43292197/1974654)
def concat(a: Vector[N], b: Vector[M]) -> Vector[N+M]: ...

# The `zip` builtin (based on the typeshed stub)
@overload
def zip(iter1: Iterable[_T1, N1],
        iter2: Iterable[_T2, N2]) -> Iterator[Tuple[_T1, _T2], min(N1, N2)]: ...

# HDL-like bit vector operations
def add(a: BitVec[N], b: BitVec[M]) -> BitVec[max(N, M) + 1]: ...

...but that may be taking it too far. To avoid running arbitrary user code the expression language would have to be explicitly defined and simple enough to implement in languages other than Python; it may be feasible to support something like +, -, *, //, %, **, max(), min(), and abs(), though.)

ilevkivskyi · 2017-05-10T09:58:24Z

Thank you for interesting proposal and for research! I really like the idea of supporting fixed size arrays/tuples/matrices. I have seen many people already playing with combining mypy and numpy, but in my experience many errors in numpy are array shape mismatches, something like:

ValueError: operands could not be broadcast together with shapes (1000,) (999,)

or

ValueError: operands could not be broadcast together with shapes (250, 10) (10, 250)

The point is that in most cases the array shapes/sizes are known statically, so that all these errors can be caught by simple integer dependent types.

JukkaL · 2017-05-10T12:59:40Z

Tuples could easily be special cased in mypy so they aren't enough justification by themselves for this feature. I agree with @ilevkivskyi that numpy could be an interesting use case. The mypy core team currently is not planning to work on numpy support, however -- though I understand that many potential users could find it useful.

ilevkivskyi · 2017-05-10T21:18:23Z

A possible strategy would be so start from a relatively small project (if/when someone will have time for this). For example have some special support for numpy.ndarray (or maybe have a special mypy_extensions.ndarray type). We don't need to support user defined integer dependent types. (Similar to how we have variadic types Calable and Tuple, but we still don't allow user defined variadic generics.)

PaperclipBadger · 2017-08-29T14:11:58Z

As for numpy so for tensorflow. It'd be disheartening to find out that special support for numpy.ndarray allowed you to catch shape-mismatch errors at compile time, but that support was not available for tensorflow.Tensor, which has near-identical semantics.

saulshanabrook · 2018-07-02T13:10:53Z

One way to think about multidimensional arrays are as a function from indices to a value. The "shape" really just constrains the possible inputs to the functions. I am playing with implementing the Mathematics of Arrays in Python and I can represent an array object as a tuple of an indexing function and a shape tuple. But this doesn't capture the relationship between these two parts at the type level.

If we also want to track the inner types in the arrays, this requires a generic array object. So at the type system, I want to be able to represent an array object of of type T and shape S as "A callable with integer args <= S and output type T".

saulshanabrook · 2018-07-02T13:30:05Z

It looks like this could be accomplished with generic protocols, like this:

T = TypeVar('T')

class Array(Protocol[T]):
    def index(self, *ix: int) -> T: ...
    shape: List[int]

An alternative way of doing this is to just use my own type annotations that can't be typed statically and just use them at runtime. This is what the Relay project does, which is part of the TVM ML stack from U of Washington:

@relay_model
def lenet(x: Tensor[Float, (1, 28, 28)]) -> Tensor[Float, 10]:
    conv1 = relay.conv2d(x, num_filter=20, ksize=[1, 5, 5, 1], no_bias=False)
    tanh1 = relay.tanh(conv1)
    pool1 = relay.max_pool(tanh1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
    conv2 = relay.conv2d(pool1, num_filter=50, ksize=[1, 5, 5, 1], no_bias=False) tanh2 = relay.tanh(conv2)
    pool2 = relay.max_pool(tanh2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1]) flatten = relay.flatten_layer(pool2)
    fc1 = relay.linear(flatten, num_hidden=500)
    tanh3 = relay.tanh(fc1)
    return relay.linear(tanh3, num_hidden=10)

@relay
def loss(x: Tensor[Float, (1, 28, 28)], y: Tensor[Float, 10]) -> Float:
    return relay.softmax_cross_entropy(lenet(x), y)

@relay
def train_lenet(training_data: Tensor[Float, (60000, 1, 28, 28)]) -> Model:
    model = relay.create_model(lenet)
    for x, y in data:
        model_grad = relay.grad(model, loss, (x, y))
        relay.update_model_params(model, model_grad) return relay.export_model(model)
    training_data, test_data = relay.datasets.mnist() model = train_lenet(training_data) 
    print(relay.argmax(model(test_data[0])))

They just disable static type checking for those types.

This is a bit of a tangent, but I attended some of PLDI this year and there was a ton of talk there about the Python type system and language. Lots of people were building these computational frameworks in Python for probabilistic programming or deep learning and were coming from other languages, likes Haskell, Rust, or Java. I am not that familiar with the community there, but it seems like it would be good if some of that energy could be focused on research of Python's core so that these types of extensions to the language could be better supported (have reasonable error messages, static type checking, etc).

gvanrossum · 2018-07-02T18:30:04Z

OT: please don't use images of text. There's no reason you couldn't have copy-pasted the original code.

saulshanabrook · 2018-07-02T18:34:41Z

Duly noted. I updated the post with code instead of the image.

msegado · 2018-07-03T17:09:45Z

It looks like this could be accomplished with generic protocols

Could you explain a bit more about how this would work? I'm afraid I don't see how generic protocols could handle shape/size constraints at the type level... out-of-bounds checking would still need to be handled at runtime in the index method, no? (In contrast, integer generics would allow something much closer to what the Relay project is doing.)

leonardo-m · 2018-07-30T21:54:43Z

For Rust the accepted RFC is:
rust-lang/rfcs#2000
And further improvements:
rust-lang/rust#76560

Currently a typing problem above could be handled like this:

T16 = Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int]
SboxType = Tuple[T16, T16, T16, T16, T16, T16, T16, T16]

graue70 · 2020-05-05T16:00:17Z

Is a case like this considered as well?

def get_ngrams(tokens: Iterable[str], n: int = 2):
    """
    >>> get_ngrams(["my", "name", "is", "archibald"], n=2)
    [('my', 'name'), ('name', 'is'), ('is', 'archibald')]

    """
    grams = zip(*[tokens[i:] for i in range(n)])
    return grams

PyCharm infers Union[Iterator[Tuple[Any]], Iterator[Tuple[Any, Any]], Iterator[Tuple[Any, Any, Any]], Iterator[Tuple[Any, Any, Any, Any]], Iterator[Tuple[Any, Any, Any, Any, Any]]] as the return type (for some reason stopping at five elements).

As it is now, I don't think I can do better than Iterator[Tuple[str, ...]] which leaves out the length information.

fylux · 2020-09-11T14:55:27Z

Hi! I have been working on a proposal for adding support for type arithmetic and some other useful features for tensor typing. I will present it next Monday in the next chapter of the Tensor Typing meetings that we have been organising, so if anyone is interested feel free to attend: link

gvanrossum · 2020-09-11T16:11:25Z

Hi! I have been working on a proposal for adding support for type arithmetic and some other useful features for tensor typing. I will present it next Monday in the next chapter of the Tensor Typing meetings that we have been organising, so if anyone is interested feel free to attend: link

Looking forward to it!

leonardo-m · 2020-09-11T17:25:10Z

See also:
https://futhark-lang.org/blog/2020-03-15-futhark-0.15.1-released.html

luphord · 2020-11-25T17:53:18Z

Hi! I have been working on a proposal for adding support for type arithmetic and some other useful features for tensor typing. I will present it next Monday in the next chapter of the Tensor Typing meetings that we have been organising, so if anyone is interested feel free to attend: link

@fylux Is there a write-up of your proposal available anywhere?

fylux · 2020-11-25T20:55:10Z

@luphord The write-up is not ready yet but the slides of the presentation can give you a good idea: link

leonardo-m · 2020-11-27T08:18:41Z

@luphord I hope eventually some of this stuff will also go to statically typed languages as Rust :-) Rust "const generics" aren't enough.

luphord · 2020-11-27T12:12:01Z

@fylux Oh wow, that is a lot more advanced than what I expected, thanks!

I was merely thinking of simple use cases such as (in "numpy terminology")

import numpy as np

# n, k generic integers
def f(A: np.ndarray[n, k], x: np.ndarray[k]) -> np.ndarray[n]:
    return A @ x

and then mypy would recognize a type error in

A = np.random.random((3,4))
x = np.random.random((5,))
f(A, x) # <- would cause a type error in mypy

However, I can clearly see the benefits of type arithmetic that you presented. Is there a road map for your proposal? I understand that at least parts of it are already implemented in pyre?

fylux · 2020-11-27T14:55:21Z

Regarding support for "simple cases", that is already possible in Pyre thanks to variadics (e.g. XLM example), and there is currently an open proposal for standardising it in Python (https://mail.python.org/archives/list/typing-sig@python.org/thread/SQVTQYWIOI4TIO7NNBTFFWFMSMS2TA4J/).

About type arithmetic, +, -, //, *, Length and Product are all already supported in Pyre (e.g MaxPool2D example).
The final proposal for type arithmetic is not yet written, mostly because type arithmetic will highly benefit from having support for variadics, so until the variadic PEP is merged there is not so much sense on discussion a type arithmetic PEP.

shaunc · 2021-04-03T23:40:22Z

Could a integer generic be considered a generic over literals? E.g.

# made up syntax
Int = TypeVar('Int', bound=Literal[int])

class Vector[Int]:
    ...

Vector32 = Vector[32]

It would seem that allowing a generic to be bound to a literal of a type almost avoids introducing new concepts. Everywhere it could be treated as a literal whose exact value wasn't available. Features like "unrolling" tuples, arithmetic, etc, could be implemented in smaller chunks on top of this.

fylux · 2021-04-05T11:58:26Z

Regarding the concept of using generics over literal integers as integer generics it is something that we explored in Pyre, but we faced several problems when dealing with values that are unknown. You could always restrict yourself to values that are known at compile time, but the goal is that known and unknown values can interact in a sound way.

If for example, you use integer generics for dealing with vectors, you will often not know the size of the vector at compile time. Let's say that you have a function to read a CSV where you know the number of columns but not the number of rows, how would you express that unknown number of rows?

# Using shaunc syntax
N = TypeVar('N', bound=Literal[int])
M = TypeVar('M', bound=Literal[int])

class Vector[N, M]: ...

def load(cols: M, filename: string) -> Vector[?, M]: ...

Or similarly, let's say that you want to write the type of a variable calling this function.

table : Vector[?, Literal[3]] = load(3,"foo.csv")

In this example, the "?" must be a literal int according to the previous definition of generic over literals. The number of rows is an exact number, but we don´t know which. Thus, we could say that ? is the union of all possible literal integers, something like "AnyInt", "Literal[int]", etc, but then we would be introducing a new concept in the type system.

If Python could track unbound type variables and resonate on them, then you could write something like:

table : Vector[N, Literal[3]] = load(3,"foo.csv") # Currently: error, unbound type variable N

Based on my experience, I am currently leaning towards relying on the existing definition of int for dealing with integer generics. If we define N as a subtype of int, then it can be replaced by any literal or also by int itself, which in this case would be used to represent that the number is not known at compile time.

N = TypeVar('N', bound=int)
M = TypeVar('M', bound=int)

class Vector[N, M]: ...

def load(cols: M, filename: string) -> Vector[int, M]: ...

table : Vector[int, Literal[3]] = load(3,"foo.csv")

Actually, Pyre uses as a reference this concept of integer generics for type arithmetic, so that "int" represents unknown information and arithmetic on "int" propagates "int".

def concat(x: Vector[N], y: Vector[M]) -> Vector[N+M]: ...
x : Vector[Literal[2]]
y : Vector[int]
concat(x,x) # Vector[Literal[4]]
concat(x,y) # Vector[int]

Finally, @shaunc, I personally agree with the following "It would seem that allowing a generic to be bound to a literal of a type almost avoids introducing new concepts". Currently, I don´t see a good mechanism for setting bounds or subtypes on specific types of literals (not just ints) so I think that it is something that must be explored.

gvanrossum · 2021-04-05T17:49:54Z

To me this looks like the best solution:

table : Vector[int, Literal[3]] = load(3,"foo.csv")

shaunc · 2021-04-05T18:56:05Z

Generics bounded by literals of types also offers a way to introduce default values and the like:

class Base: ...

class Foo(Base):...

Default = TypeVar('Default', bound=Literal[Base])

class BaseUser[Default]:

    use(base: Base = Default): ...

defaultBase = Base(size=100)

baseUserWithDefault = BaseUser[defaultBase]()
foo = Foo()
baseUserWithDefault.use(foo)  # uses foo
baseUserWithDefault.use()  # uses defaultBase

WRT arithmetic: this issue (and others similar) seem to have languished, perhaps because they are heavy lifts. I was hoping "generics bounded by literals" would suggest a path forward that would allow functionality to be added incrementally, while still providing a basis for natural extension to more powerful constructs.

ilevkivskyi added the feature label May 19, 2017

ilevkivskyi mentioned this issue May 28, 2017

Add mypy roadmap #3460

Merged

ilevkivskyi mentioned this issue Jun 14, 2017

Use plugin to support numpy #3540

Closed

ilevkivskyi mentioned this issue Sep 2, 2017

Use PEP 484 type annotations for static type declarations cython/cython#1672

Closed

ilevkivskyi mentioned this issue Oct 4, 2017

Possible Bachelor's thesis? #4044

Closed

ilevkivskyi mentioned this issue Dec 7, 2017

Typing for multi-dimensional arrays python/typing#513

Open

vsiles mentioned this issue Sep 30, 2019

Statically checked tensor shapes pytorch/pytorch#26889

Open

msullivan mentioned this issue Apr 3, 2020

Use of literal values in generic types #8605

Closed

headtr1ck mentioned this issue Oct 12, 2021

Incompatible types for fixed-length list/tuple creation #10606

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integer generics #3345

Integer generics #3345

msegado commented May 9, 2017 •

edited

Loading

ilevkivskyi commented May 10, 2017

JukkaL commented May 10, 2017

ilevkivskyi commented May 10, 2017

PaperclipBadger commented Aug 29, 2017

saulshanabrook commented Jul 2, 2018

saulshanabrook commented Jul 2, 2018 •

edited

Loading

gvanrossum commented Jul 2, 2018

saulshanabrook commented Jul 2, 2018

msegado commented Jul 3, 2018

leonardo-m commented Jul 30, 2018 •

edited

Loading

graue70 commented May 5, 2020

fylux commented Sep 11, 2020

gvanrossum commented Sep 11, 2020

leonardo-m commented Sep 11, 2020

luphord commented Nov 25, 2020

fylux commented Nov 25, 2020

leonardo-m commented Nov 27, 2020

luphord commented Nov 27, 2020

fylux commented Nov 27, 2020

shaunc commented Apr 3, 2021 •

edited

Loading

fylux commented Apr 5, 2021

gvanrossum commented Apr 5, 2021

shaunc commented Apr 5, 2021

Integer generics #3345

Integer generics #3345

Comments

msegado commented May 9, 2017 • edited Loading

ilevkivskyi commented May 10, 2017

JukkaL commented May 10, 2017

ilevkivskyi commented May 10, 2017

PaperclipBadger commented Aug 29, 2017

saulshanabrook commented Jul 2, 2018

saulshanabrook commented Jul 2, 2018 • edited Loading

gvanrossum commented Jul 2, 2018

saulshanabrook commented Jul 2, 2018

msegado commented Jul 3, 2018

leonardo-m commented Jul 30, 2018 • edited Loading

graue70 commented May 5, 2020

fylux commented Sep 11, 2020

gvanrossum commented Sep 11, 2020

leonardo-m commented Sep 11, 2020

luphord commented Nov 25, 2020

fylux commented Nov 25, 2020

leonardo-m commented Nov 27, 2020

luphord commented Nov 27, 2020

fylux commented Nov 27, 2020

shaunc commented Apr 3, 2021 • edited Loading

fylux commented Apr 5, 2021

gvanrossum commented Apr 5, 2021

shaunc commented Apr 5, 2021

msegado commented May 9, 2017 •

edited

Loading

saulshanabrook commented Jul 2, 2018 •

edited

Loading

leonardo-m commented Jul 30, 2018 •

edited

Loading

shaunc commented Apr 3, 2021 •

edited

Loading