Mypy "strict mode" for static compilation #1862

Closed
datnamer opened this issue Jul 13, 2016 · 49 comments

Comments

@datnamer

datnamer commented Jul 13, 2016

Problem statement:

"Could there be some way to write a library like numpy so that a single codebase could simultaneously target CPython and the newer compilers, while achieving competitive speed in all cases? If so, what would it take to make that happen? If not, then what’s the next-best alternative?"**

Proposed Solution: a "PyIR" that can be consumed by various compilers. Details here: https://docs.google.com/document/d/1jGksgI96LdYQODa9Fca7EttFEGQfNODphVmbCX0DD1k/edit

Question for Mypy: A Cython subset seems to be the recommended source format. Could a "strict mode" mypy be used instead to output this IR? Advantages would include more expressiveness than Cython (generics etc.), bootstrapping off existing mypy work, and less fragmentation.

Excerpts from discussion on gitter:

@njsmith

at pycon this year Jukka and the mypy team were very interested in the idea of somehow using their static type stuff in (something like, this doesn't exist) "strict" mode to help with this

@kmod

I think the issue is that "programmer productivity types" fundamentally != "compiler-useful types"

For example, are subclasses subtypes?

If you pick either yes or no, then the type system becomes non-useful to one community or the other
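
One way to see the tension, as a minimal sketch rather than anything proposed in the chat: in Python, bool is a subclass of int, so a checker that treats subclasses as subtypes accepts a bool wherever an int is annotated, while a compiler that wants a fixed machine representation for int would have to test for the exact type. The function name add_one is only for illustration.

```python
def add_one(n: int) -> int:
    """Accepts any int, which includes instances of int subclasses."""
    return n + 1


flag = True

# "Subclasses are subtypes": bool passes an isinstance check against int.
is_subtype_instance = isinstance(flag, int)

# "Exact types only": what a compiler would need to pick one representation.
is_exact_int = type(flag) is int

# The call works at runtime because bool inherits int's arithmetic.
result = add_one(flag)
```

Here is_subtype_instance is True while is_exact_int is False, which is the yes/no choice the question above is pointing at.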

Bonus: where would DyND's datashape fit in? Could it be used as a mypy plugin to annotate array types? @insertinterestingnamehere

@gnprice gnprice added this to the Future milestone Jul 14, 2016
@datnamer
Author

datnamer commented Jul 21, 2016

This discussion addresses the type vs. subclass question: python/typing#241

Also does the future label mean this is planned for implementation at some point?

@ilevkivskyi
Member

I am not sure that this is what you want to discuss here, but it is probably at least related.

I saw a discussion on the Cython mailing list some time ago, where the conclusion was that PEP 484 is quite useless for Cython because typing.py does not support close-to-the-machine types like unsigned long.

I don't agree with this: PEP 484 is about a commonly agreed syntax for type information in Python, not about the implementation of these types. Currently, Cython uses its own syntax for declaring types, so Cython code is not valid Python code. I want to write a translator script that takes type-annotated Python 3 code and produces Cython code. Of course, it should be accompanied by a stub module (ctyping.pyi?) that contains low-level types like int, unsigned_long, double, etc.

With such a tool one can work with native Python code, and then try for a speed-up by running it through Cython using the annotations already present in the code. These annotations could be checked by mypy, since this is just native Python.
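
A rough sketch of what such annotated source might look like. The ctyping module does not exist, so the two names below are hypothetical stand-ins built with typing.NewType, and the Cython signature in the final comment is only an assumption about what a translator could emit.

```python
from typing import NewType

# Hypothetical stand-ins for names a ctyping stub might export.
unsigned_long = NewType("unsigned_long", int)
double = NewType("double", float)


def mean(total: double, count: unsigned_long) -> double:
    """Plain Python at runtime; the annotations carry the C-level intent."""
    return double(total / count)


average = mean(double(9.0), unsigned_long(3))

# A translator could, in principle, turn the signature into Cython syntax,
# for example: cdef double mean(double total, unsigned long count)
```

At runtime NewType adds no checks, so the module still runs unchanged under CPython, which is the point of keeping the source valid Python.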

I think this should be quite simple: it looks like it is not necessary to go through an AST to perform the translation, and it could be done entirely at the level of the lexer/tokenizer. Currently this is just an idea, since I don't have time to work on it now, but at some point I will definitely try it.

@datnamer
Author

Cython is missing features like generic classes. How would you deal with that?

@ilevkivskyi
Member

@datnamer

Although I have not used it yet, there seems to be some support for C++ templates in Cython: http://docs.cython.org/en/latest/src/userguide/wrapping_CPlusPlus.html#templates . There are also fused types (https://github.com/cython/cython/wiki/enhancements-fusedtypes) that are very similar to constrained type variables like T = TypeVar('T', ctyping.float, ctyping.double). I think this is one of the most common use cases for numeric speed critical code.
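
For illustration, a constrained type variable in plain typing looks like the sketch below. The builtin int and float stand in for the hypothetical ctyping.float and ctyping.double, and the function name scale is made up for this example; the Cython counterpart would be a fused type declared with ctypedef fused.

```python
from typing import TypeVar

# Constrained type variable: each call resolves Number to exactly int or float.
Number = TypeVar("Number", int, float)


def scale(value: Number, factor: Number) -> Number:
    """One source definition that a checker treats as two specializations."""
    return value * factor


int_result = scale(2, 3)
float_result = scale(1.5, 2.0)
```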

I am going to ignore all the types that could not be expressed in Cython, at least at early stages. In principle, it is possible to specialize some generics before translation to Cython, but as I see it, it could be quite complex.

@datnamer datnamer closed this as completed Sep 1, 2016
@JukkaL
Collaborator

JukkaL commented Sep 2, 2016

@datnamer I forgot about this while I was on vacation. I'm still interested in this topic and have some ideas, though I haven't had time to write anything substantial down.

@gvanrossum
Member

OK, let's have it.

@gvanrossum gvanrossum reopened this Sep 2, 2016
@datnamer
Author

datnamer commented Sep 2, 2016

Cool :). I think PEP 526 can make this much nicer also.

@datnamer
Author

datnamer commented Sep 2, 2016

@JukkaL - It's probably a bit (a lot) early for this, but sometimes it's better to be forward thinking: do you think we would be able to statically resolve things like protocols and a potential future multiple dispatch (together or separately), or would this be more restricted, like Cython?

Also, would this need fixed width integers to be added?

@gour

gour commented Nov 21, 2016

Hello,

I'd like to stay with Python instead of using C(++) or going with some JVM language (Kotlin, Ceylon, ...), and I wonder if this issue is supposed to bring a feature whereby, after performing mypy analysis on the code, one could take advantage of it and get one's code cython-ized automatically?

@refi64
Contributor

refi64 commented Nov 21, 2016

@gour You might find this interesting.

@gour

gour commented Nov 21, 2016

@kirbyfan64 I know about it, even asked the question about Nuitka & Mypy on ml (got no reply), but, afaict, Nuitka won't take advantage of type annotations, but is going to do its own independent analysis, right?

@datnamer
Author

@gour that is my understanding, unless something has changed.

@JukkaL
Collaborator

JukkaL commented Nov 22, 2016

As far as I know, Nuitka isn't going to use type annotations. Type analysis without annotations is very difficult for larger programs. I haven't tried Nuitka recently or followed closely what's going on there, but I believe that their approach is kind of hard to pull through, except maybe for smaller programs or smaller performance gains than what I'd hope to see.

Compiling programs with PEP 484 annotations to Cython is also not easy to do effectively, since the type systems are quite different. This doesn't mean that PEP 484 annotations can't be used to speed up programs, only that the approach would likely have to be somewhat different from what Cython does right now.

Compiling programs with PEP 484 annotations to CPython C extension modules seems feasible, and I've done some very preliminary work on it. The compiled programs likely wouldn't have full compatibility with Python semantics. For example, if something is annotated as a list, maybe the compiler would insert a runtime check to ensure that you can't assign a non-list object to the variable. Also, if you call a function, maybe the compiler would (under some circumstances) assume that there is no monkey patching and directly bind to the target function instead of going through a namespace lookup. Cython already can do a bunch of similar things to get good performance, so it wouldn't be anything terribly new.
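
The two behaviours described above can be imitated in plain Python. This is only a sketch of the semantics, not of how a compiler would implement them: the inserted check is written by hand, and early binding is emulated with a default argument. All function names are made up for the example.

```python
def double(x: int) -> int:
    return 2 * x


def total(items: list) -> int:
    # The kind of check a compiler might insert for the `list` annotation.
    if not isinstance(items, list):
        raise TypeError("expected list")
    return sum(items)


def call_dynamic(x: int) -> int:
    # Normal Python: the name `double` is looked up on every call.
    return double(x)


def call_bound(x: int, _double=double) -> int:
    # Emulated direct binding: the target is fixed at definition time.
    return _double(x)


before = (call_dynamic(3), call_bound(3))
double = lambda x: -1  # monkey patch the name after the fact
after = (call_dynamic(3), call_bound(3))
```

After the patch, call_dynamic sees the replacement while call_bound keeps calling the original, which is the compatibility difference the "no monkey patching" assumption would introduce.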

@datnamer
Author

datnamer commented Nov 22, 2016

> I've done some very preliminary work on it.

Is that recent work, or was it the work you blogged about surrounding the initial stages of Mypy?

@JukkaL
Collaborator

JukkaL commented Nov 22, 2016

> Is that recent work, or was it the work you blogged about surrounding the initial stages of Mypy?

This is recent and mostly separate from the earlier work, though there are obvious similarities.

@datnamer
Author

Cool. Would it use the full flexibility of the type system, like generics and protocols etc?

@JukkaL
Collaborator

JukkaL commented Nov 22, 2016

> Cool. Would it use the full flexibility of the type system, like generics and protocols etc?

It's too early to say. Likely there would have to be some limitations, but it's unclear what exactly.

@datnamer
Author

datnamer commented Dec 5, 2016

@JukkaL

Have you seen Julia's work on AOT compilation? It can retain pretty much the full dynamism and expressiveness of the language, including generics, abstractly typed and untyped functions, etc., while emitting code within an order of magnitude of C's speed, or matching it.

http://juliacomputing.com/blog/2016/02/09/static-julia.html

The only catches are very sane things, like not being able to monkey patch attributes; however, methods can still be added using multiple dispatch.

Is this feasible with mypy? It may require things like multiple dispatch for function specialization selection at call time and LLVM stuff.

For something more Pythonic, Numba already has this sort of multiple dispatch. However, it is under the hood and doesn't have the same sort of generic and expressive type system features as mypy or Julia. Perhaps there could be synergy between the projects. I think @pzwang can say more about whether any ideas are transferable.

@wtpayne

wtpayne commented Dec 5, 2016 via email

@JukkaL
Collaborator

JukkaL commented Dec 6, 2016

@datnamer Here are examples of things that might be limited (or the use of which may limit performance gains):

  • Monkey patching and adding attributes outside class definition (probably possible to support some use cases, but not everything)
  • Multiple inheritance (C extension classes only support single inheritance, I think, but there are some partial workarounds)
  • Metaclasses (some use cases maybe possible with a compiler plugin architecture, but that could be difficult to build and use)
  • Mock objects in tests (some optimizations may restrict the use of these)
  • Introspection of stack frames
  • eval (obviously)
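
To make the first item concrete, here is a small sketch of the dynamic behaviour in question, as ordinary CPython allows it today. The Point class is invented for the example; a compiler that fixes object layout and method targets ahead of time would have to restrict or specially handle both assignments.

```python
class Point:
    def __init__(self, x: float) -> None:
        self.x = x

    def norm(self) -> float:
        return abs(self.x)


p = Point(-3.0)

# Attribute added outside the class definition: changes the object's shape.
p.label = "left"

original = p.norm()

# Monkey patching: existing instances pick up the replacement method.
Point.norm = lambda self: 0.0
patched = p.norm()
```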

@datnamer
Author

datnamer commented Dec 6, 2016

That all makes sense.

What do you think about generic type safe classes with type vars?

@JukkaL
Collaborator

JukkaL commented Dec 6, 2016

Generic type parameters (including for things like List[int]) would be erased at runtime, similar to Java. The compiled form would have to perform runtime type checks. Consider this example:

def f(x: List[int]) -> int:
    a = x[0]
    ...

The compiled form could behave like this:

def f(x: List[int]) -> int:
    _tmp = x[0]
    if not isinstance(_tmp, int):
        raise TypeError
    a = unbox_int(_tmp)
    ...

@datnamer
Author

datnamer commented Dec 6, 2016

Thanks for the example. This would have a runtime cost and the function can't be inlined, right? Or can the branch somehow be eliminated?

@JukkaL
Collaborator

JukkaL commented Dec 6, 2016

Which function are you thinking about (regarding inlining)? There would be a runtime cost, but we could use low-level C API calls for the isinstance test and unboxing -- these can be pretty fast. To avoid the runtime type check operations, we'd need special collection types that know about item types, similar to numpy arrays (or Java arrays). It might make sense to have a high-performance custom list-like type that could be used for code that can't afford the runtime type checks. Hypothetical example:

def f(x: FastList[int]) -> int:
    a = x[0]  # No runtime type check needed
    ...

a: Any = FastList[int]()  # let's only do runtime type checking here by using Any
a.append(0)  # ok
f(a)  # ok, runtime check for the argument x passes
a.append('x')  # runtime error 
f(FastList[str]())  # call fails, FastList[str] is not compatible with FastList[int]

However, this would likely be a potential post-1.0 feature instead of a core part of the project.
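
A pure-Python sketch of the idea, to show where the checks move. FastList is hypothetical, so the class below is an invented stand-in (TypedList, constructed as TypedList(int) rather than FastList[int]()), and it gains no speed by itself; it only demonstrates checking once on insertion and once on the container instead of on every item read.

```python
from typing import Any, List


class TypedList:
    """Hypothetical stand-in for FastList: item types are checked on insertion."""

    def __init__(self, item_type: type) -> None:
        self.item_type = item_type
        self._items: List[Any] = []

    def append(self, item: Any) -> None:
        # The check happens once, when the item enters the list.
        if not isinstance(item, self.item_type):
            raise TypeError(f"expected {self.item_type.__name__}")
        self._items.append(item)

    def __getitem__(self, index: int) -> Any:
        return self._items[index]


def first(x: TypedList) -> int:
    # One check on the container replaces a check on every item read.
    if x.item_type is not int:
        raise TypeError("expected a list of int")
    return x[0]


a = TypedList(int)
a.append(0)
result = first(a)
```

With this stand-in, a.append('x') and first(TypedList(str)) both raise TypeError, mirroring the failing lines in the hypothetical example.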

@datnamer
Author

datnamer commented Dec 6, 2016

Gotcha, thanks. Sorry for all the questions above and below; this is all very helpful as I plan out an application.

This cost/check would only be present for functions called at runtime from the interpreter, not from compiled code, right?

How about custom data structures built from classes: would we need fixed-width ints for attributes? How would this work with structural subtyping, if at all? And do you mean that a TypeVar T for a field would be erased and instantiated as, for example, an int64?

For inlining, I mean any function I want to use in a loop. My dream is writing my own Person class for a simulation, with an immutable, stack-allocated, generic yet type-safe random variable as an attribute that is sampled from in a simulation loop. Or maybe that wouldn't be a good use case.

@JukkaL
Collaborator

JukkaL commented Dec 8, 2016

@seanjensengrey I've looked at Shed Skin before. The approach I'm proposing can be more flexible and should support more Python features and accessing basically arbitrary Python libraries, since we could support dynamically typed values through Any types. Shed Skin expects everything to have a pretty precise type, which makes it hard to use with legacy code that generally doesn't conform to any particular static typing discipline precisely. There are other major differences as well, such as local vs. whole-program type inference, with relatively well-known tradeoffs which I won't discuss in detail now.

@rowillia My understanding is that HPHPc didn't use type annotations to speed up code, but there clearly are other similarities. Also, I have the impression that HPHPc was basically a full reimplementation of PHP, whereas my proposal would still use the normal CPython runtime and libs.

I've briefly looked at FAT Python before. It looks to me that it is doing most/all work at runtime, making it closer to a JIT compiler than what I'm proposing here.

@datnamer
Author

datnamer commented Dec 8, 2016

But FAT Python looks to make some guarantees about Python code using function guards and the now-merged dictionary versioning. I think these would make mypy's job easier, no?

@den-run-ai

There is a group of researchers in Tokyo who work on a two-way transpiler between a subset of Fortran and Python 3.5+ with type hints. They use tools such as this one:

https://github.com/mbdevpl/typed-astunparse

They published a paper at Python HPC:

http://conferences.computer.org/pyhpc/2016/papers/5220a009.pdf

@wtpayne

wtpayne commented Dec 19, 2016 via email

@datnamer
Author

datnamer commented Jan 4, 2017

@JukkaL https://opensource.googleblog.com/2017/01/grumpy-go-running-python.html

Have you seen that? Looks quite relevant.

@JukkaL
Collaborator

JukkaL commented Jan 4, 2017

@datnamer Thanks for the link! It looks interesting, especially for organizations that are also heavily investing in Go and don't have a large legacy codebase that would make porting hard.

Their approach seems to have a few major practical implications:

  • They apparently aren't going to support the C extension API, which is going to make it hard to port many large Python applications to run on Grumpy, as it's likely that some dependency will rely on some C extension. The lack of C extension support has been a major problem for PyPy, for example.
  • Their performance chart shows Grumpy having worse single-threaded performance than CPython. Hopefully that will improve over time, though. The mypy-based compiler would be focused on high single-threaded performance, and using multiple cores effectively would require using multiple processes.
  • They are using Go's garbage collector instead of reference counting, so object lifetime semantics will differ from CPython. This may make porting existing code that assumes reference counting semantics harder.
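
As an illustration of the last point, this sketch shows code that quietly depends on CPython's reference counting. The Resource class is invented for the example. On CPython the finalizer runs as soon as the last reference goes away; under a tracing garbage collector it could run at some later, unspecified time, so the event order below is a CPython-specific assumption.

```python
events = []


class Resource:
    def __init__(self, name: str) -> None:
        self.name = name

    def __del__(self) -> None:
        # Runs when the object is reclaimed.
        events.append(f"released {self.name}")


def use() -> None:
    r = Resource("tmp")
    events.append(f"using {r.name}")
    # On CPython, r's reference count drops to zero when this frame exits.


use()
events.append("after use")
```

On CPython the recorded order is "using tmp", "released tmp", "after use"; code that relies on that ordering (for example to close files) is the kind that gets harder to port.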

@datnamer
Author

datnamer commented Jan 4, 2017

Makes sense. But can't there be some kind of nogil annotation like in Cython? Or does Cython have some manual memory management that allows such a thing?

@ambv
Contributor

ambv commented Jan 4, 2017

With Cython, any operation that involves Python objects and functions must hold the GIL. The "nogil" function annotation and context managers are for code that is purely C. Otherwise Cython will refuse to compile your code. Memory management in "nogil" sections is whatever C/C++ provides at that point.

@JukkaL
Collaborator

JukkaL commented Jan 4, 2017

It might be possible to support some things without the GIL, but for it to be safe, I think that you'd only be able to use certain low-level types that don't require taking the GIL such as numpy arrays and fixed-width integers, and you wouldn't be able to call most functions. Not sure how useful this would be. (I haven't tried the Cython nogil feature but I've seen it mentioned in the docs.)

@datnamer
Author

datnamer commented Jan 5, 2017

From the author of the project, when I suggested a "Dropbox-Google collaboration":

"Yes, leveraging type hints for optimization purposes is a long term goal. Thanks for pointing me to [this] issue, I'll keep an eye on it.

One of the goals of open sourcing was to get feedback and work with outside folks so I'm definitely open to collaboration!"

@ethanhs
Collaborator

ethanhs commented Jan 5, 2017

A somewhat related project to static compilation is Hermetic. The program takes type annotated Python functions and through Hindley-Milner type deduction generates C code. Sadly H-M type deduction doesn't work well with Python's OOP style, but the project is very impressive work all the same.

@alehander92

alehander92 commented Jan 5, 2017

Hey, I am the author of Airtight (the HM thing). Airtight isn't really implementing Python; it is more like an experiment in combining Python's syntax and philosophy with functional programming and stronger type systems.

Actually I have another library, pseudo-python, that compiles a static subset of Python to readable/idiomatic code in Go/C#/Ruby/JS (C++/Rust in the making), which is more relevant to the discussion. I planned to use mypy type hints when they stabilize (currently it just does a form of full type inference, which is kind of possible because Pseudo is used only for self-contained Python code without dependencies). However, Pseudo is also implementing a limited part of Python, so it's not a great example for PyIR.

Good, standardized type annotation syntax/semantics are still a very nice part of Python, because they make it suitable for writing all kinds of specialized transpilers/code generators and for easily targeting languages with rich type systems.

I just saw the link, so I hoped to clear any confusion on Airtight/Pseudo's approach.

@ethanhs
Collaborator

ethanhs commented Jan 5, 2017

@alehander42 thank you for clarifying. Pseudo Python looks very interesting indeed! Yes, the PyIR suggestion seems to be more about a WebAssembly/LLVM-style bytecode to produce faster Python execution.
I actually came across https://github.com/sklam/pyir_interpreter, which does seem to implement the idea of a Python IR interpreter.

@ilevkivskyi
Member

Mypyc has been out there for a while and is going well, so I think this may be closed.
