Mypy "strict mode" for static compilation #1862
Comments
This discussion addresses the type vs. subclass question: python/typing#241. Also, does the future label mean this is planned for implementation at some point? |
I am not sure this is exactly what you want to discuss here, but it is at least related. Some time ago I saw a discussion on the Cython mailing list whose conclusion was that PEP 484 is quite useless for Cython because typing.py does not support close-to-the-machine types like I don't agree with this: PEP 484 is about a commonly agreed syntax for type information in Python, not about the implementation of these types. Currently, Cython uses its own syntax for declaring types, so Cython code is not valid Python code. I want to write a translator script that takes type-annotated Python 3 code and turns it into Cython code. Of course, it should be accompanied by a stub module (ctyping.pyi?) that contains low-level types like With such a tool one can work with native Python code and then try for a speed-up by running it through Cython using the annotations already present in the code. These annotations could be checked by mypy, since this is just native Python. I think this should be quite simple: it does not seem necessary to go through an AST to perform the translation, and it looks like it could be done entirely at the lexer/tokenizer level. Currently this is just an idea, since I don't have time to work on it now, but at some point I will definitely try it. |
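A minimal sketch of the translation idea in the comment above, for illustration only. The comment suggests working at the tokenizer level; this sketch swaps in the standard `ast` module because it keeps the example short. The `CTYPE_FOR` mapping and the `cython_signature` helper are hypothetical names, and the choice of C types (`long`, `double`, `bint`) is an assumption, not something the thread specifies.

```python
import ast

# Hypothetical mapping from Python annotation names to Cython C types.
CTYPE_FOR = {"int": "long", "float": "double", "bool": "bint"}


def cython_signature(source: str) -> str:
    """Render the first function found in `source` as a Cython cpdef signature."""
    func = next(
        node for node in ast.parse(source).body
        if isinstance(node, ast.FunctionDef)
    )

    def ctype(annotation):
        # Missing or unrecognized annotations fall back to a generic Python object.
        if isinstance(annotation, ast.Name):
            return CTYPE_FOR.get(annotation.id, "object")
        return "object"

    args = ", ".join(f"{ctype(a.annotation)} {a.arg}" for a in func.args.args)
    return f"cpdef {ctype(func.returns)} {func.name}({args}):"


print(cython_signature("def scale(x: float, n: int) -> float:\n    return x * n\n"))
```

Anything the mapping does not know about is left as `object`, which mirrors the idea of ignoring types that cannot be expressed in Cython rather than failing on them.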
Cython is missing features like generic classes. How would you deal with that? |
Although I have not used it yet, there seems to be some support for C++ templates in Cython: http://docs.cython.org/en/latest/src/userguide/wrapping_CPlusPlus.html#templates . There are also fused types (https://github.com/cython/cython/wiki/enhancements-fusedtypes) that are very similar to constrained type variables like I am going to ignore all the types that could not be expressed in Cython, at least at early stages. In principle, it is possible to specialize some generics before translation to Cython, but as I see it, it could be quite complex. |
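For readers unfamiliar with constrained type variables, here is a small Python-side sketch of the construct that the comment compares to Cython fused types. The names `Number` and `double` are made up for this example, and the comparison to fused types is the commenter's, not a claim about how Cython implements them.

```python
from typing import TypeVar

# A constrained type variable: Number may only be solved to int or to float,
# which is roughly the role a fused type plays on the Cython side.
Number = TypeVar("Number", int, float)


def double(x: Number) -> Number:
    """Return twice the argument, preserving int-ness or float-ness."""
    return x * 2


print(double(2), double(1.5))
```

A type checker treats `double` as if it were written once per constraint, which is why specializing such functions ahead of time is at least conceivable.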
@datnamer I forgot about this while I was on vacation. I'm still interested in this topic and have some ideas, though I haven't had time to write anything substantial down. |
OK, let's have it. |
Cool :). I think PEP 526 can also make this much nicer. |
@JukkaL - It's probably a bit (a lot) early for this, but sometimes it's better to be forward thinking: do you think we would be able to statically resolve things like protocols and a potential future multiple dispatch (together or separately), or would this be more restricted, like Cython? Also, would this need fixed-width integers to be added? |
Hello, I'd like to stay with Python instead of using C(++) or going with some JVM language (Kotlin, Ceylon, ...), and I wonder whether this issue is supposed to bring the feature that, after mypy has analyzed the code, one could take advantage of that analysis and get one's code Cython-ized automatically? |
@kirbyfan64 I know about it, even asked the question about Nuitka & Mypy on ml (got no reply), but, afaict, Nuitka won't take advantage of type annotations, but is going to do its own independent analysis, right? |
@gour that is my understanding, unless something has changed. |
As far as I know, Nuitka isn't going to use type annotations. Type analysis without annotations is very difficult for larger programs. I haven't tried Nuitka recently or followed closely what's going on there, but I believe that their approach is hard to pull off, except maybe for smaller programs or for smaller performance gains than what I'd hope to see. Compiling programs with PEP 484 annotations to Cython is also not easy to do effectively, since the type systems are quite different. This doesn't mean that PEP 484 annotations can't be used to speed up programs, only that the approach would likely have to be somewhat different from what Cython does right now. Compiling programs with PEP 484 annotations to CPython C extension modules seems feasible, and I've done some very preliminary work on it. The compiled programs likely wouldn't have full compatibility with Python semantics. For example, if something is annotated as a list, maybe the compiler would insert a runtime check to ensure that you can't assign a non-list object to the variable. Also, if you call a function, maybe the compiler would (under some circumstances) assume that there is no monkey patching and bind directly to the target function instead of going through a namespace lookup. Cython can already do a bunch of similar things to get good performance, so it wouldn't be anything terribly new. |
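The inserted runtime check described above can be imitated in plain Python. This is only a sketch of the observable behavior, assuming a hypothetical `enforce_annotations` decorator; a real compiler would emit the check in C rather than wrap the function, and nothing here reflects an actual implementation.

```python
import functools
import inspect


def enforce_annotations(func):
    """Reject arguments whose runtime type differs from a plain-class annotation."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            if expected is inspect.Parameter.empty:
                continue  # Unannotated parameters stay dynamically typed.
            # Only plain classes (list, int, ...) are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_annotations
def total(items: list) -> int:
    return sum(items)


print(total([1, 2, 3]))
```

Passing a tuple to `total` raises `TypeError` here even though `sum` would accept it, which is the kind of deviation from ordinary Python semantics the comment anticipates.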
Is that recent work, or was it the work you blogged about surrounding the initial stages of Mypy? |
This is recent and mostly separate from the earlier work, though there are obvious similarities. |
Cool. Would it use the full flexibility of the type system, like generics and protocols etc? |
It's too early to say. Likely there would have to be some limitations, but it's unclear what exactly. |
Have you seen Julia's work on AOT compilation? It can retain pretty much the full dynamicity and expressiveness of the language, including generics, abstractly and untyped functions etc., while emitting code within the magnitude of C, or matching it. http://juliacomputing.com/blog/2016/02/09/static-julia.html The only catches are very sane restrictions, such as not being able to monkey patch attributes; however, methods can still be added using multiple dispatch. Is this feasible with mypy? It may require things like multiple dispatch for selecting function specializations at call time, plus LLVM machinery. For something more Python-ey, Numba already has this sort of multiple dispatch. However, it is under the hood and doesn't have the same generic and expressive type system features as mypy or Julia. Perhaps there could be synergy between the projects. I think @pzwang can say more about whether any ideas are transferable. |
I think that any developments in this area are potentially very useful and will be following them with interest. For what it is worth, for my use-case, I'm quite able to take advantage of code generation even with an extremely restricted subset of the language...
-Will
|
@datnamer Here are examples of things that might be limited (or the use of which may limit performance gains):
|
That all makes sense. What do you think about generic type safe classes with type vars? |
Generic type parameters (including for things like
The compiled form could behave like this:
|
Thanks for the example. This would have a runtime cost and the function can't be inlined, right? Or can the branch somehow be eliminated? |
Which function are you thinking about (regarding inlining)? There would be a runtime cost, but we could use low-level C API calls for the
However, this would likely be a potential post-1.0 feature instead of a core part of the project. |
Gotcha, thanks. Sorry for all the questions above and below; this is all very helpful as I plan out an application. This cost/check would only be present for functions called at runtime from the interpreter, not at compile time, right? How about custom data structures built from classes: would we need fixed-width ints for attributes? How would this work with structural subtyping, if at all? So do you mean a typevar T for a field would be erased and instantiated as an int64, for example? For inlining, I mean any function I want to use in a loop... my dream is writing my own person class for a simulation which has an immutable, stack-allocated, generic yet typesafe random variable as an attribute that is sampled from in a simulation loop. Or maybe that wouldn't be a good use case. |
@seanjensengrey I've looked at Shed Skin before. The approach I'm proposing can be more flexible and should support more Python features and accessing basically arbitrary Python libraries, since we could support dynamically typed values through @rowillia My understanding is that HPHPc didn't use type annotations to speed up code, but there clearly are other similarities. Also, I have the impression that HPHPc was basically a full reimplementation of PHP, whereas my proposal would still use the normal CPython runtime and libs. I've briefly looked at FAT Python before. It looks to me that it is doing most/all work at runtime, making it closer to a JIT compiler than what I'm proposing here. |
But FAT Python appears to make some guarantees about Python code using function guards and the now-merged dictionary versioning. I think these would make mypy's job easier, no? |
There is a group of researchers in Tokyo who work on a two-way transpiler between a subset of Fortran and Python 3.5+ with type hints. They use tools such as this one: https://github.com/mbdevpl/typed-astunparse They published a paper at Python HPC: http://conferences.computer.org/pyhpc/2016/papers/5220a009.pdf |
Wow -- I didn't know about this one. Thanks for the heads-up!
|
@JukkaL https://opensource.googleblog.com/2017/01/grumpy-go-running-python.html Have you seen that? Looks quite relevant. |
@datnamer Thanks for the link! It looks interesting, especially for organizations that are also heavily investing in Go and don't have a large legacy codebase that would make porting hard. Their approach seems to have a few major practical implications:
|
Makes sense. But can't there be some kind of nogil annotation like in Cython? Or does Cython have some manual memory management that allows such a thing? |
With Cython, any operation that involves Python objects and functions must hold the GIL. The "nogil" function annotation and context managers are for code that is purely C. Otherwise Cython will refuse to compile your code. Memory management in "nogil" sections is whatever C/C++ provides at that point. |
It might be possible to support some things without the GIL, but for it to be safe, I think that you'd only be able to use certain low-level types that don't require taking the GIL such as numpy arrays and fixed-width integers, and you wouldn't be able to call most functions. Not sure how useful this would be. (I haven't tried the Cython nogil feature but I've seen it mentioned in the docs.) |
From the author of the project, when I suggested a Dropbox-Google collaboration: "Yes, leveraging type hints for optimization purposes is a long term goal. Thanks for pointing me to [this] issue, I'll keep an eye on it. One of the goals of open sourcing was to get feedback and work with outside folks so I'm definitely open to collaboration!" |
A somewhat related project to static compilation is Hermetic. The program takes type annotated Python functions and through Hindley-Milner type deduction generates C code. Sadly H-M type deduction doesn't work well with Python's OOP style, but the project is very impressive work all the same. |
Hey, I am the author of Airtight (the HM thing). Airtight isn't really implementing Python; it is more of an experiment in combining Python's syntax and philosophy with functional programming and stronger type systems. Actually, I have another library, pseudo-python, that compiles a static subset of Python to readable/idiomatic code in Go/C#/Ruby/JS (C++/Rust in the making), which is more relevant to the discussion. I planned to use mypy type hints when they stabilize (currently it just does a form of full type inference, which is kind of possible because Pseudo is used only for self-contained Python code without dependencies). However, Pseudo also implements only a limited part of Python, so it's not a great example for PyIR. A good, standardized type annotation syntax and semantics are still a very nice part of Python, because they make it suitable for writing all kinds of specialized transpilers/code generators and for easily targeting languages with rich type systems. I just saw the link, so I hoped to clear up any confusion about Airtight/Pseudo's approach. |
@alehander42 thank you for clarifying. Pseudo Python looks very interesting indeed! Yes, the PyIR suggestion seems more to be about WebAssembly/LLVM type bytecode to produce faster Python execution. |
Mypyc has been out there for a while and is going well, so I think this may be closed. |
Problem statement:
Proposed Solution: a "PyIR" that can be consumed by various compilers. Details here: https://docs.google.com/document/d/1jGksgI96LdYQODa9Fca7EttFEGQfNODphVmbCX0DD1k/edit
Question for Mypy: A Cython subset seems to be the recommended source format. Can a "strict mode" mypy be used instead to output this IR? Advantages include greater expressiveness than Cython (generics, etc.), bootstrapping off existing mypy work, and less fragmentation.
Excerpts from discussion on gitter:
@njsmith
@kmod
Bonus: where would DyND's datashape fit in? Could datashape be used as a mypy plugin to annotate array types? @insertinterestingnamehere