Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type hints to strconv.py, and propagate implications #2162

Merged
merged 1 commit into from
Sep 21, 2016

Conversation

elazarg
Copy link
Contributor

@elazarg elazarg commented Sep 20, 2016

I am not sure that's the best string representation for NewType.

I strongly feel that composing TypeInfo in subclasses of nodes.Node make them non-cohesive. This mapping should be external.

@elazarg
Copy link
Contributor Author

elazarg commented Sep 20, 2016

This PR was also "tested" by printing self.tree to see that no exceptions are raised.

@elazarg elazarg mentioned this pull request Sep 20, 2016
@gvanrossum
Copy link
Member

I strongly feel that combining Nodes and TypeInfo makes mypy.nodes
classes non-cohesive.

Can you clarify? Is this about having them in the same file? If that's so I
don't disagree, though it may be easier said than done to move it into its
own file, due to cyclical dependencies -- mypy handles cycles within the
same file much better than import cycles.

@gvanrossum gvanrossum merged commit f9c225e into python:master Sep 21, 2016
@gvanrossum
Copy link
Member

Does this relate to #2109?

@elazarg
Copy link
Contributor Author

elazarg commented Sep 21, 2016

I will address #2109 there, and I agree with the ideal of not putting these classes in the same file, but chose very bad wording; sorry. What I meant is regarding the fact that e.g. NewType has a TypeInfo inside it.

(What follows is probably too long and unuseful as usual. Please don't feel obligated to read it)


I will try to convey my understanding of core concepts in the analysis; It is not a suggestion as to what should be done (which as you and Jukka explained to me involves complex trade-offs). I use "should" and similar words but please ignore them. I also mix several levels of design and theory since things are not really clear to me. And I am not really sure it's worth your time.

I believe that Node subclasses should represent the "fixed" ast, with the added constructs such as NewType, NamedTuple, cast and friends, perhaps collapsing intrinsic decorators, distinguishing (certain) type expressions from regular expressions, and similar syntactic fixes. Nothing more - in particular no type information. Once built, this ast can be immutable. It's not exactly semantic analysis that makes it, but rather the incorporation of certain standard libraries into the language.

So the mapping from an ast node to a type feels external to the ast itself - a node is not "built out of" types, and the type of a node depends on lots of context. This is a "shared" pointer, possibly there for convenience or performance. I think that this kind of "incoherence" is part of the reason that Optional is used in so many places. Far too many in my opinion.

As you pointed out (if I understood you correctly), SymbolNode should not be involved with all that jazz, and so does TypeInfo (I know it's not easy; I did not succeed making this transformation).

The ast comprises the "underlying graph" of the data flow, even if no explicit Graph data structure is used. Semantic analysis builds the type information in "sources" of the data flow graph - the explicitly annotated or trivially inferred parts of the program. It may also prune some unreachable edges in the graph based on this immediately-accessible type information. The result is kept in what is now called SymbolTable - Describing and binding sources, sinks and edges (lvalues, call-signatures matches, etc.). The type information is still incomplete so to be sound there are many "edges" left in the "graph", e.g. from a call to every possible matching signature. Again, most of this is implicit which is probably fine, but it's there.

There is yet another distinction which is too fuzzy. A "Type" is just a type. It is not a Context. It should not be mutable. Context is very important for reporting errors, but this is not a "Type". So the solver should work with "TypeOccurrence" which holds both the context and the type - what is now somewhat confusingly called "Type". Separating Context and Type will make it more obvious what a Type mean (again, I don't actually suggest doing that).

Assuming the code reflects what I just describe, it should not involve too many dependency cycles. A Type is built from other types. A Node is built from other nodes. TypeOccurrence combines them. SymbolTable binds TypeOccurrences to expressions and TypeInfo to type definitions, etc.

@gvanrossum
Copy link
Member

You probably just need to keep reading and re-reading the code. You didn't
even mention Instance, which is how types and TypeInfo are bound together.

One thing though: IIUC there is indeed a confusion about the meaning of
Context. Sometimes it's just something used for error messages; and
sometimes this is the type of the receiving argument or variable, which is
used to disambiguate the type of the expression (e.g. when there's an empty
container or a lambda). It's confusing that both of these are called
context.

The issue with Optional is complicated, but it's not easy to get to a
better world.

On Tue, Sep 20, 2016 at 8:03 PM, Elazar notifications@github.com wrote:

I will address #2109 #2109 there,
and I agree with the ideal of not putting these classes in the same file,
but chose very bad wording; sorry. What I meant is regarding the fact that
e.g. NewType has a TypeInfo inside it.

(What follows is probably too long and unuseful as usual. Please don't

feel obligated to read it)

I will try to convey my understanding of core concepts in the analysis; It
is not a suggestion as to what should be done (which as you and Jukka
explained to me involves complex trade-offs). I use "should" and similar
words but please ignore them. I also mix several levels of design and
theory since things are not really clear to me. And I am not really
sure it's worth your time.

I believe that Node subclasses should represent the "fixed" ast, with the
added constructs such as NewType, NamedTuple, cast and friends, perhaps
collapsing intrinsic decorators, distinguishing (certain) type expressions
from regular expressions, and similar syntactic fixes. Nothing more - in
particular no type information. Once built, this ast can be immutable. It's
not exactly semantic analysis that makes it, but rather the incorporation
of certain standard libraries into the language.

So the mapping from an ast node to a type feels external to the ast
itself - a node is not "built out of" types, and the type of a node depends
on lots of context. This is a "shared" pointer, possibly there for
convenience or performance. I think that this kind of "incoherence" is part
of the reason that Optional is used in so many places. Far too many in my
opinion.

As you pointed out (if I understood you correctly), SymbolNode should not
be involved with all that jazz, and so does TypeInfo (I know it's not easy;
I did not succeed making this transformation).

The ast comprises the "underlying graph" of the data flow, even if no
explicit Graph data structure is used. Semantic analysis builds the type
information in "sources" of the data flow graph - the explicitly annotated
or trivially inferred parts of the program. It may also prune some
unreachable edges in the graph based on this immediately-accessible type
information. The result is kept in what is now called SymbolTable -
Describing and binding sources, sinks and edges (lvalues, call-signatures
matches, etc.). The type information is still incomplete so to be sound
there are many "edges" left in the "graph", e.g. from a call to every
possible matching signature. Again, most of this is implicit which is
probably fine, but it's there.

There is yet another distinction which is too fuzzy. A "Type" is just a
type. It is not a Context. It should not be mutable. Context is very
important for reporting errors, but this is not a "Type". So the solver
should work with "TypeOccurrence" which holds both the context and the type

  • what is now somewhat confusingly called "Type". Separating Context and
    Type will make it more obvious what a Type mean (again, I don't actually
    suggest doing that).

Assuming the code reflects what I just describe, it should not involve too
many dependency cycles. A Type is built from other types. A Node is built
from other nodes. TypeOccurrence combines them. SymbolTable binds
TypeOccurrences to expressions and TypeInfo to type definitions, etc.


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#2162 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ACwrMt4X7JsIlUPmCrp7smjcvNIGyroiks5qsJ5mgaJpZM4KCNLx
.

--Guido van Rossum (python.org/~guido)

@elazarg
Copy link
Contributor Author

elazarg commented Sep 21, 2016

You probably just need to keep reading and re-reading the code.

I will.

Regarding Context, if the visitor pattern had the side effect of updating the current context of the pass, Contexts need not be passed around explicitly so often - almost exclusively copied and not used. I am not sure that's a better design, since it adds implicitly-mutable state, but it has the benefit of being able to change what Context mean without a cascading effect all over the place. (Incidentally, Context then will be used with a context manager :) )

@elazarg elazarg deleted the strconv_typehints branch October 2, 2016 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants