Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

One case of max iteration count being exceeded. #12458

Open
mrolle45 opened this issue Mar 26, 2022 · 6 comments
Open

One case of max iteration count being exceeded. #12458

mrolle45 opened this issue Mar 26, 2022 · 6 comments
Labels
crash semantic-analyzer Problems that happen during semantic analysis topic-named-tuple

Comments

@mrolle45
Copy link

In my project I am encountering the max iteration count problem.
It is working with a list of about 20 modules. After two iterations of analysis, all the modules get deferred, and four of them report making some progress.
The problem is that these modules are making the same progress every time. It is adding an entry to the module's symbol table. However in the next iteration, the symbol is missing from the symbol table and gets added again, resulting in a report of progress being made.
The semanal code is too complicated for me follow. It seems that somehow, the updated symbol table for the module is getting lost and the module state remains at its state before the iteration pass.

I hope this can give you a clue to what to look for in the semanal code that would account for the symbol tables not getting carried over to the next iteration.

I did manage to do some debugging and track down these progress events. One example is in a module:

class InstrValCol(Col[ColKey, ValType], ColAttrsBase, metaclass=CacheMeta):
	...
	class LookupResultTup(NamedTuple):
		status : RowKeyStatus
		value : Optional[ValType] = None

		def __repr__(self):
			repr = self.status.name
			if self.value is not None: repr += f' {self.value}'
			return repr

It is the name __repr__ which is being added to the symbol table every time.

@mrolle45 mrolle45 added the crash label Mar 26, 2022
@AlexWaygood AlexWaygood added topic-named-tuple semantic-analyzer Problems that happen during semantic analysis labels Mar 26, 2022
@AlexWaygood
Copy link
Member

Thanks for the report! If you could provide the full traceback (you'll get it by running mypy with the --show-traceback option), and/or provide a minimal, reproducible example, that would be very helpful.

N.B. This looks similar to #9397 and #8695

@mrolle45
Copy link
Author

mrolle45 commented Mar 26, 2022

Another module with this behavior, but a different sort of item with multiple progress:

""" Classes for columns used directly by an InstrTab.
Does not include ModsDiffCol (modifiers column).
"""

from __future__ import annotations

from .common import *
from .instr_common import *
from . import instr_common
from .modifiers import *
from .sub_codes import *

_ValType = TypeVar('_ValType', 'InstrVal', 'Body', 'Mods', 'FldCatVal')

The TypeVar gets added to the symbol table on each iteration starting with the second.
The four classes in that statement are indeed already imported (I checked).
I tried unquoting the type names:

_ValType = TypeVar('_ValType', InstrVal, Body, Mods, FldCatVal)

Lo and behold! The symbol never got added at all, because it was forever determined to be incomplete.
Therefore, the bad behavior is connected to using literal strings in the TypeVar parameters.

Note: the topic-named-tuple label does not apply to this case, but only to the original case I cited.

@mrolle45
Copy link
Author

mrolle45 commented Mar 26, 2022

@AlexWaygood I tried making a project with just the one module, but then mypy behaved correctly. Further investigating, I found that _ValType was still being added to the symbol table, but in the second iteration, there were no defers. This terminated the iterations. In the original project, there were the same defers each iteration. I don't understand why this happened.
Since the bug only shows up in the third and subsequent iterations, this project is not useful for debuging.
So in order for you to have something to debug with, I'd have to send you a zip of my entire project.
Do you want the entire project? I know you wanted a minimal, reproducible example.

@mrolle45
Copy link
Author

Found one bug
The class PlaceholderType needs a definition for __eq__.
Without it, the method process_typevar_declaration in semanal.py thinks that the type variable _ValType has changed because it is comparing the objects for the values of the type variable. In the third iteration, there are placeholder objects, and these compare unequal even though they have the same fullname and args.
I added a definition for PlaceholderType.__eq__ to my copy of the code.
Now there is no longer any progress being reported 👍 .
Found another bug
The progress reports in the first case above are related to namedtuple or NamedTuple based classes.
The relevant code is in semanal_namedtuple.py, method NamedTupleAnalyzer.save_namedtuple_body().
Its job is to process the class definition syntax tree and update the class TypeInfo for definitions that are now complete, and taking into account existing definitions that come from the namedtuple base class.
The method does this work using a new, empty, SymbolTable. This will get populated with definitions that are now complete, which is fine. However, since these were not in the new table, the analyzer sets the progress flag, even for definitions which were already complete. The new symbol table gets merged back into the original table for the class, with special logic for certain namedtuple members.
In my opinion, the correct method should be to save the existing definitions, then process the class definition tree the same as any other class, then make any adjustments to the updated symbol table. In other words, do not make this temporary empty symbol table.
I am going to give it some more thought and write a new save_namedtuple_body method, and be sure it has the same result but without the progress indicators for definitions that were already in the class and not as placeholders. I'll add another comment when I've done this. Hopefully, all of the repeated progress flags will be gone for my project. Then I'll see if all the deferred items get resolved.

@mrolle45 mrolle45 reopened this Mar 27, 2022
@mrolle45
Copy link
Author

mrolle45 commented Mar 27, 2022

By the way, I didn't mean to close this issue earlier. I hit the wrong button. Sorry about that!

@mrolle45
Copy link
Author

mrolle45 commented Mar 27, 2022

Fixes for bugs
For recognizing PlaceholderType instances as equal:

types.py:
class PlaceholderType(ProperType):
   def __eq__(self, other: object) -> bool:
        if type(self) != type(other): return False
        return self.fullname == other.fullname and self.args == other.args

This prevents repetitive progress indications for the same TypeVar definitions.

For recognizing user-defined methods of a namedtuple class as having already been seen:

semanal_namedtuple.py:
    @contextmanager
    def save_namedtuple_body(self, named_tuple_info: TypeInfo) -> Iterator[None]:
        """Preserve the generated body of class-based named tuple and then restore it.

        Temporarily clear the names dict so we don't get errors about duplicate names
        that were already set in build_namedtuple_typeinfo (we already added the tuple
        field names while generating the TypeInfo, and actual duplicates are
        already reported).
        """
        # I changed these names to make the code more clear.
        old_names = named_tuple_info.names
        new_names = named_tuple_info.names = SymbolTable()

        # I added these lines
        # Copy existing user-defined methods to new symbol table so that analyzing the
        # class def doesn't indicate progress.  Variable names aren't a problem because
        # they won't get added to the symbol table if they were bound in a previous iteration.
        for key, value in old_names.items():
            if isinstance(value.node, (FuncBase, Decorator)) and not value.plugin_generated:
                # Keep user-defined methods as is.
                new_names[key] = value

        yield

        # Make sure we didn't use illegal names, then reset the names in the typeinfo.
        for prohibited in NAMEDTUPLE_PROHIBITED_NAMES:
            if prohibited in new_names:
                if old_names.get(prohibited) is new_names[prohibited]:
                    continue
                ctx = new_names[prohibited].node
                assert ctx is not None
                self.fail('Cannot overwrite NamedTuple attribute "{}"'.format(prohibited),
                          ctx)

        # Restore the names in the original symbol table. This ensures that the symbol
        # table contains the field objects created by build_namedtuple_typeinfo. Exclude
        # __doc__, which can legally be overwritten by the class.
        for key, value in old_names.items():
            if key in new_names:
                if key == '__doc__':
                    continue
                sym = new_names[key]
                #I removed these lines as they were made unnecessary above.
                #if isinstance(sym.node, (FuncBase, Decorator)) and not sym.plugin_generated:
                #    # Keep user-defined methods as is.
                #    continue
                # Keep existing (user-provided) definitions under mangled names, so they
                # get semantically analyzed.
                r_key = get_unique_redefinition_name(key, new_names)
                new_names[r_key] = sym
            new_names[key] = value

I tested this code on my project. I tracked all the places where the progress flag is set, and of all the situations where there were infinite progress flags before, now each of these occurs exactly once.

Something odd that I did notice in this test. During the final iteration (number 5 in this case), some more progress flags were set. They were ignored since this was the final iteration. However, it indicates that some progress was made in iteration 4. It pertained to from Y import A in module X.py. In iteration 4, the name Y.A was still unresolved, but in iteration 5 it was resolved, and so the name X.A got added. It happens that Y.py is analyzed after X.py. When I figure out what the problem is, I expect to create a new issue for it.

Perhaps we all should have a discussion about what "making progress" means in these iterative scenarios, and how it should be implemented. Broadly speaking, I interpret it to mean that if another pass through the remaining incomplete modules in the SCC will have different results, then the current pass has made progress.
Perhaps a straightforward way of doing this would be to tag the most recent change to any AST Node and propagate this up the chain of containing nodes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crash semantic-analyzer Problems that happen during semantic analysis topic-named-tuple
Projects
None yet
Development

No branches or pull requests

2 participants