Skip to content

MultiLineBlockMixin causes stack overflow from recursive _get_assign_nodes() #786

Open

Description

Problem

The recursive nature of the _get_assign_nodes() method can lead to a RecursionError and break running pylint on a project that has a class with a large control structure in its method.

Workaround

A hackish means of avoiding this bug is to increase the stack size to some value that enables further recursion without exhausting the stack. It's documented in a comment, but in essence the idea is calling sys.setrecursionlimit(some_int) before running pylint or astroid). A more robust fix would be to avoid recursion altogether since this hack only increases the number of lines astroid can parse.

Affected versions

>>> from astroid import __pkginfo__; print(__pkginfo__.version)
2.4.1

I've tested this on every minor version since Astroid 2.1.0. Astroid 2.0.4 is not affected.

Since the release of Astroid 2.1.0, we've been experiencing stack overflow exceptions for a class with a very large chain of if/else statements.

How to reproduce:

The following Python file is enough to trigger a recursion error on any version of Astroid from 2.1.0 through the unreleased 2.5.0:

# file_with_large_if_elif.py
class TestClasse:
    def cause_astroid_recursion(self):
        if True:
            pass
        elif True:
            pass
        # Repeat as needed to overflow > sys.getrecursionlimit() on your system
        #
        # On my machine, 119 total `elif` statements is enough to trigger recursion
        elif True:
            pass
$ pylint file_with_large_if_elif.py
<full stacktrace truncated>
  File ".../astroid/mixins.py", line 153, in _get_assign_nodes
    return list(itertools.chain.from_iterable(children_assign_nodes))
  File ".../astroid/mixins.py", line 151, in <genexpr>
    for child_node in block
  File ".../astroid/decorators.py", line 33, in cached
    cache[func] = result = func(*args, **kwargs)
  File ".../astroid/mixins.py", line 153, in _get_assign_nodes
    return list(itertools.chain.from_iterable(children_assign_nodes))
  File ".../astroid/mixins.py", line 151, in <genexpr>
    for child_node in block
  File ".../astroid/decorators.py", line 33, in cached
    cache[func] = result = func(*args, **kwargs)
RecursionError: maximum recursion depth exceeded while calling a Python object

Source of the RecursionError

The current implementation of MultiLineBlockMixin makes a recursive call to _get_assign_nodes():

@decorators.cached
def _get_assign_nodes(self):
    children_assign_nodes = (
        child_node._get_assign_nodes()
        for block in self._multi_line_blocks
        for child_node in block
    )
    return list(itertools.chain.from_iterable(children_assign_nodes))

Possible fix

I think that a possible fix would be to switch from a recursive design, towards a stack-based solution. If we instead built a list of nodes as we go (say, with a while loop & a stack), we could avoid the overflow.

The generators used in _get_assign_nodes() are eagerly converted into a list anyway, so keeping a list of children then popping to get all assign nodes should have no negative performance effects.

A quick attempt on my own machine seems to work (though I've not written any new tests, or tested my changes against the test suite).

A less contrived reproduction

I stumbled upon this error by trying to run astroid on a class that's automatically produced by Apache Thrift.

Given an input file like so:

// proof_of_concept.thrift
struct LargeStruct {
    1: optional string argument_1;
    2: optional string argument_2;
    3: optional string argument_3;
    // ...
    116: optional string argument_116;
}

One can run thrift --gen py proof_of_concept.thrift to produce gen-py/proof_of_concept/ttypes.py

In the produced file is a large Python class - the many if/elif statements in read() method is what trips up astroid:

class LargeStruct(object):
    # <other methods truncated>
    def read(self, iprot):
        # (some code truncated) - the body of the `while` loop is what's relevant
        while True:
            (fname, ftype, fid) = iprot.readFieldBegin()
            if ftype == TType.STOP:
                break
            if fid == 1:
                if ftype == TType.STRING:
                    self.argument_1 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
                else:
                    iprot.skip(ftype)
            elif fid == 2:
                if ftype == TType.STRING:
                    self.argument_2 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
                else:
                    iprot.skip(ftype)
            #
            # <113 other `elif` branches truncated>
            #
            elif fid == 116:
                if ftype == TType.STRING:
                    self.argument_116 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
                else:
                    iprot.skip(ftype)
            else:
                iprot.skip(ftype)
            iprot.readFieldEnd()
        iprot.readStructEnd()

Large Thrift structs are fairly common, so this can happen with some frequency.

Pull request?

I understand if parsing hundreds of control statements isn't something that astroid seeks to be able to do.

However, if you're open to refactoring out recursion, I'd be willing to write up a PR for this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions