Description
openedon May 8, 2020
Problem
The recursive nature of the _get_assign_nodes()
method can lead to a RecursionError
and break running pylint
on a project that has a class with a large control structure in its method.
Workaround
A hackish means of avoiding this bug is to increase the stack size to some value that enables further recursion without exhausting the stack. It's documented in a comment, but in essence the idea is calling sys.setrecursionlimit(some_int)
before running pylint
or astroid
). A more robust fix would be to avoid recursion altogether since this hack only increases the number of lines astroid can parse.
Affected versions
>>> from astroid import __pkginfo__; print(__pkginfo__.version)
2.4.1
I've tested this on every minor version since Astroid 2.1.0. Astroid 2.0.4 is not affected.
Since the release of Astroid 2.1.0, we've been experiencing stack overflow exceptions for a class with a very large chain of if/else
statements.
How to reproduce:
The following Python file is enough to trigger a recursion error on any version of Astroid from 2.1.0 through the unreleased 2.5.0:
# file_with_large_if_elif.py
class TestClasse:
def cause_astroid_recursion(self):
if True:
pass
elif True:
pass
# Repeat as needed to overflow > sys.getrecursionlimit() on your system
#
# On my machine, 119 total `elif` statements is enough to trigger recursion
elif True:
pass
$ pylint file_with_large_if_elif.py
<full stacktrace truncated>
File ".../astroid/mixins.py", line 153, in _get_assign_nodes
return list(itertools.chain.from_iterable(children_assign_nodes))
File ".../astroid/mixins.py", line 151, in <genexpr>
for child_node in block
File ".../astroid/decorators.py", line 33, in cached
cache[func] = result = func(*args, **kwargs)
File ".../astroid/mixins.py", line 153, in _get_assign_nodes
return list(itertools.chain.from_iterable(children_assign_nodes))
File ".../astroid/mixins.py", line 151, in <genexpr>
for child_node in block
File ".../astroid/decorators.py", line 33, in cached
cache[func] = result = func(*args, **kwargs)
RecursionError: maximum recursion depth exceeded while calling a Python object
Source of the RecursionError
The current implementation of MultiLineBlockMixin
makes a recursive call to _get_assign_nodes()
:
@decorators.cached
def _get_assign_nodes(self):
children_assign_nodes = (
child_node._get_assign_nodes()
for block in self._multi_line_blocks
for child_node in block
)
return list(itertools.chain.from_iterable(children_assign_nodes))
Possible fix
I think that a possible fix would be to switch from a recursive design, towards a stack-based solution. If we instead built a list of nodes as we go (say, with a while
loop & a stack), we could avoid the overflow.
The generators used in _get_assign_nodes()
are eagerly converted into a list anyway, so keeping a list of children then popping to get all assign nodes should have no negative performance effects.
A quick attempt on my own machine seems to work (though I've not written any new tests, or tested my changes against the test suite).
A less contrived reproduction
I stumbled upon this error by trying to run astroid
on a class that's automatically produced by Apache Thrift.
Given an input file like so:
// proof_of_concept.thrift
struct LargeStruct {
1: optional string argument_1;
2: optional string argument_2;
3: optional string argument_3;
// ...
116: optional string argument_116;
}
One can run thrift --gen py proof_of_concept.thrift
to produce gen-py/proof_of_concept/ttypes.py
In the produced file is a large Python class - the many if/elif
statements in read()
method is what trips up astroid
:
class LargeStruct(object):
# <other methods truncated>
def read(self, iprot):
# (some code truncated) - the body of the `while` loop is what's relevant
while True:
(fname, ftype, fid) = iprot.readFieldBegin()
if ftype == TType.STOP:
break
if fid == 1:
if ftype == TType.STRING:
self.argument_1 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
else:
iprot.skip(ftype)
elif fid == 2:
if ftype == TType.STRING:
self.argument_2 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
else:
iprot.skip(ftype)
#
# <113 other `elif` branches truncated>
#
elif fid == 116:
if ftype == TType.STRING:
self.argument_116 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
else:
iprot.skip(ftype)
else:
iprot.skip(ftype)
iprot.readFieldEnd()
iprot.readStructEnd()
Large Thrift structs are fairly common, so this can happen with some frequency.
Pull request?
I understand if parsing hundreds of control statements isn't something that astroid
seeks to be able to do.
However, if you're open to refactoring out recursion, I'd be willing to write up a PR for this problem.