Description
I had a look again at pytest performance, and was struck again by how many stat calls pytest performs. This seems to boil down to using the path classes from py for checking file existence, walking directory structures and such. Unfortunately instances of these are passed to plugins, so they aren't internal implementation details.
I suggest we think about passing plain strings as paths to plugins instead. This is obviously a breaking change so can't be done until pytest 6. It's not clear to me how we'd emit deprecation warnings for this though :(
To take a concrete example of this being a problem, the test suite for the product I work on calls stat 79k times just for the collect phase. If I monkey patch stat to log the paths there are ~3k unique paths in the output. I can get a little performance boost by monkey patching stat to be cached:
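import os

orig_stat = os.stat
cache = {}

def monkey_stat(*args, **kwargs):
    # Cache on the path only; assumes extra arguments do not vary per path.
    a = args[0]
    if a in cache:
        return cache[a]
    r = orig_stat(*args, **kwargs)
    cache[a] = r
    return r

os.stat = monkey_stat  # install the cached version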
That this can improve the performance is pretty silly :P
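For anyone who wants to reproduce the numbers above on their own test suite, here is a rough sketch of how the stat calls could be logged and counted. The helper name count_stat_calls is made up for illustration and is not part of pytest or py. It only sees calls that go through the os.stat attribute, so os.lstat, os.scandir and anything implemented in C that bypasses os.stat are not counted.

```python
import os
from collections import Counter
from contextlib import contextmanager


@contextmanager
def count_stat_calls():
    """Temporarily wrap os.stat and count calls per path (hypothetical helper)."""
    counts = Counter()
    orig_stat = os.stat

    def logging_stat(path, *args, **kwargs):
        # os.stat also accepts an open file descriptor (int); keep those as-is.
        key = path if isinstance(path, int) else os.fspath(path)
        counts[key] += 1
        return orig_stat(path, *args, **kwargs)

    os.stat = logging_stat
    try:
        yield counts
    finally:
        # Always restore the original, even if the wrapped code raises.
        os.stat = orig_stat


if __name__ == "__main__":
    with count_stat_calls() as counts:
        for _ in range(3):
            os.stat(os.curdir)
    print("total calls:", sum(counts.values()))
    print("unique paths:", len(counts))
```

Wrapping a collection-only run in the context manager should give the total versus unique counts; a big gap between the two is what makes the cache above pay off.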
It seems like pytest could just use os.walk once and then use that data for the rest of the run of the program...
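A minimal sketch of what I mean by walking once, assuming the tree does not change during collection. The name snapshot_tree is hypothetical and this is not how pytest currently works. Existence checks then become set lookups instead of stat calls.

```python
import os
import tempfile


def snapshot_tree(root):
    """Walk root once and return (dirs, files) as sets of paths (hypothetical helper)."""
    dirs, files = {root}, set()
    for dirpath, dirnames, filenames in os.walk(root):
        # dirnames also lists symlinked directories, which os.walk does not
        # descend into by default, so record them explicitly.
        dirs.update(os.path.join(dirpath, d) for d in dirnames)
        files.update(os.path.join(dirpath, f) for f in filenames)
    return dirs, files


if __name__ == "__main__":
    # Build a tiny throwaway tree so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "pkg"))
        test_file = os.path.join(tmp, "pkg", "test_a.py")
        open(test_file, "w").close()

        dirs, files = snapshot_tree(tmp)
        print(test_file in files)                     # lookup, no stat call
        print(os.path.join(tmp, "conftest.py") in files)
```

The obvious caveat is staleness: anything created or deleted after the walk (temporary directories, generated files) would not be reflected, so a real implementation would need some way to invalidate or bypass the snapshot.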