Tests collection is slow on high number of tests

Hi,

We have over 15k tests in the repository and recently switched from nosetests to pytest. Test collection in pytest is much slower than in nosetests: 130s vs 30s. Both `norecursedirs` and `testpaths` are set, which rules out time spent walking `.git` or other unrelated directories.
Interestingly, collection time also appears to grow faster than linearly with the number of collected tests:
![image](https://user-images.githubusercontent.com/2122618/60345588-5eb3ff00-9987-11e9-8ef1-59da7c43c9a3.png)

I profiled it using pyflame and cProfile, and the slowest part seems to be the calls to `FixtureManager._matchfactories`:
![image](https://user-images.githubusercontent.com/2122618/60345970-2cef6800-9988-11e9-9b75-eb99ed216a23.png)

Here is a callgraph:
![image](https://user-images.githubusercontent.com/2122618/60346065-61632400-9988-11e9-86ce-b09eb99c949c.png)

When I logged what is happening, it looked like the number of calls to this method grows polynomially with the number of tests.
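
To illustrate how this kind of growth can arise, here is a simplified model, not pytest's actual code. It assumes (my assumption) that every test module defines its own fixture under the same name, so that each collected test has to be checked against one candidate per module. All function names below are hypothetical; only the splitting logic mirrors `_splitnode`.

```python
def split_nodeid(nodeid):
    """Split 'a/b/test_x.py::Cls::test' into path and '::' components."""
    if nodeid == "":
        return []
    parts = nodeid.split("/")
    parts[-1:] = parts[-1].split("::")
    return parts


def is_child(baseid, nodeid, counter):
    """Prefix check on split node ids; counts how often it is invoked."""
    counter["calls"] += 1
    base_parts = split_nodeid(baseid)
    node_parts = split_nodeid(nodeid)
    return node_parts[: len(base_parts)] == base_parts


def count_matching_calls(num_modules, tests_per_module):
    """Return (number of tests, number of prefix checks) for the modelled suite."""
    # Assumption: one same-named fixture definition per module.
    baseids = [f"tests/test_mod{m}.py" for m in range(num_modules)]
    counter = {"calls": 0}
    num_tests = 0
    for m in range(num_modules):
        for t in range(tests_per_module):
            nodeid = f"tests/test_mod{m}.py::test_{t}"
            num_tests += 1
            matches = [b for b in baseids if is_child(b, nodeid, counter)]
            assert len(matches) == 1  # only the test's own module matches
    return num_tests, counter["calls"]


if __name__ == "__main__":
    print(count_matching_calls(10, 5))  # 50 tests
    print(count_matching_calls(20, 5))  # 100 tests
```

In this model the number of checks is tests × modules, so doubling the number of modules (and with it the number of tests) quadruples the number of checks.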

Digging deeper into the `_matchfactories` method, I was able to reduce collection time from 130s to 85s just by caching the results of the `_splitnode` function:
```diff
diff --git a/src/_pytest/nodes.py b/src/_pytest/nodes.py
index 491cf9d2c..62eb66086 100644
--- a/src/_pytest/nodes.py
+++ b/src/_pytest/nodes.py
@@ -13,6 +13,9 @@ SEP = "/"
 tracebackcutdir = py.path.local(_pytest.__file__).dirpath()
 
 
+_node_cache = {}
+
+
 def _splitnode(nodeid):
     """Split a nodeid into constituent 'parts'.
 
@@ -31,9 +34,13 @@ def _splitnode(nodeid):
     if nodeid == "":
         # If there is no root node at all, return an empty list so the caller's logic can remain sane
         return []
-    parts = nodeid.split(SEP)
-    # Replace single last element 'test_foo.py::Bar' with multiple elements 'test_foo.py', 'Bar'
-    parts[-1:] = parts[-1].split("::")
+
+    parts = _node_cache.get(nodeid)
+    if not parts:
+        parts = nodeid.split(SEP)
+        # Replace single last element 'test_foo.py::Bar' with multiple elements 'test_foo.py', 'Bar'
+        parts[-1:] = parts[-1].split("::")
+        _node_cache[nodeid] = parts
     return parts
```
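
The same caching idea can also be sketched standalone with `functools.lru_cache` instead of a module-level dict. This is an illustration under assumptions, not a proposed patch: the name `split_nodeid_cached` is hypothetical, and returning a tuple rather than a list is my choice so that callers cannot accidentally mutate a cached result.

```python
from functools import lru_cache

SEP = "/"


@lru_cache(maxsize=None)
def split_nodeid_cached(nodeid):
    """Split a nodeid into its parts, memoizing the result per nodeid string."""
    if nodeid == "":
        # No root node at all: return an empty sequence.
        return ()
    parts = nodeid.split(SEP)
    # Replace the last element 'test_foo.py::Bar' with 'test_foo.py', 'Bar'.
    parts[-1:] = parts[-1].split("::")
    # Tuples are immutable, so a shared cached value stays intact.
    return tuple(parts)


if __name__ == "__main__":
    print(split_nodeid_cached("testing/code/test_foo.py::Bar"))
    print(split_nodeid_cached.cache_info())
```

An unbounded cache trades memory for speed: with `maxsize=None` every distinct nodeid stays in memory for the lifetime of the process, which may or may not be acceptable for very large suites.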

This doesn't solve the asymptotic problem; it only makes `_splitnode`, and thus `_matchfactories`, faster:
![image](https://user-images.githubusercontent.com/2122618/60347343-13035480-998b-11e9-974e-450581a1b5d4.png)

Please let me know if you think there could be a better way to fix the problem.

Thanks!
Alex

Tests collection is slow on high number of tests #5516
