-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache splitnode results to improve tests collection time #5681
Cache splitnode results to improve tests collection time #5681
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5681 +/- ##
==========================================
- Coverage 96.12% 95.91% -0.21%
==========================================
Files 117 117
Lines 25796 25797 +1
Branches 2495 2495
==========================================
- Hits 24796 24744 -52
- Misses 695 744 +49
- Partials 305 309 +4
Continue to review full report at Codecov.
|
@@ -13,6 +14,7 @@ | |||
tracebackcutdir = py.path.local(_pytest.__file__).dirpath() | |||
|
|||
|
|||
@lru_cache(maxsize=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a cache of mutable return values is asking for trouble -- if this returned tuple
s I'd be more comfortable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it would be safer. @Nepherhotep would you mind changing that so we can see how it fares?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nicoddemus, a simple converting response format from list to tuple caused 1500 errors. Most of them failed with the error "fixture testdir not found", some of them failed with repr error.
I need to dig into "testdir" fixture issue a bit deeper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised... strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you missed the early return:
if nodeid == "":
# If there is no root node at all, return an empty list so the caller's logic can remain sane
return []
So sometimes it would return a tuple, others a list.
I've reproduced it locally, returning a tuple at both places worked fine here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, my bad, thanks for pointing to that!
Hi @Nepherhotep, Thanks a lot for this! I also would like to apologize for not following up on #5516, I couldn't find the time yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks!
Please also include a CHANGELOG entry, I suggest:
Cache internal node splitting function which can improve collection performance in very large test suites.
In a changelog/5516.trivial.rst
file.
Thanks!
@Nepherhotep can you show the difference in cpu time / memory consumption? |
@asottile For ~15k tests, collection time decreases from 130 to 90 seconds (i7 8-gen CPU), as for memory, I don't have this information right away. I'll try to collect it. |
I agree, the cache for this function should get equal to the number of nodeids in the section, and that number is fixed and relatively small. @asottile do you still have concerns? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks! Would you mind squashing the commits before merging? Or I would be happy to do it myself if you let me. 👍 |
2016325
to
b43ebb7
Compare
@nicoddemus done. |
@asottile doesn't like it, IIRC. |
#5516
On a large number of tests, collection time can take over 2 minutes for 20k of tests. 60% of the time is spent on splitting nodes, so caching nodesplit results will improve performance considerably.