Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache splitnode results to improve tests collection time #5681

Conversation

Nepherhotep
Copy link
Contributor

#5516

On a large number of tests, collection time can take over 2 minutes for 20k of tests. 60% of the time is spent on splitting nodes, so caching nodesplit results will improve performance considerably.

@codecov
Copy link

codecov bot commented Aug 1, 2019

Codecov Report

Merging #5681 into master will decrease coverage by 0.2%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5681      +/-   ##
==========================================
- Coverage   96.12%   95.91%   -0.21%     
==========================================
  Files         117      117              
  Lines       25796    25797       +1     
  Branches     2495     2495              
==========================================
- Hits        24796    24744      -52     
- Misses        695      744      +49     
- Partials      305      309       +4
Impacted Files Coverage Δ
src/_pytest/nodes.py 95.54% <100%> (+0.02%) ⬆️
src/_pytest/pytester.py 85.03% <0%> (-5.93%) ⬇️
testing/test_assertrewrite.py 83.26% <0%> (-1.42%) ⬇️
src/_pytest/terminal.py 92.6% <0%> (-0.43%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a5e72c...b43ebb7. Read the comment docs.

@@ -13,6 +14,7 @@
tracebackcutdir = py.path.local(_pytest.__file__).dirpath()


@lru_cache(maxsize=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a cache of mutable return values is asking for trouble -- if this returned tuples I'd be more comfortable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would be safer. @Nepherhotep would you mind changing that so we can see how it fares?

Copy link
Contributor Author

@Nepherhotep Nepherhotep Aug 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicoddemus, a simple converting response format from list to tuple caused 1500 errors. Most of them failed with the error "fixture testdir not found", some of them failed with repr error.
I need to dig into "testdir" fixture issue a bit deeper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised... strange.

Copy link
Member

@nicoddemus nicoddemus Aug 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you missed the early return:

    if nodeid == "":
        # If there is no root node at all, return an empty list so the caller's logic can remain sane
        return []

So sometimes it would return a tuple, others a list.

I've reproduced it locally, returning a tuple at both places worked fine here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my bad, thanks for pointing to that!

@nicoddemus
Copy link
Member

Hi @Nepherhotep,

Thanks a lot for this!

I also would like to apologize for not following up on #5516, I couldn't find the time yet.

Copy link
Member

@nicoddemus nicoddemus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks!

Please also include a CHANGELOG entry, I suggest:

Cache internal node splitting function which can improve collection performance in very large test suites.

In a changelog/5516.trivial.rst file.

Thanks!

@asottile
Copy link
Member

asottile commented Aug 1, 2019

@Nepherhotep can you show the difference in cpu time / memory consumption? lru_cache(maxsize=None) is ~usually a memory leak -- want to make sure the trade offs are correct here or if we should be looking to refactor the code such that it doesn't do as many repeated lookups instead

@Nepherhotep
Copy link
Contributor Author

Nepherhotep commented Aug 1, 2019

@Nepherhotep can you show the difference in cpu time / memory consumption? lru_cache(maxsize=None) is ~usually a memory leak -- want to make sure the trade offs are correct here or if we should be looking to refactor the code such that it doesn't do as many repeated lookups instead.

@asottile For ~15k tests, collection time decreases from 130 to 90 seconds (i7 8-gen CPU), as for memory, I don't have this information right away. I'll try to collect it.
I don't think memory increase is substantial - the problem is in splitting the same node path multiple times, but the number of nodes is relatively small.

@nicoddemus
Copy link
Member

nicoddemus commented Aug 1, 2019

I don't think memory increase is substantial - the problem is in splitting the same node path multiple times

I agree, the cache for this function should get equal to the number of nodeids in the section, and that number is fixed and relatively small.

@asottile do you still have concerns?

Copy link
Member

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no particular concern with the memory -- was just curious about the affect.

src/_pytest/nodes.py Outdated Show resolved Hide resolved
@nicoddemus
Copy link
Member

Great, thanks! Would you mind squashing the commits before merging? Or I would be happy to do it myself if you let me. 👍

@Nepherhotep Nepherhotep force-pushed the collection-performance-splitnode-fix branch from 2016325 to b43ebb7 Compare August 2, 2019 00:46
@Nepherhotep
Copy link
Contributor Author

@nicoddemus done.
I wonder, though, why don't you configure GitHub project to do "Squash and Merge" pull request by default - it's a very convenient setup :)

@nicoddemus
Copy link
Member

@asottile doesn't like it, IIRC.

@nicoddemus nicoddemus merged commit dc6e7b9 into pytest-dev:master Aug 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants