-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework Session and Package collection #11646
Conversation
10832d7
to
0e62302
Compare
6de3469
to
39f8ae6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its amazing to see this massive undertaking take shape
there are some detail devils and i noted 2 nitpicks
i will try to do some deeper digging into the python collector changes
i'll avoid bikesheding the Dir/Directory naming - naming rights for the implementor !
@@ -489,7 +489,7 @@ def test_collect_topdir(self, pytester: Pytester) -> None: | |||
# assert root2 == rcol, rootid | |||
colitems = rcol.perform_collect([rcol.nodeid], genitems=False) | |||
assert len(colitems) == 1 | |||
assert colitems[0].path == p | |||
assert colitems[0].path == topdir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one seems off - the path there should match the nodeid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old tree was
<Session>
<Module test_collect_topdir.py>
Now it's
<Session>
<Dir test_collect_topdir0>
<Module test_collect_topdir.py>
So now the collection (with genitems=False
) collects the Dir
not the Module
.
I'm not sure what was the original purpose of this test (it goes back to e2683f4) but the change here is intentional. Do you think there is something that relies on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perform collect towards nodes ids yielded the concrete files, my understanding is that it ought to keep doing that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that you think it should not emit Session -> Dir -> Module
here, but Session -> Module
like before?
I'll try to explain the rationale. Consider for example
repo/
pytest.ini
tests/
test_it.py
Before this PR, the tree is
<Module tests/test_it.py>
<Function test_it>
but if I add a file tests/__init__.py
(i.e. make tests
a package), it becomes
<Package tests>
<Module test_it.py>
<Function test_it>
The idea in this PR is to remove the inconsistency with Package
. In this PR, without __init__.py
the tree is
<Dir repo>
<Dir tests>
<Module test_it.py>
<Function test_it>
and with __init__.py
<Dir repo>
<Package tests>
<Module test_it.py>
<Function test_it>
Now Package
is the python
-plugin-ified Directory
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of that function is to go from a list of nodeids/paths to the initial collectors
Aka the concrete "top-level" fs collectors that are files
The alternative is to ensure all directory collection is limited to filtering of the initial args
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I understand what you're saying now. perform_collect
still does this in this PR.
The test is a bit weird. Basically it runs session.perform_collect([""])
. What is the initial collector for this? From a quick test, pytest treats this like pytest .
i.e. current directory. So the new behavior seems right to me. Previously it picked out the test file simply because it was the first and there was no intermediary node.
For illustration, if I add an __init__.py
file in main (not this PR),
diff --git a/testing/test_collection.py b/testing/test_collection.py
index ca2e2b731..424bb9933 100644
--- a/testing/test_collection.py
+++ b/testing/test_collection.py
@@ -478,6 +478,7 @@ class TestCustomConftests:
class TestSession:
def test_collect_topdir(self, pytester: Pytester) -> None:
p = pytester.makepyfile("def test_func(): pass")
+ pytester.makepyfile(__init__="")
id = "::".join([p.name, "test_func"])
# XXX migrate to collectonly? (see below)
config = pytester.parseconfig(id)
then colitems[0]
is the Package
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @bluetech! Thanks a lot for working on this. 👍
Left a bunch of minor suggestions, please take a look.
src/_pytest/main.py
Outdated
if direntry.is_dir(): | ||
if direntry.name == "__pycache__": | ||
continue | ||
ihook = self.ihook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given self.ihook
is a property which calls self.session.gethookproxy
, would it be better to move this outside the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ihook
returns a FSHookProxy
with a static remove_mods
, and is thus very sensitive to its placement. It's something I'm hoping to improve in the future (though it's not easy to make something fast for this in pluggy), but with how pytest is now ihook
cannot be safely hoisted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe within the loop it is safe as the remove_mods depends on self.fspath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the exact problem was with conftest plugins that are loaded after the ihook
and are then not filtered out by remove_mods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my rough understanding is, with the usage if directories, conftests don't get added until after the collect phase,
as such it presents as safe from my perspective
do we consider conftests in packages at their construction or at their collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After taking a second and third look at this, I've made a change which removes an ugly hack from this PR, allows the hoisting, and removes conftest loading from gethookproxy
<< this has been a real pain in my neck.
Recap (a little complicated, sorry):
Current main
In current main, calls to gethookproxy
/ihook
are the trigger for loading non-initial conftests. This basically means that conftest loading is done almost as a random side-effect, uncontrolled and very non-obvious. And it also dashes any hope of making gethookproxy
faster (gethookproxy
shows up prominently in pytest profiles).
I've wanted to improve this for a while, #11268 was the latest step towards that.
PR before change
In this PR, I ran into a problem.
Previously, Session
and Package
did all of the directory traversals inside of their collect
, which loaded the conftests as a side effect. If the conftest loading failed, it will occur inside of the collect()
and cause it to be reported as a failure.
Now I've changed things such that Session.collect
and Package.collect
no longer recurse, but just collect their immediate descendants, and genitems
does the recursive expansion work.
The problem though is that genitems()
doesn't run inside of specific collector's collect
context. So when it loads a conftest, and the conftest loading fails, the exception isn't handled by any CollectReport
and causes an internal error instead.
The way I've fixed this problem is by loading the conftests eagerly in a pytest_collect_directory
post-wrapper, but only during genitems
to make sure the directory is actually selected.
This solution in turn caused the conftests to be collected too early; specifically, the plugins are loaded during the parent's collect()
, one after the other as the directory entries are collected. So when the ihook
is hoisted out of the loop, new plugins are loaded inside the loop, and due to the way the hook proxy works, they are added to the ihook
even though they're supposed to be scoped to the child collectors. So no hoisting.
PR after change
Now I've come up with a better solution: since now the collection tree actually reflects the filesystem tree, what we really want is to load the conftest of a directory right before we run its collect()
. A conftest can affect a directory's collect()
(e.g. with a pytest_ignore_collect
hookimpl), but it cannot affect how the directory node itself is collected. So I just moved the conftest loading to be done right before calling collect
, but still inside the CollectReport
context.
This allows the hoisting, and also removes conftest loading from gethookproxy
since it's no longer necessary. And it will probably enable further cleanups. So I'm happy with it.
def sort_key(entry: "os.DirEntry[str]") -> object: | ||
return (entry.name != "__init__.py", entry.name) | ||
|
||
config = self.config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From this point onward Dir.collect
and Package.collect
are essentially identical: can we move this to a common function somewhere and reuse it in both places? They are non-trivial and worth avoiding the duplication IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I live by the Rule of three 😀
But more seriously, I started out with this, with some common private function on Directory
, but this stuff shouldn't be on Directory
so I relented.
My goal is to simplify this loop such that it doesn't need deduplication because it's simple enough (which will also help plugins to implement it correctly). This things that pop out are:
- The
__pycache__
check can move topytest_ignore_collect
-- slight performance hit but probably minor. - If we handle the
ihook
issue we can hoist it out. - If
Path(direntry.path)
is not super slow we can hoist it out - Maybe the
isinitpath
check can move topytest_ignore_collect
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
Thanks for the review @nicoddemus, I pushed a commit addressing some your comments, and replied to the others (let me know if I missed any). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! There's one minor comment to address, but is minor and up to you. 👍
Great work on this, this clears up a lot of the code and makes the collection tree much more sane and predictable. 👍
Will be used in upcoming commit.
--- Current main In current main (before pervious commit), calls to gethookproxy/ihook are the trigger for loading non-initial conftests. This basically means that conftest loading is done almost as a random side-effect, uncontrolled and very non-obvious. And it also dashes any hope of making gethookproxy faster (gethookproxy shows up prominently in pytest profiles). I've wanted to improve this for a while, pytest-dev#11268 was the latest step towards that. --- PR before change In this PR, I ran into a problem. Previously, Session and Package did all of the directory traversals inside of their collect, which loaded the conftests as a side effect. If the conftest loading failed, it will occur inside of the collect() and cause it to be reported as a failure. Now I've changed things such that Session.collect and Package.collect no longer recurse, but just collect their immediate descendants, and genitems does the recursive expansion work. The problem though is that genitems() doesn't run inside of specific collector's collect context. So when it loads a conftest, and the conftest loading fails, the exception isn't handled by any CollectReport and causes an internal error instead. The way I've fixed this problem is by loading the conftests eagerly in a pytest_collect_directory post-wrapper, but only during genitems to make sure the directory is actually selected. This solution in turn caused the conftests to be collected too early; specifically, the plugins are loaded during the parent's collect(), one after the other as the directory entries are collected. So when the ihook is hoisted out of the loop, new plugins are loaded inside the loop, and due to the way the hook proxy works, they are added to the ihook even though they're supposed to be scoped to the child collectors. So no hoisting. --- PR after change Now I've come up with a better solution: since now the collection tree actually reflects the filesystem tree, what we really want is to load the conftest of a directory right before we run its collect(). A conftest can affect a directory's collect() (e.g. with a pytest_ignore_collect hookimpl), but it cannot affect how the directory node itself is collected. So I just moved the conftest loading to be done right before calling collect, but still inside the CollectReport context. This allows the hoisting, and also removes conftest loading from gethookproxy since it's no longer necessary. And it will probably enable further cleanups. So I'm happy with it.
36b7ebe
to
e1c66ab
Compare
Had to rebase due to some conflicts. I left the "Different fix for conftest loading" separate, I think it will be useful to have the diff in the history. |
@RonnyPfannschmidt do you intend to review this or should I just merge it? It's currently holding the pytest 8.0 release so it would be good to resolve it one way or the other. |
Im Still down with a Cold, so don't block on me |
OK, hope you'll feel better soon! |
This PR made some of my tests start to fail with https://github.com/pytest-dev/pytest-metadata/actions/runs/7365057317/job/20046008077 The tests pass against c7ee5599, but fail on this merge. Running it locally I also get some warnings: warnings
=============================== warnings summary ===============================
../../../../../../../../../Users/jimbrannlund/dev/pytest/pytest/src/_pytest/config/__init__.py:1211
/Users/jimbrannlund/dev/pytest/pytest/src/_pytest/config/__init__.py:1211: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: pytest_metadata
self._mark_plugins_for_rewrite(hook)
../../../../../../../../../Users/jimbrannlund/dev/pytest/pytest/src/_pytest/config/__init__.py:1211
/Users/jimbrannlund/dev/pytest/pytest/src/_pytest/config/__init__.py:1211: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: tests
self._mark_plugins_for_rewrite(hook)
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html I tried to figure out exactly where it goes wrong, but failed. It's entirely possible there's something wrong in my plugin that was just surfaced by the changes in this PR. Please let me know what information I can contribute with. Ping @bluetech |
@BeyondEvil thanks for testing, I'll check it. Can you report it as a separate issue please? |
Sure, I wasn't sure if I should do that given it's not a release, just a commit off main branch. 😅 |
This PR implements the proposals in #7777.