Declare dependencies in metadata #2764

Closed
wants to merge 10 commits

Conversation

@jaraco (Member) commented Sep 5, 2021

I'm attempting once again to allow Setuptools to declare its dependencies. Due to known bootstrapping issues, it's not possible to simply declare the dependencies (especially any dependencies that require setuptools to install) without addressing those issues.

This change offers a fallback: a directory of bundled dependencies that gets added to sys.path when Setuptools or pkg_resources is imported or invoked.
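
For illustration, a minimal sketch of how such a fallback could work; the module name _setuptools_vendored matches the import this PR adds to setuptools/__init__.py, but the internals below are assumptions, not the PR's actual code:

    # _setuptools_vendored.py (hypothetical sketch)
    import os
    import sys

    # Directory of bundled dependencies shipped alongside setuptools.
    _vendored = os.path.join(os.path.dirname(__file__), '_vendored')

    for _dep in ('packaging', 'ordered_set', 'more_itertools', 'appdirs'):
        try:
            __import__(_dep)  # a real installation of the dep wins
        except ImportError:
            # Fall back to the bundled copies for any missing dep.
            sys.path.append(_vendored)
            break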

…ersion when needed. Doesn't work because pkg_resources can't load entry points from a distribution present only as a wheel.
@jaraco jaraco marked this pull request as draft September 7, 2021 01:11
@webknjaz (Member) left a comment

I think I'd prefer the vendored blobs not to be checked into Git but rather gitignored. There could be a script that would populate them separately. Thinking along the lines of what I attempted in python/cpython#12791.

@jaraco (Member Author) commented Oct 19, 2021

I'm still unsure why the test failures are occurring. They don't occur for me locally, even on Python 3.6. The failures are all in the test_pip_upgrade_from_source tests. I have Python 3.6.6, while GitHub Actions has 3.6.14.

@jaraco (Member Author) commented Oct 20, 2021

Running tests in a Docker VM (Ubuntu), I'm able to replicate the failures on all Pythons 3.6-3.9. It does pass on Python 3.10, but only because the failing tests get skipped due to #2599.

@jaraco (Member Author) commented Oct 20, 2021

The failures have at least two modes. On pip 9.0.1, it fails early while checking whether setuptools is installed. On pip 18.3, the tests fail when attempting to install the dependencies:

Processing ./setuptools-58.0.0.post20211020-py3-none-any.whl
Collecting pyparsing==2.2.1 (from setuptools==58.0.0.post20211020)
  Using cached https://files.pythonhosted.org/packages/42/47/e6d51aef3d0393f7d343592d63a73beee2a8d3d69c22b053e252c6cfacd5/pyparsing-2.2.1-py2.py3-none-any.whl
Collecting ordered-set==3.1.1 (from setuptools==58.0.0.post20211020)
  Using cached https://files.pythonhosted.org/packages/a3/b7/d4d69641cbe707a45c23b190f2d717466ba5accc4c70b5f7a8a450387895/ordered-set-3.1.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'setuptools'

I believe the issue is that prior to pip 20, there wasn't adequate support for building source packages without setuptools being implicitly installed.

I think I'm prepared to declare non-support for installation from source with these older pips.

@jaraco (Member Author) commented Oct 20, 2021

I believe the reason the tests don't fail for me locally is that I have a local pip cache with things like 'ordered-set' already built into a wheel.

Edit: No, it's not that. I checked, and 'ordered-set' is not in the cache, yet it still installs fine on macOS.

@jaraco (Member Author) commented Oct 20, 2021

I think I'd prefer the vendored blobs not to be checked into Git but rather gitignored. There could be a script that would populate them separately. Thinking along the lines of what I attempted in python/cpython#12791.

I'd consider something along these lines. I think we can consider that change separately, no?

@jaraco jaraco force-pushed the feature/unvendor branch 3 times, most recently from c3da370 to 95e3d3e Compare October 20, 2021 01:32
@jaraco (Member Author) commented Oct 20, 2021

FFS. Now tests are failing with some error in flake8.

________________________________ FLAKE8-check _________________________________
[gw0] linux -- Python 3.6.15 /home/runner/work/setuptools/setuptools/.tox/python/bin/python
.tox/python/lib/python3.6/site-packages/pluggy/_hooks.py:265: in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
.tox/python/lib/python3.6/site-packages/pluggy/_manager.py:80: in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
.tox/python/lib/python3.6/site-packages/_pytest/runner.py:170: in pytest_runtest_call
    raise e
.tox/python/lib/python3.6/site-packages/_pytest/runner.py:162: in pytest_runtest_call
    item.runtest()
.tox/python/lib/python3.6/site-packages/pytest_flake8.py:127: in runtest
    self.statistics)
.tox/python/lib/python3.6/site-packages/py/_io/capture.py:150: in call
    res = func(*args, **kwargs)
.tox/python/lib/python3.6/site-packages/pytest_flake8.py:222: in check_file
    app.report_errors()
.tox/python/lib/python3.6/site-packages/flake8/main/application.py:309: in report_errors
    results = self.file_checker_manager.report()
.tox/python/lib/python3.6/site-packages/flake8/checker.py:249: in report
    results_reported += self._handle_results(filename, results)
.tox/python/lib/python3.6/site-packages/flake8/checker.py:161: in _handle_results
    physical_line=physical_line,
.tox/python/lib/python3.6/site-packages/flake8/style_guide.py:430: in handle_error
    code, filename, line_number, column_number, text, physical_line
.tox/python/lib/python3.6/site-packages/flake8/style_guide.py:581: in handle_error
    self.formatter.handle(error)
.tox/python/lib/python3.6/site-packages/flake8/formatting/base.py:100: in handle
    self.write(line, source)
.tox/python/lib/python3.6/site-packages/flake8/formatting/base.py:203: in write
    self._write(line)
.tox/python/lib/python3.6/site-packages/flake8/formatting/base.py:187: in _write
    sys.stdout.buffer.write(output.encode() + self.newline.encode())
E   AttributeError: '_io.StringIO' object has no attribute 'buffer'
------------------------------ Captured log call -------------------------------
WARNING  flake8.options.manager:manager.py:221 option --max-complexity: please update from optparse string `type=` to argparse callable `type=` -- this will be an error in the future

@jaraco (Member Author) commented Oct 20, 2021

Aaah. It's because there was a lint error (unused import) and the error handling between pytest and flake8 is broken (PyCQA/flake8#1419).

@jaraco (Member Author) commented Oct 20, 2021

Now two tests (those depending on bare_virtualenv) are failing on Windows because apparently pyparsing gets installed in a bare virtualenv. Really?

@jaraco (Member Author) commented Oct 20, 2021

Strangely, when I create a virtualenv locally, I don't end up with pyparsing in it. Is it pytest-virtualenv that's adding it?

@jaraco (Member Author) commented Oct 20, 2021

Okay, I see it now. pyparsing is present in c:\hostedtoolcache\windows\python\3.9.7\x64\lib\site-packages. But virtualenv isn't supposed to give access to system-site-packages by default.

~ $ virtualenv --help | grep 'site-packages dir'
  --system-site-packages        give the virtual environment access to the system site-packages dir (default: False)

So why is a package from the system site packages leaking into a (bare) virtualenv?

@jaraco (Member Author) commented Oct 20, 2021

Ugh. The problem is that bare_virtualenv.run doesn't honor the virtualenv on Windows:

> c:\code\public\pypa\setuptools\setuptools\tests\test_virtualenv.py(48)test_clean_env_install() 
-> bare_virtualenv.run(['python', 'setup.py', 'install'], cd=tmp_src)
(Pdb) bv = bare_virtualenv 
(Pdb) bv.run(['python', '-c', 'import sys; print(sys.executable)']) 
C:\Python39\python.exe 

@jaraco (Member Author) commented Oct 20, 2021

Okay, so in pytest-virtualenv, it's crucial to pass .python for the executable or use the full string syntax. Otherwise, the subprocess may find "python" on the path and not in the virtualenv.
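
For example, a sketch against the pytest-virtualenv fixture (the test body is illustrative, not from this PR):

    # The fixture's .python attribute is the path to the virtualenv's
    # interpreter; passing it explicitly keeps the subprocess from
    # resolving "python" to whatever interpreter is first on PATH.
    def test_runs_inside_venv(bare_virtualenv):
        bare_virtualenv.run([
            bare_virtualenv.python, '-c',
            'import sys; print(sys.executable)',
        ])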

jaraco added a commit that referenced this pull request Oct 20, 2021
@jaraco (Member Author) commented Oct 20, 2021

That issue should be fixed in 083fc74.

@jaraco jaraco marked this pull request as ready for review October 20, 2021 20:44
@jaraco (Member Author) commented Oct 20, 2021

I want to release this change separately from other changes, to allow it to be yanked or backed out, as I suspect there's a high probability of creating regressions.

I did try implementing the _vendored directory with just the .whl files (adding them to sys.path), but that didn't work because pkg_resources doesn't recognize dist-info in wheels, only dist-info in a directory or egg-info in an egg. Once we can migrate setuptools off pkg_resources, that issue should go away.
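
For context, the abandoned approach amounts to something like this hypothetical sketch. Python can import modules from a wheel on sys.path (via zipimport), but pkg_resources only discovers metadata from dist-info directories and eggs, never from inside a wheel:

    import glob
    import os
    import sys

    # Put each vendored wheel directly on sys.path. Imports then work,
    # but pkg_resources cannot see the wheels' *.dist-info metadata, so
    # entry points in those distributions fail to load.
    _here = os.path.dirname(__file__)
    for _whl in glob.glob(os.path.join(_here, '_vendored', '*.whl')):
        sys.path.insert(0, _whl)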

@webknjaz (Member) commented

I'd consider something along these lines. I think we can consider that change separately, no?

Change the vendoring mechanism separately? Yes. Do so after merging this PR? No.

Git accumulates every change to binary blobs, growing the repository size whether the binaries are added, updated, or removed. So it's better to do this sooner rather than later. Best if it's done before this PR merges, so that the PR is rebased on top and doesn't negatively impact the bare repo size.

@jaraco (Member Author) commented Oct 22, 2021

I'd consider something along these lines. I think we can consider that change separately, no?

Change the vendoring mechanism separately? Yes. Do so after merging this PR? No.

Best if it's done before this PR merges, so that the PR is rebased on top and doesn't negatively impact the bare repo size.

Point taken, but what happens if the technique for bundling dependencies has negative downstream effects? Then the only option is to back out the whole change and not just the bundling behavior.

I agree that it's undesirable to have this bulk content in the repo, so I would like to avoid that.

I'd also like to be able to support pip install from Github, e.g.:

$ pip-run git+https://github.com/pypa/setuptools@feature/unvendor -- -c "import setuptools, appdirs; print(appdirs.__file__)"
...
Successfully installed appdirs-1.4.3 more-itertools-8.8.0 ordered-set-3.1.1 packaging-20.4 pyparsing-2.2.1 setuptools-58.2.0.post20211022 six-1.16.0
/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-bcrng0_h/appdirs.py

It's not obvious to me how that could work if the content isn't already bundled in the repo.

Plus, I observe that the content is currently already bundled in both pkg_resources/_vendor and setuptools/_vendor, so this change limits that effect.

I can rewrite the history to remove .whl files for now, as they're unusable.

@webknjaz (Member) commented

Point taken, but what happens if the technique for bundling dependencies has negative downstream effects? Then the only option is to back out the whole change and not just the bundling behavior.

That's an interesting question. Note that downstream distro packagers absolutely love unbundling vendored deps and depending on packages that are already packaged for their ecosystem, so I'd say they already have to deal with stuff like this.
Of course, it may be problematic in that those packages may have an undesired version.
There's a solution for this: ansible/ansible has a special gitignored location in its tree, a _vendor/ folder. The assumption is that if packagers have problems getting the desired deps packaged separately, they can just drop them into that folder, and those deps become available on the import path.
Essentially, the point is that downstreams usually have their own ways to populate this sort of content (or, vice versa, to purge the vendored content).

I agree that it's undesirable to have this bulk content in the repo, so I would like to avoid that.

I'd also like to be able to support pip install from Github, e.g.:

$ pip-run git+https://github.com/pypa/setuptools@feature/unvendor -- -c "import setuptools, appdirs; print(appdirs.__file__)"
...
Successfully installed appdirs-1.4.3 more-itertools-8.8.0 ordered-set-3.1.1 packaging-20.4 pyparsing-2.2.1 setuptools-58.2.0.post20211022 six-1.16.0
/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-bcrng0_h/appdirs.py

It's not obvious to me how that could work if the content isn't already bundled in the repo.

May I suggest implementing the download logic + caching in the PEP 517 in-tree build backend? If the dirs are already populated, it'd skip the download (after checking the checksums, I guess).
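
A rough sketch of that suggestion, assuming an in-tree backend that wraps setuptools' own build_meta; the module layout and the unimplemented download helper are hypothetical:

    # backend.py, wired up in pyproject.toml via backend-path
    import os

    from setuptools import build_meta as _orig

    _VENDOR = os.path.join(os.path.dirname(__file__), '_vendored')

    def _populate_vendored():
        # Skip the download when the directory is already populated;
        # a real implementation would also verify checksums here.
        if os.path.isdir(_VENDOR) and os.listdir(_VENDOR):
            return
        raise NotImplementedError('download + checksum verification')

    def build_wheel(wheel_directory, config_settings=None,
                    metadata_directory=None):
        _populate_vendored()
        return _orig.build_wheel(
            wheel_directory, config_settings, metadata_directory)

    def build_sdist(sdist_directory, config_settings=None):
        _populate_vendored()
        return _orig.build_sdist(sdist_directory, config_settings)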

P.S. Apparently pip's vendoring script is now available on PyPI: https://github.com/pypa/pip/tree/main/src/pip/_vendor#automatic-vendoring / https://pypi.org/project/vendoring/. Maybe it's worth checking it out.

Plus, I observe that the content is currently already bundled in both pkg_resources/_vendor and setuptools/_vendor, so this change limits that effect.

I can rewrite the history to remove .whl files for now, as they're unusable.

Yeah, I'm mostly unhappy with things like whl/exe at this point. It'd be great to catch these with linters. Maybe there's some pre-commit hook repo disallowing blobs?

@@ -13,6 +13,7 @@

from ._deprecation_warning import SetuptoolsDeprecationWarning

import _setuptools_vendored # noqa: F401
A Member left a comment:

I recommend adding a code comment with an explanation of what this does and why it's important to keep it in this specific position.
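
Something along these lines, for instance (suggested wording, not from the PR):

    # Make the vendored dependencies importable before anything below
    # tries to import them. This must remain the first setuptools
    # import; moving it lower defeats the fallback.
    import _setuptools_vendored  # noqa: F401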

@@ -0,0 +1 @@
Setuptools now declares its dependencies in metadata but also vendors libraries satisfying those dependencies if they're not present. This change means that Setuptools cannot install from source under pip earlier than 20.
A Member left a comment:

@hroncok FYI

A Contributor left a comment:

Thanks for the ping. I'll read the description and discussion here, but this sounds terrifying, complicated, and quite error-prone.

@hroncok (Contributor) commented Oct 23, 2021

In Fedora, the resulting RPM package with setuptools would hard-depend on the declared packages. That means that users who need setuptools installed at runtime would need to have e.g. the old packaging installed (and we have already updated packaging). This means we are in dependency hell. We would need to revert to an older version of packaging and never update again until setuptools supports it. This could happen again with any of the other deps in the future.

OTOH, if we hack around that and replace the hard dependencies with weak ones, setuptools would only work properly if a dependency is missing entirely. Otherwise, the check for whether the vendored dependencies are needed would find the newer installed packaging and not add the vendored one to sys.path, leading to unexpected behavior. Again, this could happen with any other dep in the future as well, or if an extremely old version of something is installed.

Both could happen in any other environment (e.g. in virtual environments).

Consider a new venv/virtualenv with setuptools installed in it, which is the status quo (and at least for venv it seems to be the desire for the future as well). Would ensurepip need custom hacks to install the setuptools wheel without deps? Users who pip upgrade after venv creation would then get all the deps installed, and they will have dependency clashes when they need e.g. a new packaging.

Even if the check uses importlib.metadata to check that proper versions are installed (it really should do that), it means we would always need to ship the bundled libs, as we never know if they are needed. That essentially combines the downsides of both vendoring and declaring deps:

  • we still vendor stuff; it just happens to be used only when certain unpredictable conditions are met
  • we also have dependencies declared, which still leads to all sorts of problems we previously tried to solve by vendoring
  • plus, we support situations where the declared deps are not installed, which could lead to all sorts of new problems (resolvers screaming, entry_points failing to load, heisenbugs)

tl;dr I don't understand why you are considering this. It feels worse than the status quo.

Even if you disagree with my point of view, I kindly ask you to hold off shipping this until next week, so I can discuss it with the Python SIG in Fedora and gather more input. Thank you.

(Excuse the typos, I wrote this on my phone.)

@jaraco (Member Author) commented Oct 23, 2021

I don't understand why you are considering this.

Good point. I thought it was obvious, but it's clearly not, so I at least owe you the courtesy of writing up the motivations. I'll do that in a separate issue.

I kindly ask you to hold off shipping this until next week

For sure. I'm happy to move somewhat cautiously on this one, because it's likely to break things and I want to avoid having to roll it back.

@jaraco (Member Author) commented Oct 23, 2021

In #2825, I've collected (on the fly) the motivations that have driven me in this direction.

This means we are in dependency hell. We would need to revert to an older version of packaging and never update again until setuptools supports it. This could happen again with any of the other deps in the future.

Agreed that's a problem, but it's hardly solved by having vendored dependencies. Instead, the vendored dependencies mean there are multiple copies of possibly incompatible dependencies, masking the conflict. I'd like for Setuptools to get first-class support for dependencies, including continuous updates (unpinning), conflict detection, and code sharing. This will mean that Setuptools pinning to a specific packaging version will cause conflicts, but it will make those conflicts apparent and drive the effort to resolve them as any package must for its dependencies.

I would expect a downstream integrator to reject installing Setuptools into an environment where Setuptools declared dependencies couldn't be satisfied.

Fortunately, for packaging, we have #2822 that promises to unpin the requirement on Packaging.

Consider a new venv/virtualenv with setuptools installed in it, which is the status quo (and at least for venv it seems to be the desire for the future as well).

It's separate from this issue, but I would encourage users to aggressively start passing --no-setuptools and --no-wheel when creating their environments and --use-pep517 to their pip installs, to start treating Setuptools like a build backend and not install it everywhere. I do that in my local development and have had mostly good experiences with it.

Would ensurepip need custom hacks to install the setuptools wheel without deps?

No - I would expect these environments to get the setuptools deps. I expect the vendored dependencies to be used very rarely (primarily just long enough for Setuptools to build its own metadata, so the installer can learn which dependencies to install, allowing setuptools to build with its true dependencies).

This all makes me wonder if this experiment would be easier if setuptools instead used another build backend like flit. I wonder if that would side-step the issue and break the chain.

Even if the check uses importlib.metadata to check that proper versions are installed (it really should do that)

I considered that, and I agree. Unfortunately, there's a separate bootstrapping issue for anything in the _setuptools_vendored logic, so the importlib_metadata backport couldn't be conditionally vendored. That's why I started with this naive approach. My expectation is that in most cases (especially as pip moves to using PEP 517 by default), the declared dependencies will be present and of the requisite version.
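
For reference, such a check might look like this sketch (a hypothetical helper: on Python < 3.8 it would itself need the importlib_metadata backport, and comparing version ranges properly would need packaging, which is exactly the bootstrapping problem described above):

    import sys
    from importlib.metadata import PackageNotFoundError, version

    def _fallback_if_missing(dist, vendored_dir):
        try:
            version(dist)  # a real check would also compare versions
        except PackageNotFoundError:
            sys.path.append(vendored_dir)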


I'm going to move this back into draft mode. I'm not comfortable merging it yet, and I'd like to explore the flit backend approach.

My immediate motivation is to get access to importlib_metadata and importlib_resources so that setuptools can loosen its dependency on pkg_resources.

My long term motivation is that Setuptools can be a straightforward package without specialized dependency management.

@jaraco jaraco marked this pull request as draft October 23, 2021 17:48
@hroncok (Contributor) commented Oct 25, 2021

In #2825, I've collected (on the fly) the motivations that have driven me in this direction.

Thanks. I've read that, and I can see the motivation for wanting to avoid vendored dependencies altogether. But I don't think this way of doing it helps in any way (except to see things break while trying).

This means we are in dependency hell. We would need to revert to an older version of packaging and never update again until setuptools supports it. This could happen again with any of the other deps in the future.

Agreed that's a problem, but it's hardly solved by having vendored dependencies.

It is solved by vendored dependencies. Yes, vendored dependencies have different kinds of problems, but they do solve the dependency hell.

Instead, the vendored dependencies mean there are multiple copies of possibly incompatible dependencies, masking the conflict. I'd like for Setuptools to get first-class support for dependencies, including continuous updates (unpinning), conflict detection, and code sharing. This will mean that Setuptools pinning to a specific packaging version will cause conflicts, but it will make those conflicts apparent and drive the effort to resolve them as any package must for its dependencies.

I understand. However, starting this effort with dependencies already pinned is a way of doing it that'll cause trouble. For concrete Fedora problems, see the next point...

I would expect a downstream integrator to reject installing Setuptools into an environment where Setuptools declared dependencies couldn't be satisfied.

Indeed. That is exactly what would happen in Fedora. Note that we ship only a single version of setuptools in a repository (this is a simplification, but it behaves like that), so if we decide to update setuptools to a version that introduces a pinned dependency on an older packaging, we can only install that version of setuptools. Considering our RPM build environment always installs packaging, all our packages that require setuptools to build would become unbuildable until this conflict is resolved, meaning we would not be able to update setuptools to this version at all until then.

Fortunately, for packaging, we have #2822 that promises to unpin the requirement on Packaging.

I'm glad we do, but this will happen again.

Consider a new venv/virtualenv with setuptools installed in it, which is the status quo (and at least for venv it seems to be the desire for the future as well).

It's separate from this issue, but I would encourage users to aggressively start passing --no-setuptools and --no-wheel when creating their environments and --use-pep517 to their pip installs, to start treating Setuptools like a build backend and not install it everywhere. I do that in my local development and have had mostly good experiences with it.

Right, I do like that approach as well, but see https://mail.python.org/archives/list/python-dev@python.org/thread/3BVAUIQOEOXAULHVYQNLLQIZQQETX2EV/

Would ensurepip need custom hacks to install the setuptools wheel without deps?

No - I would expect these environments to get the setuptools deps.

That'll mean that ensurepip and virtualenv will have to bundle 6 additional wheels. Are the maintainers of ensurepip and virtualenv aware of this?

I expect the vendored dependencies to be used very rarely (primarily just long enough for Setuptools to build its own metadata, so the installer can learn which dependencies to install, allowing setuptools to build with its true dependencies).

Have you considered not installing the vendored dependencies with setuptools, but only including them in the sdist part of the distribution to allow building setuptools? That seems like a much nicer approach than always shipping them within the setuptools installation.

This all makes me wonder if this experiment would be easier if setuptools instead used another build backend like flit. I wonder if that would side-step the issue and break the chain.

I beg you not to. This would only make the chain much larger and much more painful to handle.

Even if the check uses importlib.metadata to check that proper versions are installed (it really should do that)

I considered that, and I agree. Unfortunately, there's a separate bootstrapping issue for anything in the _setuptools_vendored logic, so the importlib_metadata backport couldn't be conditionally vendored. That's why I started with this naive approach. My expectation is that in most cases (especially as pip moves to using PEP 517 by default), the declared dependencies will be present and of the requisite version.

During PEP 517 builds, they will not be present at all, as they are not listed in pyproject.toml, so the vendored dependencies will be used:

requires = [

During non-PEP 517 builds, I consider the chance of a conflicting version of at least one dependency quite high.

I'm going to move this back into draft mode. I'm not comfortable merging it yet, and I'd like to explore the flit backend approach.

I am glad that you are not considering merging this now, but I am worried about the flit idea.

My immediate motivation is to get access to importlib_metadata and importlib_resources so that setuptools can loosen its dependency on pkg_resources.

It might be easier to wait for Python 3.7 EOL.

My long term motivation is that Setuptools can be a straightforward package without specialized dependency management.

That is a very nice idea for the future, but I honestly think this PR is not leading there.

@hroncok (Contributor) commented Oct 25, 2021

@encukou @pradyunsg @FFY00 Could you please have a look at this? I mean not only the proposed change, but also the discussion. Thanks a lot, folks.

@encukou (Contributor) commented Oct 26, 2021

I thought about how to make this simpler. Could it be better to remove some deps first?

  • more_itertools is there for one function, unique_everseen. Vendor just the function? (It can probably be simplified to become quite trivial if it's only used on hashables; see the sketch below. FWIW, such simplifications are why stdlib itertools has “recipes” in its docs rather than many generic functions.)
  • OrderedSet is used once; can it be replaced by list and unique_everseen?
  • appdirs is only used in pkg_resources. Can pkg_resources realistically be split out before tackling this?
  • packaging might be more manageable as a single special case.
  • pyparsing is only needed by packaging, but we can hopefully drop that dep later this year (cc @hrnciar)
  • six appears unused

I don't know if there are plans to use more deps (or use the existing ones more), but it might make sense to minimize the problem before this change.
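
For instance, the hashable-only simplification of unique_everseen suggested above is small enough to vendor as a single function; a sketch based on the classic itertools recipe:

    def unique_everseen(iterable):
        # Yield unique elements in order of first appearance,
        # assuming every element is hashable.
        seen = set()
        for element in iterable:
            if element not in seen:
                seen.add(element)
                yield element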

@pradyunsg (Member) commented

I've responded on #2825.

tl;dr -- please don't do this yet + consider adopting vendoring to alleviate the pain in the interim, while we update the rest of the ecosystem.

@jaraco (Member Author) commented Nov 12, 2021

I'm abandoning this effort for now.
