PEP 518 build requirements cannot be overriden by user #4582

Open · ghost opened this issue Jun 29, 2017 · 65 comments
Labels
PEP implementation · state: needs discussion · type: enhancement

Comments

@ghost

ghost commented Jun 29, 2017

Apparently not, it seems to call pip install --ignore-installed ....
Because the build itself is not isolated from the environment in
other respects, I'm not sure if this is actually sensible behavior by pip...

If the target computer already has a satisfactory version of numpy, then the build system should use that version. Only if the version is not already installed should pip use an isolated environment.

Related: scipy/scipy#7309
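For context (an illustrative reconstruction, not part of the original report): with PEP 518 build isolation, a package whose pyproject.toml declares

[build-system]
requires = ["setuptools", "wheel", "numpy"]

is built in a fresh isolated environment, where pip installs the newest numpy from PyPI regardless of which numpy is already installed in the target environment.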

@pv

pv commented Jun 29, 2017

Some reasons why the current behavior is bad for Numpy especially:

  • Numpy ABI is backward compatible but not forward compatible. If the numpy version pip installs in the "isolated" environment during build is newer than the version present in the system, the built C extensions can segfault if used together with the system numpy.
  • Numpy builds take time, so build-requiring it can add an unnecessary 10+ minutes to the install time.

@njsmith
Member

njsmith commented Jun 29, 2017

I responded on the scipy issue before seeing this one: scipy/scipy#7309 (comment)

The most correct solution to the abi issue is to build against the lowest supported numpy version. Building against the currently installed version is a hack that will fail in a number of cases; I mentioned one of them there, but another is that due to pip's wheel caching feature, if you first install scipy into an environment that has the latest numpy, and then later install scipy into an environment that has an older numpy, pip won't even invoke scipy's build system the second time, it'll just install the cached build from last time.

@pv

pv commented Jun 29, 2017

Yes, the ABI issue indeed can be handled with specifying the earliest numpy version.

@rgommers

The lowest supported version is normally Python version dependent (now numpy 1.8.2 is lowest supported, but clearly not for Python 3.6 because 1.8.2 predated Python 3.6 by a long time).

So the specification will then have to be:

numpy==1.8.2; python_version<='3.4'
numpy==1.9.3; python_version=='3.5'
numpy==1.12.1; python_version=='3.6'

I have the feeling not many projects are going to get this right ....
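For illustration (not part of the original comment), these per-Python pins would sit in the package's pyproject.toml [build-system] table, using PEP 508 environment markers; the versions are the ones quoted above:

[build-system]
requires = [
    "setuptools",
    "wheel",
    "numpy==1.8.2; python_version<='3.4'",
    "numpy==1.9.3; python_version=='3.5'",
    "numpy==1.12.1; python_version=='3.6'",
]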

@pradyunsg added the state: needs discussion label Jun 30, 2017
@rgommers

rgommers commented Jul 1, 2017

That still leaves a question of what to do for a not-yet-released Python version. Would you do:

numpy==1.12.1; python_version>='3.6'

or

numpy;python_version>='3.7'

I'd suspect the first one, but either way you have to guess whether or not an existing version of numpy is going to work with a future Python version. You have to think about it though: if you don't specify anything for Python 3.7, then a build in an isolated venv will break (right?). So then you'd have to cut a new release for a new Python version.

@njsmith
Member

njsmith commented Jul 1, 2017

I guess the not-yet-released-Python issue is sort of the same as anything else about supporting not-yet-released-Python. When developing a library like (say) scipy, you have to make a guess about how future-Python will work, in terms of language semantics, C API changes, ... and if it turns out you guess wrong then you have to cut a new release? I'm not sure there is a really great solution beyond that.

Something that came up during the PEP 518 discussions, and that would be very reasonable, was the idea of having a way for users to manually override some build-requires when building. This is one situation where that might be useful.

@rgommers

rgommers commented Jul 1, 2017

It's a little different in this case - here we use == rather than >= (as typically done in version specifiers in setup.py), which makes it much more critical to guess right.

E.g. if Python 3.7 breaks numpy, then I now need a new numpy release and new releases of every single package that depends on numpy and went with numpy==1.12.1.
Normally in version specifiers, you say something like numpy>=x.y.z. Then if the same happens, you need a new numpy release but nothing else.

@pradyunsg
Member

Yes, the ABI issue indeed can be handled with specifying the earliest numpy version.
<and>
I have the feeling not many projects are going to get this right ....

I don't think there's any way to do "use earliest compatible version" with pip; would it be something useful in this situation?

@rgommers

rgommers commented Jul 5, 2017

@pradyunsg I think in principle yes. Are you thinking about looking at the PyPI classifiers to determine what "earliest compatible" is?

@pradyunsg
Member

Are you thinking about looking at the PyPI classifiers to determine what "earliest compatible" is?

TBH, I'm not really sure how this would be done. For one, I don't think we have anything other than the PyPI classifiers for doing something like this, and I'm skeptical of using those for determining if pip can install a package...

@rgommers

rgommers commented Jul 5, 2017

Yeah that's probably not the most robust mechanism.

@njsmith
Member

njsmith commented Jul 5, 2017

There is a way to specify earliest compatible python version in package metadata. Not the trove classifiers – those are just informational. The IPython folks worked this out because they needed to be able to tell pip not to try to install new IPython on py2.

The problem with this though is that old numpy packages can't contain metadata saying that they don't work with newer python, because by definition we don't know that until after the new python is released. (Also I think the current metadata might just be "minimum python version", not "maximum python version".)
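For reference (an editorial addition, not part of the comment): the metadata being referred to here is the Requires-Python field, which setuptools exposes as python_requires and which pip 9+ consults when picking a version to install. A minimal sketch, with a made-up package name:

# setup.py sketch; "example-package" is hypothetical
from setuptools import setup

setup(
    name="example-package",
    version="1.0",
    python_requires=">=3.4",  # accepts a full PEP 440 specifier, not just a minimum
)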

@dstufft
Member

dstufft commented Jul 5, 2017

The current metadata is not minimum or maximum, but a full version specifier which supports >=, >, ==, <, etc. I suspect the biggest blockers here are:

  1. That metadata is relatively new, so hardly anything is using it currently. What do we do if nothing has it - do we just assume everything is compatible and install the oldest version available? That seems unlikely to work, but it's also confusing if we suddenly switch from installing the latest to the oldest once they upload a version that has that metadata.
  2. A project can't know in the future what version of Python it's going to stop working on; pip 9 currently works on Python 3.6, but will it work on Python 3.12? I have no idea!

Maximum versions that don't represent a version that already exists are basically impossible to get right except by pure chance. You pretty much always end up either under- or over-specifying things.

I had always presumed that Numpy would change the wheel requirements to reflect the version that it was built against, so that the dependency solver (theoretically, until #988 is solved) then handles things to ensure there are no version-incompatibility-related segfaults.

I think the worst case here is you end up installing something new that depends on Numpy and end up having to also install a new Numpy, because now you have something that has a numpy>=$LATEST requirement; but since all the old things have a numpy>=$OLDERVERSION requirement, they won't need to be reinstalled, just numpy and the new thing. Combine this with the wheel cache and the fact that Numpy is pretty good about providing wheels for the big 3 platforms, and it feels like this isn't going to be a big deal in practice?

Am I missing something?

@njsmith
Member

njsmith commented Jul 5, 2017

@dstufft: the specific concern here is how to handle build requires (not install requires) for downstream packages like scipy that use the numpy C API.

The basic compatibility fact that needs to be dealt with is: if you build scipy against numpy 1.x.y, then the resulting binary has a requirement for numpy >= 1.x.0 (though traditionally this has been left implicit)

In the past, this has been managed in one of two ways:

  1. If you were downloading a scipy wheel, then that was built by some expert who had manually set up their build environment in an optimal way so the wheel would work everywhere you expect.

  2. If you were building scipy yourself, you were using setup.py install, so the build environment would always be the same as the install environment. Each environment gets its own bespoke build of scipy that's adapted to the numpy in that environment. (And mostly people only upgrade numpy inside an existing environment, never downgrade, so this mostly works out.)

But with pyproject.toml and in general, we're increasingly moving towards a third, hybrid scenario, where pip is automatically building scipy wheels on end user machines for installation into multiple environments. So it's the second use case above, but technically its implementation acts more like the first, except now the expert's manual judgement has been replaced by an algorithm.

The policy that experts would traditionally use for building a scipy wheel was: install the latest point release of the oldest numpy that meets the criteria (a) scipy still supports it, and (b) it works on the python version that this wheel is being built for.

This works great when implemented as a manual policy by an expert, but it's rather beyond pip currently, and possibly forever... And @rgommers is pointing out that if we encode it manually as a set of per-python-version pins, and then bake those pins into the scipy sdist, the resulting sdists will only support being built into wheels on python versions that they have correct pins for. Whereas in the past, when a new python version came out, if you were doing route (1) then the expert would pick an appropriate numpy version at the time of build, and if you were doing route (2) then you'd implicitly only ever build against numpy versions that work on the python you're installing against.

That's why having at least an option for end users to override the pyproject.toml requirements would be useful: if you have a scipy sdist that says it wants numpy == 1.12.1; python_version >= "3.7", but in fact it turns out that on 3.7 you need numpy 1.13.2, you could do pip install --override="numpy == 1.13.2" scipy.tar.gz. That solves the wheels-for-redistribution case, and provides at least some option for end users building from sdist. The case it doesn't handle is when plain pip install someproject ends up needing to install from an sdist; in the past this kinda worked seamlessly via the setup.py install route, but now it would require end users to occasionally do this manual override thing.

@dstufft
Member

dstufft commented Jul 5, 2017

@njsmith I don't understand why it's bad for SciPy to implicitly get built against a newer NumPy though. When we install that built SciPy, anything already installed will still work fine, because NumPy is a >= dependency and a newer one is >= an older one, and we'll just install a newer NumPy when we install that freshly built SciPy to satisfy the constraint that SciPy's wheel will have for a newer NumPy.

@pfmoore
Member

pfmoore commented Jul 5, 2017

But with pyproject.toml and in general, we're increasingly moving towards a third, hybrid scenario, where pip is automatically building scipy wheels on end user machines for installation into multiple environments.

Sorry to butt in here, but are we? I don't see that at all as what's happening. I would still expect the vast majority of installs to be from published wheels, built by the project team by their experts (your item 1).

The move to pyproject.toml and PEP 517 allows projects to use alternative tools for those builds, which hopefully will make those experts' jobs easier as they don't have to force their build processes into the setuptools mould if there's a more appropriate backend, but that's all.

It's possible that the changes we're making will also open up the possibility of building their own installation to people who previously couldn't because the setuptools approach was too fragile for general use. But those are people who currently have no access to projects like scipy at all. And it's possible that people like that might share their built wheels (either deliberately, or via the wheel cache). At that point, maybe we have an issue because the wheel metadata can't encode enough of the build environment to distinguish such builds from the more carefully constructed "official" builds. But surely the resolution for that is simply to declare such situations as unsupported ("don't share home-built wheels of scipy with other environments unless you understand the binary compatibility restrictions of scipy").

You seem to be saying that pip install <some sdist that depends on numpy> might fail - but I don't see how. The intermediate wheel that pip builds might only be suitable for the user's machine, and the compatibility tags might not say that, but how could it not install on the machine it was built on?

@dstufft
Member

dstufft commented Jul 5, 2017

To be clear, I understand why it's bad for that to happen for a wheel you're going to publish to PyPI, because you want those wheels to maintain as broad of compatibility as possible. But the wheels that pip is producing implicitly is generally just going to get cached in the wheel cache for this specific machine.

@rgommers

rgommers commented Jul 5, 2017

To be clear, I understand why it's bad for that to happen for a wheel you're going to publish to PyPI, because you want those wheels to maintain as broad of compatibility as possible. But the wheels that pip is producing implicitly is generally just going to get cached in the wheel cache for this specific machine.

That's the whole point of this issue: a wheel built on a user's system can now easily be incompatible with the numpy already installed on that same system. This is because of build isolation - pip will completely ignore the one already installed, and build a scipy wheel against a new numpy that it grabs from PyPI in its isolated build env. So if installed_numpy < built_against_numpy, it won't work.

Hence @njsmith points out that an override to say something like

pip install scipy --override-flag numpy==x.y.z

would be needed.

@dstufft
Member

dstufft commented Jul 5, 2017

@rgommers But why can't pip just upgrade the NumPy that was installed to match the newer version that the SciPy wheel was just built against? I'm trying to understand the constraints where you're able to install a new version of SciPy but not a new version of NumPy.

@rgommers

rgommers commented Jul 5, 2017

@rgommers But why can't pip just upgrade the NumPy that was installed to match the newer version that the SciPy wheel was just built against?

It can, but currently it won't. The build-requires is not coupled to install-requires.

I'm trying to understand the constraints where you're able to install a new version of SciPy but not a new version of NumPy.

For the majority of users this will be fine. Exceptions are regressions in numpy, or (more likely) not wanting to upgrade at that point in time due to the extra regression testing required.

@rgommers

rgommers commented Jul 5, 2017

But with pyproject.toml and in general, we're increasingly moving towards a third, hybrid scenario, where pip is automatically building scipy wheels on end user machines for installation into multiple environments.

Sorry to butt in here, but are we? I don't see that at all as what's happening. I would still expect the vast majority of installs to be from published wheels, built by the project team by their experts (your item 1).

Agreed that in general we are not moving in that direction. That third scenario is becoming more prominent though, as we move people away from setup.py install to pip install, and the build isolation in PEP 518 currently is a regression for some use cases.

The move to pyproject.toml and PEP 517 allows projects to use alternative tools for those builds, which hopefully will make those experts' jobs easier as they don't have to force their build processes into the setuptools mould if there's a more appropriate backend, but that's all.

Totally agreed that PEP 517 and the direction things are moving in is a good one.

The only thing we’re worried about here is that regression from build isolation - it’s not a showstopper, but it at least needs an override switch for things in the pyproject.toml build-requires, so that pip install project-depending-on-numpy can still succeed without being forced to upgrade numpy.

@dstufft
Member

dstufft commented Jul 5, 2017

It can, but currently it won't. The build-requires is not coupled to install-requires.

For SciPy and other things that link against NumPy it probably should be, right? I understand that in the past it was probably painful to do this, but as we move forward it seems like that is the correct thing to happen here (independent of what is decided in pip), since a SciPy that links against NumPy X.Y needs NumPy>=X.Y and X.Y-1 is not acceptable.

For the majority of users this will be fine. Exceptions are regressions in numpy, or (more likely) not wanting to upgrade at that point in time due to the extra regression testing required.

To be clear, I'm not explicitly against some sort of override flag. Mostly just trying to explore why we want it to see if there's a better solution (because in general more options adds conceptual overhead so the fewer we have the better, but obviously not to the extreme where we have no options).

One other option is for people who can't/won't upgrade their NumPy to switch to building using the build tool directly and then provide that wheel using find-links or similar.

I'm not sure which way I think is better, but I suspect that maybe this might be something we would hold off on and wait and see how common of a request it ends up being to solve this directly in pip. If only a handful of users ever need it, then maybe the less user friendly but more powerful/generic mechanism of "directly take control of the build process and provide your own wheels" ends up winning. If it ends up being a regular thing that is fairly common, then we figure out what sort of option we should add.

@njsmith
Member

njsmith commented Jul 5, 2017

Yeah, scipy and other packages using the numpy C API ought to couple their numpy install-requires to whichever version of numpy they're built against. (In fact numpy should probably export some API saying "if you build against me, then here's what you should put in your install-requires".) But that's a separate issue.
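As a sketch of what such an API could look like (purely hypothetical; numpy does not expose this helper), a build script could derive the runtime pin from the numpy present at build time:

# Hypothetical helper, not an actual numpy API.
import numpy

def numpy_runtime_requirement():
    # numpy's ABI is backward compatible, so anything at or above the
    # build-time major.minor series should be acceptable at runtime.
    major, minor = numpy.__version__.split(".")[:2]
    return "numpy>={}.{}".format(major, minor)

# e.g. in setup.py: install_requires=[numpy_runtime_requirement()]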

The pyproject.toml thing is probably clearer with some examples though. Let's assume we're on a platform where no scipy wheel is available (e.g. a raspberry pi).

Scenario 1

pip install scipy into a fresh empty virtualenv

Before pyproject.toml: this fails with an error, "You need to install numpy first". User has to manually install numpy, and then scipy. Not so great.

After pyproject.toml: scipy has a build-requires on numpy, so this automatically works, hooray

Scenario 2

pip install scipy into a virtualenv that has an old version of numpy installed

Before pyproject.toml: scipy is automatically built against the installed version of numpy, all is good

After pyproject.toml: scipy is automatically built against whatever version of numpy is declared in pyproject.toml. If this is just requires = ["numpy"] with no version constraint, then it's automatically built against the newest version of numpy. This gives a version of scipy that requires the latest numpy. We can/should fix scipy's build system so that at least it records the numpy version it was built against in its install requirements, but doing this for all projects downstream of numpy will take a little while. And even after that fix, this is still problematic if you don't want to upgrade numpy in this venv; and if the wheel goes into the wheel cache, it's problematic if you ever want to create a venv on this machine that uses an older version of numpy + this version of scipy. For example, you might want to test that the library you're writing works on an old version of numpy, or switch to an old version of numpy to reproduce some old results. (Like, imagine a tox configuration that tries to test against old-numpy + old-scipy, numpy == 1.10.1, scipy == 0.17.1, but silently ends up actually testing against numpy-latest + scipy == 0.17.1 instead.) Not so great

OTOH, you can configure pyproject.toml like requires = ["numpy == $SPECIFICOLDVERSION"]. Then scipy is automatically built against an old version of numpy, the wheel in the cache works with any supported version of numpy, all is good

Scenario 3

pip install scipy into a python 3.7 virtualenv that has numpy 1.13 installed

Before pyproject.toml: You have to manually install numpy, and you might have problems if you ever try to downgrade numpy, but at least in this simple case all is good

After pyproject.toml: If scipy uses requires = ["numpy"], then you get a forced upgrade of numpy and all the other issues described above, but it does work. Not so great

OTOH, if scipy uses requires = ["numpy == $SPECIFICVERSION"], and it turns out that they guessed wrong about whether $SPECIFICVERSION works on python 3.7, then this is totally broken and they have to roll a new release to support 3.7.

Summary

Scipy and similar projects have to pick how to do version pinning in their pyproject.toml, and all of the options cause some regression in some edge cases. My current feeling is that the numpy == $SPECIFICVERSION approach is probably the best option, and overall it's great that we're moving to a more structured/reliable/predictable way of handling all this stuff, but it does still have some downsides. And unfortunately it's a bit difficult to tell end-users "oh, right, you're using a new version of python, so what you need to do first of all is make a list of all the packages you use that link against numpy, and then write a custom build frontend..."

@njsmith
Member

njsmith commented Jul 5, 2017

Maybe we should open a separate issue specifically for the idea of a --build-requires-override flag to mitigate these problems. But some other use cases:

  • IIRC @rbtcollins was strongly in favor of a flag like this; my impression was that he had in mind situations like "whoops, it turns out that the latest point release of some utility library deep in my stack broke something, but I need some way to install openstack anyway"

  • Suppose a broken version of setuptools accidentally gets released. I think it's fair to assume there will be tons of libraries that use requires = ["setuptools"] with no version pin. All python package installation grinds to a halt (except for people whose entire stack is available as wheels, but unfortunately that's still not common). If there's no --build-requires-override, then what do we do?

@dstufft
Member

dstufft commented Jul 5, 2017

I don't think we need a new issue; I think this issue is fine. I'll just update the title, because the current title isn't really meaningful I think.

@dstufft changed the title from "PEP 518 behavior is not sensible" to "PEP 518 build requirements cannot be overriden by user" Jul 5, 2017
@njsmith
Member

njsmith commented Jul 5, 2017

Agreed that the original title was not meaningful, but there are two conceptually distinct issues here. The first is that pyproject.toml currently causes some regressions for projects like scipy – is there anything we can/should do about that? The second is that hey, user overrides might be a good idea for a few reasons; one of those reasons is that they could mitigate (but not fully fix) the first problem.

Maybe the solution to the first problem is just that we implement user overrides and otherwise live with it, in which case the two discussions collapse into one. But it's not like we've done an exhaustive analysis of the scipy situation and figured out that definitely user overrides are The Solution, so if someone has a better idea then I hope they'll bring it up, instead of thinking that we've already solved the problem :-)

@dstufft
Member

dstufft commented Jul 5, 2017

@njsmith It's interesting to me that you think that numpy == $SPECIFICVERSION is the best option, because from my POV just letting pip upgrade to the latest version of NumPy seems like the best option, but that's not really important here since each project gets to pick what version of their build dependencies makes sense for them.

I suspect that for a hypothetical --build-requires-override we would prevent caching any wheels generated with an overridden build requirement. Otherwise you get into what I think is a bad situation where you get a cached wheel generated from essentially a different source, and you just have to kind of remember that you used an override with it to know its state (we don't cache wheels when you're using --build-option for similar reasons).

It also suffers from the same problem that a lot of our CLI options like this tend to hit, which is that there isn't really a user-friendly way to specify it. If you have --override-flag=numpy==1.0 affect everything we're installing, that is typically not what you want (for instance, not everything might depend on numpy at all, or things might require different versions of the same build tool to build their wheels). However, trying to specify things on a per-project basis quickly ends up really gross: you start having to do things like --override-flag=scipy:numpy==1.0 (and what happens if something build-requires scipy, but a version of scipy that is incompatible with that version of numpy?).

At some point the answer becomes "sorry your situation is too complex, you're going to have to start building your own wheels and passing them into --find-links" but at a basic level parameterizing options by an individual package inside the entire set of packages is still somewhat of an unsolved problem in pip (and so far each attempt to solve it has been met with user pain).

So part of my... hesitation is that properly figuring out the right UX of such a flag is non-trivial, and if we don't get the UX to be better than the baseline of building a wheel and chucking it into a wheelhouse, then it's a net negative.


@brainwane
Contributor

Regarding the part of this problem that is blocked by the lack of a proper dependency resolver for pip: the beta of the new resolver is in pip 20.2, and we aim to roll it out in pip 20.3 (October) as the default. So if the new resolver behavior helps this problem (or makes it worse), now would be a good time to know.

@akaihola
Contributor

I think we're hitting this issue as well. We have an in-house package whose code is compatible with NumPy version 1.11.2 and up. We need to maintain some legacy remote production environments where we can't upgrade NumPy up from 1.11.2, but in other environments we want to stay up-to-date with newest NumPy.

In our package, we migrated to using pyproject.toml:

[build-system]
requires = ["Cython", "numpy", "setuptools>=40.8.0", "wheel>=0.33.6"]

When building the package for the legacy environment, we use this constraints file:

# constraints.legacy.txt
numpy==1.11.2
scipy==0.18.1
# etc.

For modern environments we have e.g.

# constraints.new.txt
numpy==1.19.2
scipy==1.5.2
# etc.

When running tests in CI for our package, we do the equivalent of either

pip install --constraint constraints.legacy.txt --editable .
pytest

or

pip install --constraint constraints.new.txt --editable .
pytest

However, in both cases the newest NumPy available is installed and compiled against, and running our package in the old environment miserably fails:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "ourpackage/ourmodule.pyx", line 1, in init ourpackage.ourmodule
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

What we would like pip to do is respect the pinned versions from --constraint also for build dependencies.
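One mitigation consistent with the earlier discussion (an editorial suggestion, not something the reporter says they used, and assuming numpy 1.11.2 installs on all targeted Pythons) is to pin the build-time numpy in pyproject.toml to the oldest supported version, so isolated builds stay ABI-compatible with the legacy environment:

[build-system]
requires = ["Cython", "numpy==1.11.2", "setuptools>=40.8.0", "wheel>=0.33.6"]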

@uranusjr
Member

uranusjr commented Sep 18, 2020

To be clear, pip never supported overriding dependencies anywhere, either build- or run-time. The “trick” people used to use depends on a quirky behaviour of pip’s current (soon legacy) dependency resolver that should (eventually) go away. In that sense, it makes perfect sense that requirements specified on the command line do not override build dependencies in pyproject.toml, since that means that PEP 517 successfully avoids a bug.

Stepping back from the specific request of overriding build dependencies, the problem presented in the top post can be avoided by adding additional logic to how build dependencies are chosen. When a package specifies numpy (for example) as a build dependency, pip can choose freely any version of numpy. Right now it chooses the latest simply because it’s the default logic. But we can instead condition the logic to prefer matching the run-time environment if possible instead, which would keep the spirit of build isolation, while at the same time solve the build/run-time ABI mismatch problem. (I think I also mentioned this idea somewhere else, but can’t find it now.)
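A conceptual sketch of that preference (not pip's actual resolver code; the function and its names are made up for illustration):

from packaging.requirements import Requirement
from packaging.version import Version

def choose_build_candidate(req_string, installed, available):
    # Prefer the version already installed in the runtime environment if it
    # satisfies the build requirement; otherwise fall back to the newest match.
    req = Requirement(req_string)
    if installed is not None and Version(installed) in req.specifier:
        return installed  # keeps build-time and run-time ABI aligned
    return str(max(v for v in map(Version, available) if v in req.specifier))

# Example: the runtime env has numpy 1.16.4 and the sdist build-requires "numpy"
print(choose_build_candidate("numpy", "1.16.4", ["1.16.4", "1.19.2"]))  # -> 1.16.4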

There is more than one way to solve the build ABI issue, and introducing dependency overriding for it feels like falling into the XY problem trap to me. Dependency overriding is a much more general problem, and whether that should be possible (probably yes at some point, since pip is progressively making the resolver stricter, and people will need an escape hatch eventually) is an entirely separate issue, and is covered in other discussions.

@rgommers

Stepping back from the specific request of overriding build dependencies, the problem presented in the top post can be avoided by adding additional logic to how build dependencies are chosen. When a package specifies numpy (for example) as a build dependency, pip can choose freely any version of numpy. Right now it chooses the latest simply because it’s the default logic. But we can instead condition the logic to prefer matching the run-time environment if possible instead, which would keep the spirit of build isolation, while at the same time solve the build/run-time ABI mismatch problem.

+1 this is a healthy idea in general, and I don't see serious downsides.

Note that for numpy specifically, we try to teach people good habits, and there's a package oldest-supported-numpy that people can depend on in pyproject.toml. But many people new to shipping a package on PyPI won't be aware of that.

@pradyunsg
Member

Something like the situations discussed here has happened today -- setuptools has started rejecting invalid metadata and users affected by this have no easy workarounds.

@jaraco posted #10669, with the following design for a solution.

I imagine a solution in which pip offers options to extend and constrain build dependencies at install time. Something like:

--build-requires=<dependencies or file:requirements>
--build-constraints=<constraints or file:constraints>

These additional requirements would apply to all builds during the installation. To limit the scope of the specifications, it should also allow for a limited scope:

--build-requires=<project>:<dependencies or file:requirements>
--build-constraints=<project>:<constraints or file:constraints>

For a concrete example, consider a build where setuptools<59 is needed for django-hijack, setuptools_hacks.distutils_workaround is needed for all projects, and the deps in scipy-deps.txt are required for mynumpy-proj:

pip install --use-pep517 --build-constraints "django-hijack:setuptools<59" --build-requires "setuptools_hacks.distutils_workaround" --build-requires "mynumpy-proj:file:scipy-deps.txt"

The same specification should be able to be supplied through environment variables.

@uranusjr
Member

Stepping back from the specific request of overriding build dependencies, the problem presented in the top post can be avoided by adding additional logic to how build dependencies are chosen. When a package specifies numpy (for example) as a build dependency, pip can choose freely any version of numpy. Right now it chooses the latest simply because it’s the default logic. But we can instead condition the logic to prefer matching the run-time environment if possible instead, which would keep the spirit of build isolation, while at the same time solve the build/run-time ABI mismatch problem.

Some more thoughts I’ve had during the past year on this idea. Choosing a build dependency matching the runtime one is the easy part; the difficult part is that the runtime dependency version may change during resolution, i.e. when backtracking happens. And when that happens, pip will need to also change the build dependency, because there’s no guarantee the newly changed runtime dependency has ABI compatibility with the old one. And here’s where the fun part begins. By changing the build dependency, pip will need to rebuild that source distribution, and since there’s no guarantee the rebuild will have the same metadata as the previous build, the resolver must treat the two builds as different candidates. This creates a weird these-are-the-same-except-not-really problem that’s much worse than PEP 508 direct URLs, since those builds likely have the same name, version (these two are easy), source URL (!) and wheel tags (!!). It’s theoretically all possible to implement, but the logic would need a ton of work.

I imagine a solution in which pip offers options to extend and constrain build dependencies at install time.

And to come back to the “change the build dependency” thing. There are fundamentally two cases where an sdist’s build dependencies need to be overridden:

  1. The dependencies as declared can arguably be considered “correct”, but I want the resolver to interpret them more smartly. This is the case for the ABI compatibility use case, and I think there are better solutions for that.
  2. The dependencies are just declared wrong and I need to change them to something else (e.g. add or remove a dependency, or make the version range wider). This kind of use case is fundamentally the same as Relaxing / Ignoring constraints during dependency resolution #8076, but for build dependencies, and I think the same logic applies. IMO allowing direct dependency overriding is too heavy-handed a solution to be implemented in pip, and we should instead explore ways for the user to hot-patch a package and make pip accept that patched artifact instead. For build dependencies, this means providing a tool to easily extract, fix pyproject.toml, re-package, and seamlessly tell pip to use that new sdist. pip likely still needs to provide some mechanism to enable the last “seamlessly tell pip” part, but the rest of the workflow does not belong in pip IMO, but in a separate tool. (It would be a pip plugin if pip had a plugin architecture, but it does not.)

@rgommers

rgommers commented Nov 20, 2021

And here’s where the fun part begins. By changing the build dependency, pip will need to rebuild that source distribution, and since there’s no guarantee the rebuild will have the same metadata as the previous build, the resolver must treat the two builds as different candidates.

I'm not sure I agree with that. Yes, it's technically true that things could now break - but it's a corner case related to the ABI problem, and in general

  • (a) things will mostly still work,
  • (b) sdists typically don't pin or put upper bounds on all their build dependencies, so any new release of any build dependency can change exactly what pip install pkg_being_built produces, even on the same machine, today. pip does not take versions of build dependencies into account at all in its current caching strategy.

A few thoughts I've had on this recently:

  • packages should anyway put upper bounds on all their build-time dependencies (sometimes <=last_version_on_pypi, sometimes <=next_major_version, sometimes <= 2_years_into_the_future). This discussion shows why. It is unlikely to cause problems (because of build isolation), and guaranteed to avoid problems (sooner or later a new setuptools release will break your package's build, for example).
  • yes, we need a way to override build dependencies, but it shouldn't be the only strategy; that's too much to ask from each and every user of a package on a platform where the break happens.
  • the ABI problem is a bit of a special-case, so let's not design for that one too much. NumPy is the most prominent example of this, and we should teach people to use oldest-supported-numpy. Also, we have detailed docs for depending on NumPy.
  • it's a real problem that PyPI does not allow either editing metadata after the fact, or re-uploading a new sdist (or can you do the build number bump trick for an sdist as well, rather than only for a wheel?). Otherwise this requires a new .postX release to fix a broken x.y.z package version, and doing a release can be an extremely time-consuming operation.

@uranusjr
Member

I agree it should mostly work without the rebuilding part, but things already mostly work right now, so there is only value to doing anything for the use case if we can go beyond mostly and make things fully work. If a solution can’t cover that last mile, we should not pursue it in the first place because it wouldn’t really improve the situation meaningfully.

I listed, later in the previous comment, the two scenarios where people generally want to override metadata. The former case is what “mostly works” right now, and IMO we should either not do anything about it (because what we already have is good enough), or pursue the fix to its logical destination and fix the problem entirely (which requires the resolver implementation I mentioned).

The latter scenario is one where, unlike the former, we don’t currently have even a “mostly works” solution, so there’s something to be done; but I’m also arguing that that something should not be built entirely into pip.

@pfmoore
Member

pfmoore commented Dec 16, 2021

Looking at this issue and the similar one reported in #10731, are we looking at this from the wrong angle?

Fundamentally, the issue we have is that we don't really support the possibility of two wheels, with identical platform tags, for the same project and version of that project, having different dependency metadata. It's not explicitly covered in the standards, but there are a lot of assumptions made that wheels are uniquely identified by name, version and platform tag (or more explicitly, by the wheel filename).

Having scipy wheels depend on a specific numpy version that's determined at build time violates this assumption, and there's going to be a lot of things that break as a result (the pip cache has already been mentioned, as has portability of the generated wheels, but I'm sure there will be others). I gather there's an oldest-supported-numpy package these days, which I assume encodes "the right version of numpy to build against". That seems to me to be a useful workaround for this issue, but the root cause here is that Python metadata really only captures a subset of the stuff that packages can depend on (manylinux hit this in a different context). IMO, allowing users to override build requirements will provide another workaround [1] in this context, but it won't fix the real problem (and honestly, expecting the end user to know how to specify the right overrides is probably optimistic).

If we want to properly address this issue, we probably need an extension to the metadata standards. And that's going to be a pretty big, complicated discussion (general dependency management for binaries is way beyond the current scope of Python packaging).

Sorry, no answers here, just more questions 🙁

Footnotes

[1] Disabling build isolation is another one, with its own set of problems.

@pradyunsg
Member

pradyunsg commented Dec 16, 2021

I think being able to provide users with a way to say "I want all my builds to happen with setuptools == 56.0.1" is worthwhile; even if we don't end up tackling the binary compatibility story. That's useful for bug-for-bug compatibility, ensuring that you have deterministic builds and more.


I think the "fix" for the binary compatibility problem is a complete rethink of how we handle binary compatibility (which is a lot of deeply technical work), which then needs to pass through our standardisation process (which is a mix of technical and social work). And I'm not sure there's either appetite or interest in doing all of that right now. Or whether it would justify the churn budget costs.

If there is interest and we think the value is sufficient, I'm afraid I'm still not quite sure how tractable the problem even is and where we'd want to draw the line of what we want to bother with.

I'm sure @rgommers, @njs, @tgamblin and many other folks will have thoughts on this as well. They're a lot more familiar with this stuff than I am.

As for the pip caching issue, I wonder if there's some sort of cache busting that can be done with build tags in the wheel filename (generated by the package). It won't work for PyPI wheels, but it should be feasible to encode build-related information in the build tag for the packages that people build themselves locally. This might even be the right mechanism to try, using existing semantics, toward solving some of the issues.

Regardless, I do think that's related but somewhat independent of this issue.

@pradyunsg
Member

To be clear, build tags are a thing in the existing wheel file format: https://www.python.org/dev/peps/pep-0427/#file-name-convention
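For illustration (our example, not an existing pip feature): under that filename convention, a locally built wheel could carry a build tag recording the numpy it was built against, e.g.

scipy-1.5.4-1numpy1164-cp38-cp38-linux_x86_64.whl

where "1numpy1164" is the (digit-prefixed) build tag, so two builds of the same scipy version against different numpys would get distinct filenames and hence distinct cache entries.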

@rgommers

@pfmoore those are valid questions/observations I think - and a lot broader than just this build reqs issue. We'd love to have metadata that's understood for SIMD extensions, GPU support, etc. - encoding everything in filenames only is very limiting.

(and honestly, expecting the end user to know how to specify the right overrides is probably optimistic).

This is true, but it's also true for runtime dependencies - most users won't know how that works or if/when to override them. I see no real reason to treat build and runtime dependencies in such an asymmetric way as is done now.

If we want to properly address this issue, we probably need an extension to the metadata standards. And that's going to be a pretty big, complicated discussion (general dependency management for binaries is way beyond the current scope of Python packaging).

Agreed. It's not about dependency management of binaries though. There are, I think, 3 main functions of PyPI:

  1. Be the authoritative index of Python packages, and the channel through which open source code flows from authors to redistributors (Linux distros, Homebrew, conda-forge, etc.)
  2. Let end users install binaries (wheels)
  3. Let end users install from source (sdists)

This mix of binaries and from-source builds is the problem, and in particular - also for this issue - (3) is what causes most problems. It's naive to expect that from-source builds of packages with complicated dependencies will work for end users. This is obviously never going to work reliably when builds are complex and have non-Python dependencies. An extension of metadata alone is definitely not enough to solve this problem. And I can't think of anything that will really solve it, because even much more advanced "package manager + associated package repos", where complete metadata is enforced, don't do both binary and from-source installs in a mixed fashion.

And I'm not sure there's either appetite or interest in doing all of that right now. Or if it would justify the churn budget costs.

I have an interest, and some budget, for thoroughly documenting all the key problems that we see for scientific & data-science/ML/AI packages in the first half of next year. In order to be at least on the same page about what the problems are, and can discuss which ones may be solvable and which ones are going to be out of scope.

Regardless, I do think that's related but somewhat independent of this issue.

agreed

@pfmoore
Member

pfmoore commented Dec 16, 2021

I agree that being able to override build dependencies is worthwhile, I just don't think it'll necessarily address all of the problems in this space (e.g., I expect we'll still get a certain level of support questions from people about this, and "you can override the build dependencies" won't be seen as an ideal solution - see #10731 (comment) for an example of the sort of reaction I mean).

To be clear, build tags are a thing in the existing wheel file format

Hmm, yes, we might be able to use them somehow. Good thought.

And I'm not sure there's either appetite or interest in doing all of that right now. Or if it would justify the churn budget costs.

I think it's a significant issue for some of our users, who would consider it justified. The problem for the pip project is how we spend our limited resources - even if the packaging community [1] develops such a standard, should pip spend time implementing it, or should we work on something like lockfiles, or should we focus on critically-needed UI/UX rationalisation and improvement - or something else entirely?

I see no real reason to treat build and runtime dependencies in such an asymmetric way as is done now.

Agreed. This is something I alluded to in my comment above about "UI/UX rationalisation". I think that pip really needs to take a breather from implementing new functionality at this point, and tidy up the UI. And one of the things I'd include in that would be looking at how we do or don't share options between the install process and the isolated build environment setup. Sharing requirement overrides between build and install might just naturally fall out of something like that.

But 🤷, any of this needs someone who can put in the work, and that's the key bottleneck at the moment.

Footnotes

[1] And the same problem applies for the packaging community, in that we only have a certain amount of bandwidth for the PEP process, and we don't have a process for judging how universal the benefit of a given PEP is. Maybe that's something the packaging manager would cover, but there's been little sign of interaction with the PyPA from them yet, so it's hard to be sure.

@pradyunsg
Member

pradyunsg commented Dec 16, 2021

/cc @s-mm since her ongoing work has been brought up in this thread!

@tgamblin

@rgommers:

We'd love to have metadata that's understood for SIMD extensions, GPU support, etc.

I think this is relevant, as we (well, mostly @alalazo and @becker33) wrote a library and factored it out of Spack -- initially for CPU micro-architectures (and their features/extensions), but we're hoping GPU ISAs (compute capabilities, whatever) can also be encoded.

The library is archspec. You can already pip install it. It does a few things that might be interesting for package management and binary distribution. It's basically designed for labeling binaries with uarch ISA information and deciding whether you can build or run that binary. Specifically it:

  1. Defines a compatibility graph and names for CPU microarchitectures (defined in microarchitectures.json)
  2. It'll detect the host microarchitecture (on macOS and Linux so far)
  3. You can ask things like "is a zen2 binary compatible with cascadelake?", or "will an x86_64_v4 binary run on haswell?" (we support generic x86_64 levels, which are also very helpful for binary distribution)
  4. You can query microarchitectures for feature support (does the host arch support avx512?)
  5. You can ask, given a compiler version and a microarchitecture, what flags are needed for that compiler to emit that uarch's ISA. For things like generic x86-64 levels we try to emulate that (with complicated flags) for older compilers that do not support those names directly.

We have gotten some vendor contributions to archspec (e.g., from AMD and some others), but if it were adopted by pip, I think we'd get more, so maybe a win-win? It would be awesome to expand the project b/c I think we are trying to solve the same problem, at least in this domain (ISA compatibility).

More here if you want the gory details: archspec paper
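A rough usage sketch of archspec, based on its documented Python API (treat the exact calls as an assumption rather than a reference):

import archspec.cpu

host = archspec.cpu.host()               # detect the host microarchitecture
print(host.name)                         # e.g. "zen2" or "haswell"

# Partial-order comparison: would an x86_64_v3 binary run on this host?
print(archspec.cpu.TARGETS["x86_64_v3"] <= host)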

@tgamblin

tgamblin commented Dec 16, 2021

@pradyunsg:

I think being able to provide users with a way to say "I want all my builds to happen with setuptools == 56.0.1" is worthwhile; even if we don't end up tackling the binary compatibility story.

Happy to talk about how we've implemented "solving around" already-installed stuff and how that might translate to the pip solver. The gist of that is in the PackagingCon talk -- we're working on a paper on that stuff as well and I could send it along when it's a little more done if you think it would help.

I think fixing a particular package version isn't actually all that hard -- I suspect you could implement that feature mostly with what you've got. The place where things get nasty for us is binary compatibility constraints -- at the moment, we model the following on nodes and can enforce requirements between them:

  • compiler used to build, and its version
  • variants (e.g. is a particular build option enabled)
  • target uarch (modeled by archspec, mentioned above)
  • transitive dependencies: if you say you want a particular numpy, we also make sure you use its transitive dependencies. We're working on a model where we could loosen that as long as things are binary compatible (and we have a notion of "splicing" a node or sub-dag into a graph and preserving build provenance that we're experimenting with).

The big thing we are working on right now w.r.t. compatibility is compiler runtime libraries for mixed-compiler (or mixed compiler version) builds (e.g., making sure libstdc++, openmp libraries, etc. are compatible). We don't currently model compilers or their implicit libs as proper dependencies and that's something we're finally getting to. I am a little embarrassed that I gave this talk on compiler dependencies in 2018 and it took a whole new solver and too many years to handle it.

The other thing we are trying to model is actual symbols in binaries -- we have a research project on the side right now to look at verifying the compatibility of entry/exit calls and types between libraries (ala libabigail or other binary analysis tools). We want to integrate that kind of checking into the solve. I consider this part pretty far off at least in production settings, but it might help to inform discussions on binary metadata for pip.

Anyway, yes we've thought about a lot of aspects of binary compatibility, versioning, and what's needed as far as metadata quite a bit. Happy to talk about how we could work together/help/etc.

@rgommers

The library is archspec. You can already pip install it.
...
More here if you want the gory details: archspec paper

Thanks @tgamblin. I finally read the whole paper - looks like amazing work. I'll take any questions/ideas elsewhere to not derail this issue; it certainly seems interesting for us though, and I would like to explore if/how we can make use of it for binaries of NumPy et al.

@webknjaz
Member

After pyproject.toml: If scipy uses requires = ["numpy"], then you get a forced upgrade of numpy and all the other issues described above, but it does work. Not so great

FTR, one workaround that hasn't been mentioned in the thread is supplying a constraints file via the PIP_CONSTRAINT environment variable. This does work for pinning the build deps, and is probably the only way for the end user to influence the build env, as of today.
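A minimal sketch of that workaround (file names are illustrative, reusing the constraints file from the earlier comment):

PIP_CONSTRAINT=constraints.legacy.txt pip install --editable .

Because the environment variable is inherited by the pip run that populates the isolated build environment, the pinned numpy is used for the build as well, unlike a --constraint passed on the command line.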
