First draft of perennial manylinux PEP #304
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,189 @@ | ||
PEP: XXX | ||
Title: Future 'manylinux' Platform Tags for Portable Linux Built Distributions | ||
Version: $Revision$ | ||
Last-Modified: $Date$ | ||
Author: Nathaniel J. Smith <njs@pobox.com> | ||
Thomas Kluyver <thomas@kluyver.me.uk> | ||
BDFL-Delegate: Paul Moore <p.f.moore@gmail.com> | ||
Discussions-To: Discourse https://discuss.python.org/t/the-next-manylinux-specification/1043 | ||
Status: Draft | ||
Type: Informational | ||
Content-Type: text/x-rst | ||
Created: 3-May-2019 | ||
Post-History: 3-May-2019 | ||
|
||
Abstract | ||
======== | ||
|
||
This PEP proposes a scheme for new 'manylinux' distribution tags to be defined | ||
without requiring a PEP for every specific tag. The naming scheme is based on | ||
glibc versions, with profiles in the auditwheel tool defining what other | ||
external libraries and symbols a compatible wheel may link against. | ||
|
||
While there is interest in defining tags for non-glibc Linux platforms, | ||
this PEP does not attempt to address that. | ||
|
||
Rationale | ||
========= | ||
|
||
Distributing compiled code for Linux is more complicated than for other popular | ||
operating systems, because the Linux kernel and the libraries which typically | ||
accompany it are built in different configurations and combinations for different | ||
distributions. However, there are certain core libraries which can be expected in | ||
many common distributions and are reliably backwards compatible, so binaries | ||
built with older versions of these libraries will work with newer versions. | ||
:pep:`513` describes these ideas in much more detail. | ||
|
||
The ``manylinux1`` (:pep:`513`) and ``manylinux2010`` (:pep:`571`) tags make | ||
use of these features. They define a set of core libraries and symbol versions | ||
which wheels may expect on the system, based on CentOS 5 and 6 respectively. | ||
Typically, packages are built in Docker containers based on these old CentOS | ||
versions, and then the ``auditwheel`` tool is used to check them and bundle any | ||
other linked libraries into the wheel. | ||
|
||
If we were to define a ``manylinux2014`` tag based on CentOS 7, there would be | ||
five steps involved to make it practically useful: | ||
|
||
1. Write a PEP. | ||
2. Prepare Docker images based on CentOS 7. | ||
3. Add the definition to auditwheel. | ||
4. Allow uploads with the new tag on PyPI. | ||
5. Add code to pip to recognise the new tag and check if the platform is | ||
compatible. | ||
|
||
Although preparing the Docker images and updating auditwheel take more work, | ||
these parts can be used as soon as that work is complete. The changes to pip | ||
are more straightforward, but not all users will promptly install a new version | ||
of pip, so distributors are concerned about moving to a new tag too quickly. | ||
|
||
This PEP aims to remove the need for steps 1 and 5 above, so new manylinux tags | ||
can be adopted more easily. | ||
|
||
Naming | ||
====== | ||
|
||
Tags using the new scheme will look like:: | ||
|
||
manylinux_glibc_2_17_x86_64 | ||
|
||
Here ``2_17`` is the major and minor version of glibc; in this example, | ||
the platform must have glibc 2.17 or newer. Installer tools should be prepared | ||
to handle any numeric values here, but building and publishing wheels to PyPI | ||
will probably be constrained to specific profiles defined by auditwheel. | ||
This is the bikeshedding part... instead of
Since this is bikeshedding I'm not going to get into a battle-of-wills over it, but I wanted to lay out the arguments at least once.

I guess the counterargument is that Leaving out the I don't have strong feelings over this, and I'm assuming you've thought more about this whole space than I have. But maybe if it needs further discussion we should take that somewhere where more people will participate. (There's nothing like more people for a naming discussion!)

It's awkward right now, but in a few years
In theory it's possible that the glibc devs would decide to fork off a new project, call it glibc 3, and that all the distros would decide to switch to it. But they've been maintaining ABI compatibility for 22 years now; presumably if the ABI had a fundamental unfixable problem they'd have noticed by now. So it's kind of like preparing for a meteorite strike: sure, it could happen, but... something else will get you first :-). I think it's at least as plausible that all the distros will switch to musl, or that Linux will get replaced by Fuchsia, or x86-64 will get replaced by RISC-V, than that glibc 3 will appear and take over the world. And if it did it would still take like a decade to transition, so we'd have plenty of time to figure out what to do. If we want to be really thorough with our due-diligence we could contact the glibc devs and ask them to explicitly confirm that glibc 3 is never going to happen.

As a part-time glibc developer, I can tell you that

@zackw Interesting. The challenge with a 3.0 source version is that we basically assume that can check the "ABI compatibility level" of glibc by calling (And note that this is orthogonal to the "perennial" part of this proposal – a hypothetical glibc 3.0 release would also break manylinux1, manylinux2010, manylinux2014, etc.) Anyway, in addition to breaking on a hypothetical future glibc 3.0, the current string-parsing hack is kinda gross, and means the glibc devs have API surface area that they don't know about. Maybe this means that they can never use 3.0 as a release name, because it would break all existing manylinux wheels. It's awkward all around.
I suspect the right solution is for glibc to add a new API, like
That lets us (eventually) drop our gross version parsing regex, makes the actual guarantees more visible to the glibc devs, and potentially provides a path to releasing 3.0 without breaking everyone ( Does that sound sensible to you? If we wanted to bring this to the glibc devs, how would we do that? (And is there any way I can do it without signing up for the glibc dev firehose?)

Bringing over my preference from the Discourse thread: my view is the opposite of Nathaniel's, and would favour the slightly longer

Based on the discussion in that glibc issue, it sounds like the recommendation is to assume that glibc 3.x might happen, and to assume that the major version bump won't indicate a compatibility break. So I'm changing my recommendation here to Also, apparently, we should be going back and revising the previous manylinux PEPs and pip code to assume that glibc 3 is compatible with glibc 2. I guess the simplest way to handle that would be to make this PEP say the right thing, and then say I don't see any value in sticking

With that, I'd expect to be able to go look up a spec called "manylinux 2.17". There's no such spec - there's a manylinux 2014 spec that calls out a glibc 2.17 build profile. So if we're going to switch from naming the spec version in the filename to instead naming the build profile, the filename should reflect that.

Sure, that's one way we could do things, but... that's not what this PEP is proposing.
The spec you would want is the "manylinux" spec, which seems reasonable. It's exactly the same as how when you see a wheel tagged

For disambiguation, how about dropping "manylinux", and using |
||
|
||
The existing manylinux tags can also be represented in the new scheme, | ||
for instance: | ||
|
||
- ``manylinux1_x86_64`` becomes ``manylinux_glibc_2_5_x86_64`` | ||
- ``manylinux2010_x86_64`` becomes ``manylinux_glibc_2_12_x86_64`` | ||
|
||
``x86_64`` refers to the CPU architecture, as in previous tags. | ||
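
As a rough illustration of the naming scheme above, the following sketch splits a tag into its glibc version and architecture, and maps the two legacy tags onto the new form. The regular expression, the `_LEGACY_GLIBC` table, and the function name `parse_manylinux_tag` are assumptions made for this example; they are not part of the PEP text or of any installer's API.

```python
import re

# Hypothetical pattern for the layout shown above:
# manylinux_glibc_<major>_<minor>_<arch>
_TAG_RE = re.compile(r"^manylinux_glibc_(\d+)_(\d+)_(\w+)$")

# Legacy aliases, taken from the two examples in the text.
_LEGACY_GLIBC = {"manylinux1": (2, 5), "manylinux2010": (2, 12)}


def parse_manylinux_tag(tag):
    """Return (major, minor, arch), or None if the tag is not recognised."""
    match = _TAG_RE.match(tag)
    if match:
        return int(match.group(1)), int(match.group(2)), match.group(3)
    # Fall back to the legacy spellings, e.g. manylinux1_x86_64.
    for legacy, (major, minor) in _LEGACY_GLIBC.items():
        prefix = legacy + "_"
        if tag.startswith(prefix):
            return major, minor, tag[len(prefix):]
    return None
```

Because the version fields are parsed as plain integers, a sketch like this accepts any numeric values, which matches the expectation that installers should not hard-code a list of known profiles.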
One of the "features" we like in the manylinux 2014 PEP is the expanded platform support. Is that possible to include that here, or should that be a separate proposal?

Also, looks like multiple architectures are implied below. It may be useful to add the explicit list here.

These were only meant to be examples, but I agree that could be clearer. |
||
|
||
While this PEP does not attempt to define tags for non-glibc Linux, the name | ||
glibc is included to leave room for future efforts in that direction. | ||
|
||
Wheel compatibility | ||
=================== | ||
|
||
There are two components to a tag definition: a specification of what makes a | ||
compatible wheel, and of what makes a compatible platform. | ||
|
||
A wheel may never use symbols from a newer version of glibc than that indicated | ||
by its tag. Likewise, a wheel with a glibc tag under this scheme may not be | ||
linked against another libc implementation. | ||
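
The symbol rule above can be sketched as a comparison between the versioned symbol names a wheel's binaries require (names such as `GLIBC_2.14`) and the version in the tag. This is only an illustration: it assumes the list of version names has already been extracted by some other tool, the function names are hypothetical, and it is not how auditwheel is implemented.

```python
import re

# Matches names such as GLIBC_2.14 or GLIBC_2.2.5; any patch level is ignored
# because the tag scheme only carries major and minor numbers.
_GLIBC_VERSION_RE = re.compile(r"^GLIBC_(\d+)\.(\d+)(?:\.\d+)?$")


def newest_glibc_required(symbol_versions):
    """Return the highest (major, minor) among GLIBC_x.y names, or None."""
    found = []
    for name in symbol_versions:
        match = _GLIBC_VERSION_RE.match(name)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return max(found) if found else None


def symbols_fit_tag(symbol_versions, tag_major, tag_minor):
    """True if no required symbol version is newer than the tag's glibc."""
    newest = newest_glibc_required(symbol_versions)
    return newest is None or newest <= (tag_major, tag_minor)
```

For example, a binary requiring `GLIBC_2.14` would fit a `manylinux_glibc_2_17` tag but not a `manylinux_glibc_2_12` one.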
Not that this is good or bad, just checking.

For the time being, we're only trying to specify binary compatibility for platforms based on glibc, because that's what people have done the work to figure out how to make things work. This doesn't forbid users from using another libc implementation, but this spec is not meant to cover distributing pre-built binaries to those users. They would have to build extension modules from source until someone has figured out another spec. The main use case I've heard about for an alternative libc is docker containers based on Alpine Linux, which uses musl. One approach that has been suggested is to standardise alpine-specific tags for that use case. |
||
|
||
As with the previous manylinux tags, wheels will be allowed to link against | ||
a limited set of external libraries and symbols. These will be defined by | ||
profiles documented on https://packaging.python.org/ and implemented in | ||
auditwheel. At least initially, they will likely be similar to | ||
the list for manylinux2010 (:pep:`571`), and based on library versions in | ||
newer versions of CentOS. | ||
|
||
The overall goal is to ensure that if a wheel is tagged as | ||
``manylinux_glibc_2_Y``, then users can be reasonably confident that this wheel | ||
will work in any real-world Linux-based Python environment that uses | ||
``glibc 2.Y`` or later and matches the other wheel compatibility tags. | ||
For example, this includes making sure that the wheel only uses symbols that | ||
are available in the oldest supported glibc, and doesn't rely on the system to | ||
provide any libraries that aren't universally available. | ||
|
||
One of the central points of this PEP is to move away from defining each | ||
compatibility profile in its own PEP. | ||
In part, this is to acknowledge that the details of compatibility profiles | ||
evolve over time as the Linux distribution landscape changes and as we learn | ||
more about real-world compatibility pitfalls. | ||
For instance, Fedora 30 `removed <https://github.com/pypa/manylinux/issues/305>`__ | ||
``libcrypt.so.1``, which both ``manylinux1`` and ``manylinux2010`` previously | ||
allowed wheels to externally link. | ||
Auditwheel and the manylinux build images will be updated to avoid new wheels | ||
relying on this as an external library. | ||
|
||
I haven't done an exhaustive investigation here, but I want to raise the issue of source and binary compatibility for C++ code right now: this should be explicitly addressed in the PEP. There is a catch-22 here:
This may mean that it is necessary to write the version of the C++ compiler into the name of the wheel, and to say that

Huh, is that true even if I have two libraries that statically link their own copies of libstdc++ (or that dynamically load libstdc++ into disjoint ELF namespaces), and whose public APIs are entirely C based? Anyway, the way we handle this right now is basically that if you want to make a wheel that targets, say, 2010-era Linux, then you dynamically link to the system This is somewhat restrictive, but less restrictive than you would think. Redhat distributes newer compilers that contain clever linker scripts, that use the system symbols whenever they're available, and for any unavailable symbols it statically links them into the binary. These are the compilers we provide in the manylinux images. And anyway, I don't think we can hope to do any better. For manylinux wheels, we generally assume that they might have to coexist with arbitrary other extensions, built in arbitrary ways. For example, there might be packages that the user compiled themselves using their regular distro compiler, and that link to the system libstdc++. According to your rule, this means that we can never vendor libstdc++ at all; we have to use the system libstdc++. And if we're stuck using the system libstdc++, then the glibc version pretty much tells us what that is, so there's not much point in putting it in the wheel filename.

Yes. The root issue is that each copy of the library assumes it has the only copy of a global data structure (such as: tables of exception handling information, tables of RTTI information, the Some of the problems could be mitigated by ensuring that the Python core interpreter executable and/or
This may be true for CentOS but I know some other distributions (e.g. Debian, Arch) do not update GCC on the same schedule as glibc, so it's not perfect. Documenting a specific gcc and g++ version to be used to build wheels conforming to each rev of manylinux would probably be enough to address the problem, as long as CPython itself continues not to contain any C++ code. However, it was my impression that the idea of perennial manylinux was to avoid having to research and document such things?

Sure, I get that part. What I'm missing is what libstdc++ does to thwart all the stuff that linkers do to try to make this situation work anyway. For example if I make my own Anyway, we don't recommend vendoring libstdc++ so this is mostly academic curiosity, and also in case it gives some hint to the mysterious
That's fine – we don't care about the leading edge, only the trailing edge, and the trailing edge is much more stable and boring. So e.g. consider a Sure, this means that it's actually impossible to know what the definition of manylinux_X is until some time after glibc 2.X has been out in the wild – the standard says that manylinux_X wheels have to be compatible with all distros shipping glibc 2.X, and that's not determined until all those distros have shipped. But that's unavoidable, and not really a problem since our goal is to achieve broad real-world compatibility, not generate wheels that only work on Arch.
No, the goal is to do the same research and documentation, but decouple it from the PEP review cycle and pip release cycle.

The uniqueness requirement applies across the entire address space. ELF namespaces don't do a thing to help; in fact they may make the situation worse, by preventing symbol deduplication from happening. With I don't know specifically what the problem is with

Maybe unrelated, but what do the current PEP's say about manylinux1 and manylinux2010 libraries being loaded at the same time? I am guessing if they both link libstdc++, they may present a similar issue.

@gunan I'm not sure how I managed to miss your comment until now. You're right, this is a problem right now because, according to current policy,
|
||
As with the previous manylinux tags, required libraries which are not on | ||
the whitelist will need to be bundled into the wheel. | ||
|
||
Building compatible wheels | ||
-------------------------- | ||
|
||
For each profile defined on https://packaging.python.org/, we plan to provide | ||
a canonical build environment, such as a Docker image, available for people to | ||
build wheels for that profile. | ||
People can build in other environments, so long as the | ||
resulting wheels can be verified by auditwheel, but the canonical environments | ||
hopefully provide an easy answer for most packages. | ||
|
||
The definition of a new profile may well precede the construction of its | ||
build environment; it's not expected that the definition in auditwheel | ||
is held up until a corresponding environment is ready to use. | ||
|
||
Verification on upload | ||
---------------------- | ||
|
||
In the future, PyPI may begin using auditwheel to automatically validate | ||
uploaded manylinux wheels, and reject wheels that it can't determine are | ||
compliant. If PyPI does this, then it will mean that only wheels that have a | ||
corresponding auditwheel profile can be distributed publicly. | ||
|
||
If you need manylinux support for a platform that currently has no profile | ||
in auditwheel, then you're encouraged to contribute a profile to auditwheel. | ||
If that's not possible for some reason, then other tools can be used, | ||
as long as you try to meet the same goal as auditwheel (i.e., the wheel should | ||
work in all environments with the given glibc version and architecture) – | ||
though you may not be able to upload these wheels to PyPI. | ||
|
||
Platform compatibility | ||
====================== | ||
|
||
The checks for a compatible platform on installation consist of a heuristic | ||
and an optional override. The heuristic is that the platform is compatible if | ||
and only if it has a version of glibc equal to or greater than that indicated | ||
in the tag name. | ||
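
A minimal sketch of the heuristic above, assuming the running glibc version is read through `gnu_get_libc_version()` via `ctypes`, as PEP 513 does. The helper names here are hypothetical. The plain tuple comparison would treat a future glibc 3.x as newer than any 2.Y, which is one reading of "equal to or greater"; a stricter check that also requires the major version to match is equally possible.

```python
import ctypes
import re


def parse_glibc_version(version_str):
    """Parse the leading 'major.minor' of a version string, or return None."""
    match = re.match(r"(\d+)\.(\d+)", version_str)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def running_glibc_version():
    """Return (major, minor) of the glibc in this process, or None if not glibc."""
    try:
        libc = ctypes.CDLL(None)
        gnu_get_libc_version = libc.gnu_get_libc_version
    except (OSError, TypeError, AttributeError):
        # Not a glibc platform (or the symbol could not be looked up).
        return None
    gnu_get_libc_version.restype = ctypes.c_char_p
    return parse_glibc_version(gnu_get_libc_version().decode("ascii"))


def glibc_at_least(found, major, minor):
    """Heuristic from the text: compatible iff found version >= tag version."""
    return found is not None and found >= (major, minor)
```

An installer could then evaluate `glibc_at_least(running_glibc_version(), 2, 17)` for a `manylinux_glibc_2_17` wheel.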
|
||
The override is defined in an importable ``_manylinux`` module, | ||
the same as already used for manylinux1 and manylinux2010 overrides. | ||
For the new scheme, this module must define a function rather than an | ||
attribute. ``manylinux_glibc_compatible(major, minor)`` takes two integers | ||
for the glibc version number in the tag, and returns True, False or None. | ||
If it is not defined or it returns None, the default heuristic is used. | ||
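
To make the override concrete, here is what a distributor-provided `_manylinux` module might contain. The policy shown (refusing anything newer than glibc 2.12 and deferring otherwise) is invented purely for illustration; only the function name and its True/False/None contract come from the text above.

```python
# Contents of a hypothetical _manylinux.py placed on the import path
# by a distributor or system administrator.

def manylinux_glibc_compatible(major, minor):
    """Return True, False, or None (None defers to the glibc heuristic)."""
    # Assumed policy for illustration: this platform opts out of wheels
    # that need anything newer than glibc 2.12, and defers otherwise.
    if (major, minor) > (2, 12):
        return False
    return None
```

Returning `None` rather than `True` for older versions keeps the default glibc check in force, so the override only ever narrows what the installer accepts in this example.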
|
||
The compatibility check could be implemented like this:: | ||
|
||
    def is_manylinux_glibc_compatible(major, minor):
        # Check for presence of _manylinux module
        try:
            import _manylinux
            f = _manylinux.manylinux_glibc_compatible
        except (ImportError, AttributeError):
            # Fall through to heuristic check below
            pass
        else:
            compat = f(major, minor)
            if compat is not None:
                return bool(compat)

        # Check glibc version.
        # PEP 513 contains an implementation of this function.
        return have_compatible_glibc(major, minor)
|
||
The installer should also check that the platform is Linux and that the | ||
architecture in the tag matches that of the running interpreter. | ||
These checks are not illustrated here. |
I think it removes step 4 as well? (We'd change warehouse once to allow all the tags, just like how it allows all versions of macos wheels.)
My assumption is that Warehouse would need to bump the version of auditwheel it's using to pick up the new profile, so there's still a small change to make.
Eh, that's kind of orthogonal to this discussion (and anyway they have a bot that handles version bumping), so I'd probably count it? It would remove the need to explicitly write code and write a test. But it's not a big deal either way.