[FR] Implement PEP 625 - File Name of a Source Distribution #3593

Closed
1 task done
pfmoore opened this issue Sep 17, 2022 · 40 comments · Fixed by #4286
Labels
enhancement Needs Triage Issues that need to be evaluated for severity and status.

Comments

@pfmoore
Member

pfmoore commented Sep 17, 2022

What's the problem this feature will solve?

Conform to accepted standards, make it possible to reliably determine a project's (canonical form) name and version from the source distribution filename.

Describe the solution you'd like

See https://peps.python.org/pep-0625/

When creating sdist files, normalise the project name and version parts according to the specification, documented here.

Alternative Solutions

Continue as at present, which will leave sdist consumers with no reliable way of knowing the project name and version of a sdist short of either extracting the metadata from the sdist (if the sdist conforms to PEP 643) or actually building the distribution.

Additional context

Code that wants the project's formal name will still need to read the distribution metadata - that is understood and this specification doesn't affect that.

Code of Conduct

  • I agree to follow the PSF Code of Conduct
@pfmoore pfmoore added enhancement Needs Triage Issues that need to be evaluated for severity and status. labels Sep 17, 2022
@mgorny
Contributor

mgorny commented Feb 10, 2023

From Gentoo's standpoint, this will also help us get predictable sdist names, as right now some PEP 517 backends produce normalized filenames and others do not.

@jaraco
Member

jaraco commented Apr 12, 2024

In #4302, after releasing v69.3, users are surprised by two behaviors:

  • Trailing zeros are stripped.
  • The filename of the sdist doesn't match other names inside the sdist.

The latter sounds like a bug. The former sounds like a surprising change implied by the spec or the implementation.

Is there better documentation on what constitutes a canonical version number? The spec is pretty silent about the trailing zeros. The packaging.utils.canonicalize_version, however, has two implementations, one which strips the zeros and the other which doesn't, switched by a boolean flag. Which is the real canonical version?
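To make the two behaviors concrete, here is a stdlib-only sketch restricted to release-only versions (no epoch, pre-, post- or dev-release parts). `canonical_release` is a hypothetical helper, not packaging's code; it only mimics the effect the `strip_trailing_zero` flag of `packaging.utils.canonicalize_version` is described as having.

```python
def canonical_release(version: str, strip_trailing_zero: bool = True) -> str:
    """Canonicalize a release-only version such as "1.0.0" (hypothetical helper).

    Each component goes through int(), which drops leading zeros ("04" -> "4").
    With strip_trailing_zero=True, trailing zero components are removed,
    but at least one component is always kept.
    """
    parts = [int(p) for p in version.split(".")]
    if strip_trailing_zero:
        while len(parts) > 1 and parts[-1] == 0:
            parts.pop()
    return ".".join(str(p) for p in parts)


print(canonical_release("1.0"))                             # 1
print(canonical_release("1.0", strip_trailing_zero=False))  # 1.0
print(canonical_release("2024.04.0.0"))                     # 2024.4
```

Both outputs for "1.0" pass the PEP 440 canonical-format check, which is exactly why the spec alone does not say which one a filename should use.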

Since users are reporting that the filename is in fact not canonicalizing the version, that also sounds like a problem that wasn't fully addressed in #4286.

@mtelka
Contributor

mtelka commented Apr 12, 2024

Also, releases on PyPI keep their trailing zeros.

@pfmoore
Member Author

pfmoore commented Apr 12, 2024

I'm not aware of anything in any spec that suggests that stripping trailing zero components is necessary when normalising versions. Yes, when comparing versions, extra trailing zeroes are ignored, but that's not the same as normalising.
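The difference between comparing and normalising can be shown with release segments alone: PEP 440 ordering pads the shorter release with zeros, so `1` and `1.0` are equal versions, yet nothing in that comparison rewrites one spelling into the other. A minimal sketch; `release_equal` is a hypothetical helper that ignores pre-, post- and dev-release segments.

```python
from itertools import zip_longest


def release_equal(a: str, b: str) -> bool:
    """Compare two release-only versions the way PEP 440 ordering does.

    The shorter release is padded with zeros, so "1" == "1.0" == "1.0.0".
    """
    left = [int(p) for p in a.split(".")]
    right = [int(p) for p in b.split(".")]
    return all(x == y for x, y in zip_longest(left, right, fillvalue=0))


# Equal as versions...
print(release_equal("1", "1.0"))  # True
# ...but still two different strings, i.e. two different filenames.
print("1" == "1.0")  # False
```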

I would also expect that the name and version in the sdist and wheel filenames should be the same.

@jaraco
Member

jaraco commented Apr 13, 2024

I'm not aware of anything in any spec that suggests that stripping trailing zero components is necessary

This section does say

{version} is the canonicalized form of the project version (see Version specifiers).

And that section indicates:

See also Appendix: Parsing version strings with regular expressions which provides a regular expression to check strict conformance with the canonical format

Which leads to a function to check for is_canonical:

import re

# Regex from the PEP 440 appendix that checks strict conformance with the canonical format.
def is_canonical(version):
    return re.match(r'^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$', version) is not None

Running that confirms that the spec considers both 1 and 1.0 to be canonical for the same version:

@ is_canonical('1.0') and is_canonical('1')
True

Therefore, the bug is in packaging, which transforms 1.0 to 1.

@jaraco
Member

jaraco commented Apr 13, 2024

What that does imply, however, is that for a given version, it will not be possible to deterministically infer what the filename will be for that version. If the indicated version is "1.0", the filename will have "1.0", and if the indicated version is "1", the filename will have "1". There is in fact no canonical form of a version if arbitrary trailing zeros are allowed, as any version could have trailing zeros appended and still be a canonical and conformant but divergent representation.

@ is_canonical('2024.4.13.0.0.0.0.0.0.0')
True

@mtelka
Contributor

mtelka commented Apr 13, 2024

@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version X.0 if you already have version X (and vice versa). For example PyPI.

Similarly, PyPI shouldn't allow (and I believe it does so) to create project A if there is already project a. And nobody and nothing forces you to use either A or a as a name for your project. So it should be okay to release version X.0 or X as you wish/need.

rouault added a commit to rouault/gdal that referenced this issue Apr 14, 2024
setuptools 65.3 has pypa/setuptools#3593
"Implement PEP 625 - File Name of a Source Distribution" which modifies
the source tarball. Adapt for it
@di
Member

di commented Apr 15, 2024

@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version X.0 if you already have version X (and vice versa). For example PyPI.

It does cause issues. In addition to the case that @jaraco mentioned, where we can't predict what the filename should be given a project name & version, there are also edge cases around post-releases where the filename is ambiguous. For example, without canonicalization of both, the filename sampleproject-1.0-2.tar.gz could be for:

  • a project named sampleproject with a canonicalized version of 1.post2
  • a project named sampleproject-1-0 with a canonicalized version of 2.

There are more details in https://peps.python.org/pep-0625/.
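The ambiguity can be sketched in a few lines, assuming a consumer splits the filename stem on a hyphen. `candidate_splits` and `split_normalized` are hypothetical helpers for illustration: the unnormalized filename admits two readings, while a filename whose name is normalized (hyphens become underscores) and whose version is PEP 440-normalized (so `1.0-2` becomes `1.0.post2`) contains exactly one hyphen and admits exactly one reading.

```python
def candidate_splits(filename: str) -> list[tuple[str, str]]:
    """Every (name, version) pair a parser could read from an unnormalized sdist filename."""
    stem = filename.removesuffix(".tar.gz")
    return [(stem[:i], stem[i + 1:]) for i, ch in enumerate(stem) if ch == "-"]


def split_normalized(filename: str) -> tuple[str, str]:
    """Split a normalized sdist filename; neither part may contain a hyphen."""
    stem = filename.removesuffix(".tar.gz")
    name, sep, version = stem.partition("-")
    if not sep or "-" in version:
        raise ValueError(f"not a normalized sdist filename: {filename!r}")
    return name, version


print(candidate_splits("sampleproject-1.0-2.tar.gz"))
# [('sampleproject', '1.0-2'), ('sampleproject-1.0', '2')]
print(split_normalized("sampleproject-1.0.post2.tar.gz"))
# ('sampleproject', '1.0.post2')
```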

Similarly, PyPI shouldn't allow (and I believe it does so) to create project A if there is already project a. And nobody and nothing forces you to use either A or a as a name for your project. So it should be okay to release version X.0 or X as you wish/need.

We allow projects to be created with whatever capitalization they prefer (as well as separators) but the filename is normalized for them as well (i.e. will always be a for a project named A).

Note that this change is only for the filename, which users don't usually see -- the version displayed on PyPI can continue to be the non-canonicalized version, nothing changes there.

@di
Member

di commented Apr 15, 2024

I think we need to reopen this: due to df45427, the version is no longer being normalized, which is required per PEP 625:

version is the version of the distribution as defined in PEP 440, e.g. 20.2, and normalised according to the rules in that PEP.

Where the rules are: https://peps.python.org/pep-0440/#normalization. We probably need to introduce a function into packaging that handles PEP 440 normalization (retaining trailing zeros) in addition to the existing canonicalization function.
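As a rough illustration of normalization that keeps trailing zeros, here is a sketch covering only part of the PEP 440 rules: a leading `v`, case, leading zeros, alternate pre-release spellings, implicit and explicit post-releases, and dev releases (no epochs or local versions). `normalize_version` is a hypothetical helper, not a packaging API; for the full rules, packaging's `Version` class already renders the normalized form via `str(Version(...))`.

```python
import re

# Hypothetical, simplified subset of PEP 440's permissive grammar:
# release, pre-, post- and dev-release segments only (no epoch, no local version).
_PERMISSIVE = re.compile(
    r"""
    v?
    (?P<release>[0-9]+(?:\.[0-9]+)*)
    (?:[-_.]?(?P<pre_l>alpha|beta|preview|pre|rc|a|b|c)[-_.]?(?P<pre_n>[0-9]+)?)?
    (?:-(?P<post_n1>[0-9]+)|[-_.]?(?P<post_l>post|rev|r)[-_.]?(?P<post_n2>[0-9]+)?)?
    (?:[-_.]?(?P<dev_l>dev)[-_.]?(?P<dev_n>[0-9]+)?)?
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Alternative pre-release spellings and their normalized forms.
_PRE = {"a": "a", "alpha": "a", "b": "b", "beta": "b",
        "c": "rc", "rc": "rc", "pre": "rc", "preview": "rc"}


def normalize_version(text: str) -> str:
    """Normalize a version string per (a subset of) PEP 440, keeping trailing zeros."""
    m = _PERMISSIVE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unsupported or invalid version: {text!r}")
    # int() drops leading zeros ("01" -> "1"); trailing ".0" components are kept.
    out = ".".join(str(int(p)) for p in m["release"].split("."))
    if m["pre_l"]:
        out += _PRE[m["pre_l"].lower()] + str(int(m["pre_n"] or 0))
    if m["post_n1"] is not None:
        # Implicit post-release: "1.0-2" means "1.0.post2".
        out += f".post{int(m['post_n1'])}"
    elif m["post_l"]:
        out += f".post{int(m['post_n2'] or 0)}"
    if m["dev_l"]:
        out += f".dev{int(m['dev_n'] or 0)}"
    return out


print(normalize_version("1.0"))     # 1.0
print(normalize_version("1.0-2"))   # 1.0.post2
print(normalize_version("1.0RC1"))  # 1.0rc1
```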

@mtelka
Contributor

mtelka commented Apr 15, 2024

@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version X.0 if you already have version X (and vice versa). For example PyPI.

It does cause issues. In addition to the case that @jaraco mentioned, where we can't predict what the filename should be given a project name & version, there are also edge cases around post-releases where the filename is ambiguous. For example, without canonicalization of both, the filename sampleproject-1.0-2.tar.gz could be for:

  • a project named sampleproject with a canonicalized version of 1.post2
  • a project named sampleproject-1-0 with a canonicalized version of 2.

There are more details in https://peps.python.org/pep-0625/.

PEP 625 says: The name of an sdist should be {distribution}-{version}.tar.gz.

  • distribution is the name of the distribution as defined in PEP 345, and normalised as described in the wheel spec

  • version is the version of the distribution as defined in PEP 440

PEP 440 says: The canonical public version identifiers MUST comply with the following scheme:

[N!]N(.N)*[{a|b|rc}N][.postN][.devN]

This means that sampleproject-1.0-2.tar.gz is not a compliant sdist file name. PEP 625 prohibits production of such sdists.

OTOH, both sampleproject-1.0.tar.gz and sampleproject-1.tar.gz are canonical and valid sdist file names.

Or am I missing something?

@di
Member

di commented Apr 15, 2024

Or am I missing something?

Yes, I'm talking about normalization of the version in general according to PEP 440 (which was removed in df45427), not just the trailing zeros.

@layday
Member

layday commented Apr 22, 2024

I was not able to predict the irrelevance. Apologies for the noise.

@mauritsvanrees
Contributor

FWIW, Buildout currently cannot install source distributions with underscores. When a wheel is available, installation still works, at least until the wheel package starts creating normalised distribution names as well. See my issue report at buildout/buildout#647

That needs to be fixed in the Buildout project. I suspect that installation actually works, but that Buildout does not see the new package because it is looking for the wrong name.

@zzzeek

zzzeek commented Jun 18, 2024

Hi -

I'm looking as much as I can but I am not seeing where this change dictates that the filename generated by sdist must be all lower case, if that's correct. Running python setup.py sdist for SQLAlchemy under setuptools 69.2.0 yields:

SQLAlchemy-2.0.31.dev0.tar.gz

whereas under 69.3.0 it yields:

sqlalchemy-2.0.31.dev0.tar.gz

I don't care what the casing is personally, however I'm about to do a release on pypi and I'm extremely concerned about automated systems / distribution scripts etc. that will be broken by this change.

@zzzeek

zzzeek commented Jun 18, 2024

For example, Fedora's python-sqlalchemy.spec file will break with this change. Now, they can fix it, of course, but this is something I expect to see happening all over the place: https://src.fedoraproject.org/rpms/python-sqlalchemy/blob/rawhide/f/python-sqlalchemy.spec

@mgorny
Contributor

mgorny commented Jun 18, 2024

I'm looking as much as I can but I am not seeing where this change dictates that the filename generated by sdist must be all lower case, if that's correct.

  1. Source distribution format:

The file name of a sdist was standardised in PEP 625. The file name must be in the form {name}-{version}.tar.gz, where {name} is normalised according to the same rules as for binary distributions (see Binary distribution format)

  2. Binary distribution format:

In distribution names, any run of -_. characters (HYPHEN-MINUS, LOW LINE and FULL STOP) should be replaced with _ (LOW LINE), and uppercase characters should be replaced with corresponding lowercase ones.

(highlight mine)
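The rule quoted above reduces to a single regular-expression substitution. A minimal sketch; `normalize_dist_name` and `sdist_filename` are illustrative names rather than setuptools functions, and `sdist_filename` assumes its `version` argument is already normalized.

```python
import re


def normalize_dist_name(name: str) -> str:
    # Replace each run of "-", "_" and "." with a single "_", then lowercase.
    return re.sub(r"[-_.]+", "_", name).lower()


def sdist_filename(name: str, version: str) -> str:
    # {name}-{version}.tar.gz, with version assumed to be normalized already.
    return f"{normalize_dist_name(name)}-{version}.tar.gz"


print(sdist_filename("SQLAlchemy", "2.0.31.dev0"))          # sqlalchemy-2.0.31.dev0.tar.gz
print(normalize_dist_name("mauritstest.namespacepackage"))  # mauritstest_namespacepackage
```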

@ds-cbo

ds-cbo commented Jun 19, 2024

@zzzeek

I'm extremely concerned about automated systems / distribution scripts etc. that will be broken by this change
this is something I expect to see happening all over the place

As a package repo maintainer: Your concerns are valid and your expectations are right. But it seems that this is a (one-time, hopefully) price we all have to pay to ensure some form of unified ecosystem across the vast number of build tools.

@zzzeek

zzzeek commented Jun 19, 2024

@zzzeek

I'm extremely concerned about automated systems / distribution scripts etc. that will be broken by this change
this is something I expect to see happening all over the place

As a package repo maintainer: Your concerns are valid and your expectations are right. But it seems that this is a (one-time, hopefully) price we all have to pay to ensure some form of unified ecosystem across the vast number of build tools.

May I ask why no one considered adding an option so that large projects that don't wish to flip a switch overnight against a specific setuptools version can at least enable "legacy naming mode" while still using more recent setuptools versions? We've pinned our setuptools to avoid being early on this; we'd prefer not to be one of the first projects that receives a bucket of complaints.

Are there bigger warnings / notices that I missed that would be drawing people's attention to this, and that people might want to be careful when they update setuptools? I only realized this change when my release scripts broke. A change this major IMO needs a lot more up front warnings, there should have been deprecation warnings, etc.

@jaraco
Member

jaraco commented Jun 19, 2024

First, sorry for the inconvenience and thanks for your feedback.

When reviewing the original change, I briefly considered the compatibility implications, and my assumption (probably wrong) was that most systems were not reliant on the specific characters in the name and, where they were, they'd already had to deal with the inconsistencies across build backends, so the best thing to do was to move toward the standards and align. For that reason, I didn't consider it a breaking change and so didn't invest a lot of time addressing migration concerns.

at least enable "legacy naming mode" while still using more recent setuptools versions

My feeling here is if you want the legacy naming mode, simply use the legacy version (treat 69.3 as a breaking change). Setuptools already carries an unmanageable amount of debt supporting legacy behaviors (easy_install, package_index, sandbox, vendored dependencies, distutils integration, pkg_resources, and loads of small ones). Unless there's a strong case for having an escape hatch, I'm reluctant to add more complexity and debt.

Moreover, we're aiming to honor and enforce a standard. If Setuptools provides an escape hatch, it allows users to violate the standard and perpetuate the damaging behavior.

Are there bigger warnings / notices that I missed that would be drawing people's attention to this, and that people might want to be careful when they update setuptools? I only realized this change when my release scripts broke. A change this major IMO needs a lot more up front warnings, there should have been deprecation warnings, etc.

There were not. And unfortunately, it's often difficult to surface such concerns in a meaningful and timely way. We probably could have made an announcement or posted on X or even tried emitting warnings when a name was being normalized, but I'm not even sure what the guidance would be. There's not much an affected user could do but disable the preferred behavior.

I can imagine a world in which we took a much more conservative approach:

  • Build backends are notified of an upcoming change and give users a chance to opt in.
  • PyPI puts warnings around sdists with non-compliant names, incentivizing projects to opt in.
  • Opt in is changed to an opt out (breaking change) for each backend.
  • Opt out is removed (breaking change) for each backend.

As you might imagine, this effort would require coordination across multiple projects, a large investment in engineering time, and a large investment over time (while changes soak). Such a large investment is likely infeasible when prioritized against other issues on the volunteer budget we have.

My instinct is that most users get late versions of Setuptools and have already adopted the change. Integrators have already encountered the change and are adapting for it. I'd not expect (at this later stage at least) that SQLAlchemy users will be the first affected.

In retrospect, this project could have released the change as a breaking change, but that ship has (mostly) sailed. Should the project treat it as a breaking change when the version normalization is brought back? I'm thinking no, mainly out of consistency.

@zzzeek

zzzeek commented Jun 19, 2024

I'm familiar with the "this would have required lots of complexity and development" angle; I was more getting at 1. recognizing that this is a surprising change and 2. having text somewhere that says more than the name of a PEP buried in the changelog, especially since the PEP itself doesn't say anything about casing; someone had to go and point to the real spec on pypa to show this. In SQLAlchemy, we write a doc like this: https://docs.sqlalchemy.org/en/20/changelog/whatsnew_20.html, which is just a single document that lists the big, major "What's going to change noticeably?" elements, phrased so that consumers of the library can get to the point quickly without having to familiarize themselves with all the packaging specifications to learn that "the names of your files will change". That's all.

@jaraco
Member

jaraco commented Jun 19, 2024

I know it's too little too late, but I've expanded in the changelog to make the user-impacting changes more apparent.

@zzzeek

zzzeek commented Jun 19, 2024

Thanks for doing that !

@mauritsvanrees
Contributor

Too late, but not too little. ;-)

As I mentioned above, Buildout has trouble installing the sdists under the new names. I have a PR ready fixing that.

I wonder if one part of that PR could be useful to put in setuptools: a mixin class for Environment and PackageIndex. Do not be fooled by the filename this is in (easy_install.py): since version 3 of zc.buildout we call pip under the hood.

The mixin class normalises a package name before it adds it to an Environment/PackageIndex, and normalises it when it gets the list of distributions from there. The point is that otherwise, for a package that has both old- and new-style sdists, some of them end up under one key and the rest under another.

I made a dummy namespace package for that:

  • Version 1.0.0 has sdist mauritstest.namespacepackage-1.0.0.tar.gz
  • Version 1.0.1 has sdist mauritstest_namespacepackage-1.0.1.tar.gz

On the /simple index page you see both filenames, and they both will be added to the PackageIndex or Environment, but under different keys. So in the case of Buildout, if I say I want mauritstest.namespacepackage 1.0.1, it does not find it.
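A stdlib-only sketch of the keying problem and the fix described above. It is not the Buildout mixin or any setuptools API: `index_sdists` is a hypothetical function, keys use PEP 503 name normalization (the same rule as `packaging.utils.canonicalize_name`), and it assumes the version part contains no hyphen, so the last hyphen separates name from version.

```python
import re
from collections import defaultdict


def canonical_key(name: str) -> str:
    # PEP 503 normalization: runs of -_. become "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def index_sdists(filenames: list[str]) -> dict[str, list[str]]:
    """Group sdist versions by normalized project name, whichever spelling the file uses."""
    index = defaultdict(list)
    for filename in filenames:
        stem = filename.removesuffix(".tar.gz")
        # Assumption: the version contains no "-", so the last "-" ends the name.
        name, _, version = stem.rpartition("-")
        index[canonical_key(name)].append(version)
    return dict(index)


print(index_sdists([
    "mauritstest.namespacepackage-1.0.0.tar.gz",
    "mauritstest_namespacepackage-1.0.1.tar.gz",
]))
# {'mauritstest-namespacepackage': ['1.0.0', '1.0.1']}
```

Without `canonical_key`, the two filenames would land under `mauritstest.namespacepackage` and `mauritstest_namespacepackage` respectively, which is the split the comment above describes.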

Is that something worth exploring in a PR in setuptools?
Or are these only legacy code parts that should not really be used anymore, because we should be using pip for that (which installs both versions just fine)?

@abravalheri
Contributor

abravalheri commented Jun 27, 2024

@pfmoore, @di, I suppose https://github.com/pypa/setuptools/pull/4434/files completed the implementation of this feature, right? Is there anything left?

For context, I am assuming that the sdist name is derived from dist.get_fullname() and that pypa/wheel had already solved this problem (the implementation in wheel is https://github.com/pypa/setuptools/blob/main/setuptools/command/bdist_wheel.py#L295-L304).

@pfmoore
Member Author

pfmoore commented Jun 27, 2024

I assume so, but I'm not familiar with the setuptools/wheel codebases, so I wouldn't take my word on it 😉

@di
Member

di commented Jun 27, 2024

Yes, I think #4434 resolves this issue.

@jaraco jaraco closed this as completed Jun 28, 2024
clrpackages pushed a commit to clearlinux-pkgs/pypi-setuptools that referenced this issue Jul 10, 2024
…version 70.2.0

Anderson Bravalheri (4):
      Add doctest to capture edge cases of PEP 625
      Use canonicalize_version to produce fullname
      Add news fragment
      Add another test case for version

Avasam (2):
      Use `set` instead of `True`-only `dict`
      Use actual boolean parameters and variables

Bartosz Sławecki (1):
      Move project metadata to `pyproject.toml` (jaraco/skeleton#122)

Christoph Reiter (2):
      CI: run pytest without arguments to avoid stdlib distutils being imported
      CI: explicitely CC/CXX for clang only mingw environments

DWesl (1):
      Port code from CygwinCCompiler to UnixCCompiler

Dimitri Papadopoulos (21):
      Remove extra pairs of quotes from litteral strings
      Use brackets for the default value of option arguments
      Enforce ruff/flake8-implicit-str-concat rule ISC001
      A round of `ruff format` after  `ruff check --fix`
      Enforce ruff/flake8-implicit-str-concat rule ISC003
      Apply ruff rule RUF100
      Apply ruff rule RUF010
      Enable ruff rule RUF010
      Apply ruff/pyupgrade rule UP031
      Round of `ruff format` after `ruff check`
      Enable ruff/pyupgrade rules (UP)
      Apply ruff/flake8-implicit-str-concat rule ISC001
      Apply ruff/flake8-implicit-str-concat rule ISC003
      Enable ruff/flake8-implicit-str-concat rules (ISC)
      Use brackets for the default value of option arguments
      Apply ruff rule RUF100
      Apply ruff/flake8-raise rule RSE102
      Apply ruff/flake8-return rule RET502
      Apply ruff/flake8-return rule RET503
      Apply ruff/Perflint rule PERF401
      Enforce ruff/tryceratops rule TRY300

Dustin Ingram (2):
      Support PEP 625
      Fix canonicalization

Jason R. Coombs (49):
      Expect to find canonicalize_* functions in packaging.
      Update tests to match new expectation.
      In test_sdist, provide a more complex name to capture canonicalization behavior.
      Add packaging as a vendored package.
      Use vendored packaging.
      Revert the canonicalization of the version. Ref pypa/setuptools#3593.
      Revert "Update tests to match new expectation."
      Pin against pytest 8.1.x due to pytest-dev/pytest#12194.
      Allow macos on Python 3.8 to fail as GitHub CI has dropped support.
      Move project.urls to appear in the order that ini2toml generates it. Remove project.scripts.
      Revert "Allow macos on Python 3.8 to fail as GitHub CI has dropped support."
      Rename extras to align with core metadata spec.
      Prefer "Source" to "Homepage" for the repository label.
      Add 'consolidate_linker_args' wrapper to protect the old behavior for now.
      Exclude compat package from coverage.
      Add type declaration for runtime_library_dir_option, making explicit the different return types one might expect.
      Extend the retention of the compatibility.
      👹 Feed the hobgoblins (delint).
      Move compatibility modules into compat package.
      Move compatibility module into compat package.
      Fix return type to match implementation.
      🧎‍♀️ Genuflect to the types.
      Oops. Meant 2025.
      Migrated config to pyproject.toml using jaraco.develop.migrate-config and ini2toml.
      Extract _make_executable for TestSpawn.
      Move and reword comment for brevity and clarity.
      Remove C901 exclusion; code is now compliant.
      Remove apparently unnecessary cast to list.
      Use proper boolean literals.
      Replace Popen with check_call.
      Extract function for _debug wrapper.
      Extract function to inject macos version.
      👹 Feed the hobgoblins (delint).
      Use mkstemp unconditionally. mktemp has been deprecated since Python 2.3.
      Pin to pytest<8.1.
      Deprecate find_executable.
      Apply canonicalize_version with strip_trailing_zero=False.
      Move local ruff rules into a local section.
      Combine strings for clarity.
      Extract method for checking macro definition.
      Extract method for _is_valid_macro.
      Remove unnecessary override to the same value.
      Suppress EncodingWarnings in docutils.
      Replace use of deprecated find_executable with shutil.which.
      Add news fragment
      Remove 'normally supplied to setup()'. Declarative styles are normalized.
      Add a section on interpolation.
      Prefer relative imports for better portability.
      Bump version: 70.1.1 → 70.2.0

Naveen M K (9):
      Add support for building extensions using MinGW compilers
      Fix tests for `get_msvcr` function
      Make `test_customize_compiler` run on mingw
      CI: add msys2 mingw test
      Fix path separator issue in change_root function
      test_install: fix an issue specific to mingw
      Remove testing dependency on jaraco.text
      Add test for dll_libraries attribute in CygwinCCompiler class
      Add some tests for Mingw32CCompiler class

Stephen Brennan (1):
      Use a separate build directory for free-threading

Sviatoslav Sydorenko (3):
      Let codecov-action autodetect the coverage report
      🧪 Unignore errors in `coverage xml` @ Cygwin
      Revert "🧪 Unignore errors in `coverage xml` @ Cygwin"
arnout pushed a commit to buildroot/buildroot that referenced this issue Aug 23, 2024
Release notes: https://docs.djangoproject.com/en/5.1/releases/5.1/

We need to add --skip-dependency-check to build options as django
currently pins setuptools <69.3 [1] and buildroot uses a newer version.

The Django pin is likely to not be affected by PEP-625 [2] handling,
which was added to setuptools 69.3 [3][4]. We don't really care about
the sdist name changing for django though, so we can use a newer version
of setuptools as well.

Django has been confirmed to still install and work correctly by running
the runtime test.

[1] django/django@4686541
[2] https://peps.python.org/pep-0625/
[3] pypa/setuptools#3593
[4] https://github.com/pypa/setuptools/blob/main/NEWS.rst#v6930

Signed-off-by: Marcus Hoffmann <buildroot@bubu1.eu>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
adelton added a commit to adelton/python-libssh that referenced this issue Nov 9, 2024
…_rpm or test using setuptools.

The setuptools 69.3.0 started to change dashes to underscores, leading to
+ /usr/lib/rpm/rpmuncompress -x -v /src/build/bdist.linux-x86_64/rpm/SOURCES/python-libssh-0.0.1.tar.gz
error: File /src/build/bdist.linux-x86_64/rpm/SOURCES/python-libssh-0.0.1.tar.gz: No such file or directory
error: Bad exit status from /var/tmp/rpm-tmp.deAYuh (%prep)

pypa/setuptools#3593

The setuptools 72.0.2 removed setup.py test command, leading to
/usr/lib/python3.13/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'test_suite'
  warnings.warn(msg)
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help
error: invalid command 'test'

pypa/setuptools#4519