Description
This is a meta-issue, filed to track multiple independent problems and potential solutions to Warehouse's handling of distribution filenames (i.e., sdist and wheel filenames). I'm going to attempt to index all of them, but I'll almost certainly miss one or more.
Background material
Key PEPs and PyPA standards:
- PEP 427 defines the wheel distribution format, including the wheel filename format. PEP 427 is unfortunately internally inconsistent about distribution name normalization, as mentioned in this comment.
- PEP 625 is the most recent sdist filename PEP. It punts to PEP 427 for distribution name normalization, meaning that it carries some of the same ambiguity.
- PyPA's Binary Distribution Format Spec is the living standard copy of PEP 427. It eliminates the ambiguity in the original PEP, making it clear that the normalization only applies to the distribution name and is strictly equivalent to PEP 503 normalization, followed by replacing `-` with `_`.
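As an illustration of the rule in the last bullet, here is a minimal sketch of the two normalization steps. The function names are made up for this example and are not taken from Warehouse; the regular expression is the one given in PEP 503.

```python
import re


def pep503_normalize(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' into one '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_distribution_name(name: str) -> str:
    """Wheel-style name: PEP 503 normalization, then replace '-' with '_'."""
    return pep503_normalize(name).replace("-", "_")


print(pep503_normalize("Package.Foo"))         # package-foo
print(wheel_distribution_name("Package.Foo"))  # package_foo
```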
Key discussions:
- Amending PEP 427 (and PEP 625) on package normalization rules contains an extensive discussion on amending PEP 427 and PEP 625, but hasn't seen active conversation in a couple of months.
Outstanding issues and PRs:
- PyPI does not accept wheel file name with `.` replaced with `_` #10030: Warehouse does not currently accept distribution filenames that have been normalized from `.` to `_`.
- Use PEP 503 rules to validate upload filename #10072: @uranusjr's fix for the above, deferred due to a lack of specification clarity.
Outstanding issues
Warehouse does not support normalized namespace package names
Per both the discuss thread and #10030: namespace packages are commonly denoted as `package.foo`, which gets normalized to `package-foo` (PEP 503) and `package_foo` (wheel-style distribution name).
As such, Warehouse should accept wheels and sdists that start with `package_foo` for the `package.foo` package. But it currently doesn't, and complains about a mismatched prefix instead.
The relevant code:
warehouse/warehouse/forklift/legacy.py
Lines 1133 to 1140 in 68d1216
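For comparison, a check that tolerates normalized names could look roughly like the sketch below. This is not Warehouse's code: `filename_matches_project` is a hypothetical helper, and it assumes the filename is already spec-compliant, so that the name component never contains a `-` and everything before the first `-` is the distribution name.

```python
import re


def _pep503(name: str) -> str:
    # PEP 503 normalization: runs of '-', '_', '.' become '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_matches_project(filename: str, project_name: str) -> bool:
    """Hypothetical check: compare normalized names instead of raw prefixes."""
    # Assumption: the name component is everything before the first '-'.
    name_component, sep, _rest = filename.partition("-")
    if not sep:
        # No '-' at all, so there is no version component: reject.
        return False
    return _pep503(name_component) == _pep503(project_name)


print(filename_matches_project("package_foo-1.0-py3-none-any.whl", "package.foo"))  # True
print(filename_matches_project("other-1.0.tar.gz", "package.foo"))                  # False
```

Comparing normalized forms on both sides means `package.foo`, `package-foo`, and `package_foo` are all treated as the same project. Legacy sdist filenames that keep a `-` inside the name component would need separate handling.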
Warehouse accepts invalid wheel filenames
Separately, Warehouse's current wheel filename validation is probably overly permissive.
This happens in a few different places:
- `_is_valid_dist_file` fails open rather than closed. In particular, anything that ends with `.whl` and contains a `WHEEL` file is treated as valid, even if it does not have all of the PyPA/PEP 427 required filename components.
- Extended wheel filename validation uses a regular expression, but doesn't actually check all parts of the resulting match:
warehouse/warehouse/forklift/legacy.py
Lines 1265 to 1275 in 68d1216
In particular, the `build`, `pyver`, and `abi` components are never checked, meaning that they might be missing entirely.
As a result, there is at least one invalid wheel filename (`pyffmpeg-2.0.5-cp35.cp36.cp37.cp38.cp39-macosx_10_14_x86_64.whl`) already present on PyPI, with correspondingly invalid metadata available via the JSON API (note the incorrect `python_version` field):
```json
{
  "comment_text": "",
  "digests": {
    "md5": "d8a9fddd534dc56bfad1343c0f4d0cec",
    "sha256": "962c2d87ee264cfedace8cd1186efe6d898095b74783e9bdba356d15ccd91f64"
  },
  "downloads": -1,
  "filename": "pyffmpeg-2.0.5-cp35.cp36.cp37.cp38.cp39-macosx_10_14_x86_64.whl",
  "has_sig": false,
  "md5_digest": "d8a9fddd534dc56bfad1343c0f4d0cec",
  "packagetype": "bdist_wheel",
  "python_version": "2.0.5",
  "requires_python": null,
  "size": 11052093,
  "upload_time": "2021-05-15T16:29:06",
  "upload_time_iso_8601": "2021-05-15T16:29:06.646801Z",
  "url": "https://files.pythonhosted.org/packages/56/2c/e25e4322c12a75e9f478106b8919c0a011b28edb32171fa21ebe14513022/pyffmpeg-2.0.5-cp35.cp36.cp37.cp38.cp39-macosx_10_14_x86_64.whl",
  "yanked": false,
  "yanked_reason": null
},
```
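To make the "fail closed" idea concrete, here is a rough sketch of a validator that rejects any wheel filename missing a required component. It is not Warehouse's implementation: `parse_wheel_filename_strict` is a hypothetical name, and the sketch relies only on the filename layout from the binary distribution format spec (`{name}-{version}(-{build})?-{pyver}-{abi}-{platform}.whl`, where a build tag must start with a digit).

```python
from typing import Dict, Optional


def parse_wheel_filename_strict(filename: str) -> Dict[str, Optional[str]]:
    """Hypothetical fail-closed parser: raise unless all components are present."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")

    parts = filename[: -len(".whl")].split("-")
    build: Optional[str] = None
    if len(parts) == 5:
        name, version, pyver, abi, platform = parts
    elif len(parts) == 6:
        name, version, build, pyver, abi, platform = parts
        # Per the spec, an optional build tag must start with a digit.
        if not build[:1].isdigit():
            raise ValueError(f"invalid build tag in {filename!r}")
    else:
        # Anything other than 5 or 6 components is rejected outright.
        raise ValueError(f"expected 5 or 6 components in {filename!r}")

    if not all((name, version, pyver, abi, platform)):
        raise ValueError(f"empty component in {filename!r}")

    return {
        "name": name,
        "version": version,
        "build": build,
        "pyver": pyver,
        "abi": abi,
        "platform": platform,
    }


try:
    parse_wheel_filename_strict(
        "pyffmpeg-2.0.5-cp35.cp36.cp37.cp38.cp39-macosx_10_14_x86_64.whl"
    )
except ValueError as exc:
    print("rejected:", exc)
```

The `pyffmpeg` filename above splits into only four components (there is no ABI tag), so a check of this shape would reject it at upload time rather than let it through with shifted fields.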