Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrating from distlib.wheel: packaging.utils.parse_wheel_filename allows illegal platform tags #577

Open
asottile-sentry opened this issue Jul 18, 2022 · 6 comments

Comments

@asottile-sentry
Copy link

I'm looking to migrate a caller from utilizing distlib.wheel to instead use packaging.utils.parse_wheel_filename and my tests encountered an edge case which isn't handled by the packaging version

here's the distlib code: https://github.com/pypa/distlib/blob/91aa92e64ee5d5005901907eaa340a72f18d7212/distlib/wheel.py#L83

packaging's parse_tag doesn't validate the interpreter in any way -- whereas distlib required a particular structure

the test case in question: https://github.com/chriskuehl/dumb-pypi/blob/b3fc62c6519bdb84960d2b491b40ad065fb79a47/tests/main_test.py#L61

playlyfe-0.1.1-2.7.6-none-any.whl should not parse as a valid wheel, however packaging allows it (and parses as three separate tag-triplets: 2-none-any, 7-none-any, 6-none-any)

I propose adding a little bit of validation to parse_tag to check that the tags are supported interpreter versions

I believe this involves something like:

_INTERPRETER_RE = re.compile(fr'^(?:{"|".join(sorted(INTERPRETER_SHORT_NAMES.values()))}\d+')

and then utilizing that regex to validate the names (and throwing an appropriate exception) -- I think a new InvalidTag (extending ValueError) or somesuch exception would need to be created as well (similar to the ones in packaging.utils for the wheel / stdist filename)

wanted to propose this first before diving in and implementing it to save time -- thoughts?

@ptmcg
Copy link

ptmcg commented Jul 18, 2022

Missing a trailing ')' in your re, I assume after the closing "}".

_INTERPRETER_RE = re.compile(fr'^(?:{"|".join(sorted(INTERPRETER_SHORT_NAMES.values()))})\d+')

If you are building an re of values from some other source, sorting by descending length will be more likely to give you a unique match, in the case where some short value is a prefix of some longer value.

...sorted(INTERPRETER_SHORT_NAMES.values(), key=len, reverse=True)...

Though probably not an issue in this small group of values (after looking them up in the code).

@brettcannon
Copy link
Member

Unfortunately the best we could do based on my reading of the spec (obviously let me know if I missed something) is test whether the name is normalized as required by https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode . Otherwise the interpreter short names are not guaranteed to be the only valid interpreter names, just convenient ones (e.g. Pyston is not covered by that list). We can't make any assumptions as to what interpreter name is considered valid, and so 2.7.6 as a compressed tag of silly interpreter names is still valid.

I can totally see having a higher-level function in another library that requires expected interpreter tags, but as such a low-level library I don't think we can be that pragmatic about what is usually expected and have to stick to the spec in this instance.

@asottile-sentry
Copy link
Author

PEP 425 adds more color: https://peps.python.org/pep-0425/#python-tag -- that it should be implementation name and then version (or version with an underscore) -- so I believe \w+[0-9][0-9_]* would be a required pattern for a tag

@asottile
Copy link
Contributor

@brettcannon should this be reconsidered given above?

@brettcannon
Copy link
Member

I'm okay with re-opening it, but whether this is a feature request or a bug I'm still torn about. For instance, while PEP 425 says "The version is py_version_nodot," that's entirely CPython-specific as to what that means. As such, do we get to say that it must follow https://peps.python.org/pep-0440/#version-scheme ? What about trailing underscores like your regex allows?

As well, "Other Python implementations should use sys.implementation.name" puts no actual restriction on the format of the name, i.e. there's nothing stopping it from starting with a number. Since the Python interpreter tag is not designed to be parsed but to be an opaque string for code to compare against while still being human-readable makes all of this rather murky.

@pradyunsg
Copy link
Member

"Other Python implementations should use sys.implementation.name" puts no actual restriction on the format of the name

It actually does! From https://peps.python.org/pep-0421/:

name
A lower-case identifier representing the implementation. Examples include ‘pypy’, ‘jython’, ‘ironpython’, and ‘cpython’.

Based on the discussion so far, I reckon requiring it to either be a f"{shortname}{numbers}" for a shortname we recognise, or an identifier is a reasonable restriction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants