-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ideas for improving typeshed testing and tooling #8774
Comments
Mypy improvements are also welcome! Anyone can ping me to get help: suggestions / reviews / etc. |
Unpinned temporarily to make space for #8956. |
Thank you @AlexWaygood for the list of tools, I’ll have to take a closer look at some of them. I was actually about to open an new issue, but after reading through your writeup I decided to comment on this one instead. Idea for a new toolMy suggestion/question is related to the Allow package references as version specifiers over at Python’s discussion board. I noticed that a types package here doesn’t necessarily have any version-enforced relationship with its code package. For example, it is perfectly fine to have this installed
This looks problematic and a checker that warns against such version mismatch would be useful. A proper solution, methinks, would be to create a dependency between the |
Thanks for the suggestion! Agreed that this has been a bit murky (although the murkiness has often worked out favourably for users). To that end, we've recently done a couple things to make versioning between I would be okay adding upstreams as an extra to |
Thanks @hauntsaninja, glad to know that this issue is being at least thought about. Perhaps I should link this comment to the Python discussion or would you like to leave a note there? Just to make sure we cross-pollinate on this issue.
Using what name? Then users would have to specify, e.g. dependencies = [
"protobuf ==3.20.3",
"types-protobuf[protobuf] ==3.20.3",
] or some such?
Mind to elaborate on the reason? |
Wherever possible, stubs packages should be extremely minimal packages that are easy to install, have minimal dependencies, contain no executable code, contain minimal security risks, and are easy to bootstrap. We've started to relax this policy recently with recent moves working towards allowing non-types dependencies for stubs in some situations (this work is not yet complete). But it's still a principle we should stick to wherever possible. |
Here's an idea, because it happened to me again: Or I can wait for mypy to support the |
Yeah, I'm tempted to say it may not be worth it to write a check for this, given that 3.7 EOL is on the horizon :) But if the implementation isn't too difficult, it might be interesting -- writing the check might even reveal a few existing bugs in typeshed! (Note: I'm not sure exactly when we'll be dropping support for Python 3.7. We haven't really discussed it yet. We waited a few months after Python 3.6 had reached EOL status before dropping support for 3.6, due to the fact that it was still quite widely used when it reached EOL status, and we wanted to give people more time to move over.) |
There aren't that many parameter annotations that start with a single underscore. We could check the existing ones manually.
|
any appetite for looking at using Ruff for linting? |
I like
As far as I know, there are no unique checks in |
Ruff seems cool in many ways, but we definitely can't switch to ruff at typeshed. Flake8-pyi is by far the most important flake8 plugin for typeshed, and it's an important part of our CI. Ruff has only implemented a few of the rules from flake8-pyi, and the rules it has implemented from flake8-pyi have several not-so-great behaviour differences from the way flake8-pyi has implemented them. Also, flake8-pyi is currently maintained by the same team that maintains typeshed, which makes it very easy for us to make tweaks to flake8-pyi as and when typeshed requires it. Ruff seems very open to PRs, which is great, but there's also the fact that I don't know rust, so I wouldn't be able to make PRs fixing bugs in their logic for the flake8-pyi rules they've implemented. |
Since the discussion has mostly died down, I've unpinned this. |
A few days ago, I had a great conversation with @kkirsche offline about some ways we can improve our communications with contributors. One of the things that came up is that it would be good to have a clear outline of specific areas to do with tooling and testing where PRs would be welcomed. With that in mind, here's some areas where I think our testing and tooling could be improved, and where I would welcome contributions.
For any of these tools/ideas: if a PR would be enough work that it'd feel wasteful if it was rejected, please make sure to open an issue discussing the specific change first. If an issue gets a few thumbs-ups, it's a pretty good sign we like the idea, and you should feel free to proceed :)
(Other maintainers: please feel free to edit this!)
Existing tools
Improving existing tools is the "low-hanging fruit": it's always going to be easier to build on existing infrastructure than building something new entirely from scratch.
For each of these tools, I've tried to list a "primary maintainer": this is the person who's most likely to make decisions on a certain tool. That doesn't mean that this person is the only person who's ever going to chime in on a certain tool. For example, @hauntsaninja is the person with the most expertise on stubtest, and people are probably always going to defer to him with regards to that script. But that doesn't mean that other people won't also be interested in reviewing -- @sobolevn and I also have experience with writing stubtest improvements, and may well chime in on a stubtest PR.
Stubtest
Primary maintainer: @hauntsaninja
Stubtest compares what exists at runtime to what we've got in the stub, and checks for inconsistencies between the two. It's one of the most useful tools in our CI, and also really easy to contribute to. If you make a PR for a stubtest improvement, it'll probably be reviewed by some combination of @hauntsaninja, @JelleZijlstra, @sobolevn and/or me within a few days :)
Stubtest improvements generally fall into two categories:
--ignore-missing-stubs
toFalse
for more third-party packages.What to improve:
Stubgen
Primary maintainer: ?
Stubgen is a tool for autogenerating stub files from source code. It's not a code base I'm particularly familiar with, and I'm not sure it's quite so actively maintained as some of the other tools on this list -- but here's a few ways I think it could be improved:
Fewer re-exports (maybe we can already do this with the(Fixed by Use stubgen's--export-less
option? Maybe we just need to updatescripts/create_baseline_stubs.py
, typeshed's wrapper script for running stubgen?) By default, stubgen almost always assumes that an import from somewhere else in the package should be treated as an explicit re-export. Unless the file is an__init__.py
file, however, I think stubgen should actually generally assume the opposite. This comes up a lot in typeshed code reviews; fixing this would save a lot of time.--export-less
option increate_baseline_stubs.py
#8843.)__bool__
methods are generally going to returnbool
" are going to be correct 99% of the time, and will save time for typeshed reviewers.flake8-pyi
Primary maintainer: @AlexWaygood
flake8-pyi is a flake8 plugin to check stub files that works through analysing the Abstract Syntax Tree of the stub. It's a tool that's a lot more "dumb" than stubtest, since it doesn't have the full power of a type checker behind it, and doesn't know anything about the runtime either. There's still a lot you can do just with simple AST analysis, however, and it's another code base that's easy to contribute to if you're familiar with the basics of ASTs and the visitor pattern.
Lots of recent checks we've introduced have been fairly style-focused, but we've also added some bugbear-esque checks in the past that are aimed at catching common bugs:
__all__
PyCQA/flake8-pyi#176(Async)Iterable
from__(a)iter__
methods PyCQA/flake8-pyi#248We should continue to think creatively about whether there are other common bugs that we can catch using a simple AST analysis.
flake8-pyi is maintained by the typeshed team, so if you have an idea for a new check, just file an issue! If it gets a few thumbs-up, it's probably a good idea, and you should feel free to make a PR. If we're sceptical, we'll probably say so :)
There's also a bunch of open issues already with ideas for new checks. If it has a few thumbs-up (or even if it's just older than a week or two and hasn't been closed yet), then it's probably an idea that we're receptive to!
mypy_primer
Primary maintainer: @hauntsaninja
Mypy_primer checks out a copy of typeshed prior to a PR and after a PR. It uses "old typeshed" and "new typeshed" to run mypy against millions of lines of open-source code, and reports the difference between the two.
Here's some ways I think we could make mypy_primer even better:
Test cases
Primary maintainer: @AlexWaygood
Our
test_cases
directory (aimed at providing regression tests for places where stubs have caused problems in the past) is new, and still somewhat experimental. The biggest outstanding issue I see was pointed out by @Akuli in #8066 (comment), which is that the test cases are liable to slip out of date with the upstream source code in much the same way that stubs are. How can we ensure that that doesn't happen? Can we add some kind of verification that ensures that the places where we're asserting type checkers shouldn't emit errors do actually pass at runtime?Stubsabot
Primary maintainer: @hauntsaninja
Stubsabot -- a bot which automatically suggests bumping pinned versions for third-party stubs -- is another new addition to our tooling. Here's some ideas of how we could make it even better:
Fixing stubsabot obsoletion PRs: erroneous assumptions that the latest release is the one that added apy.typed
#8754Fixing Stubsabot isn't updating open stubsabot PRs #8778create_baseline_stubs.py
on any new files that have been added between releases?The "stubtest failed" bot
Primary maintainer: @hauntsaninja
Note: future status of this tool is TBD. typeshed-internal/stub_uploader#60 means that we can now pin stubs to more specific versions, which should hopefully reduce the number of daily stubtest failures we get. Maybe we'll be able to do away with the check altogether...
Stubtest is run on the full suite of third-party stubs every night, and a bot automatically opens an issue if stubtest failed on any of them (e.g. #8745). Things that would be great to see improved about this workflow are:
Miscellany
These are both pretty minor nits, but it would be great to:
pyrightconfig.stricter.json
stay alphabetised, so I don't have to keep annoying contributors that break the alphabetisation ;)tests/check_consistent.py
would be a good place for such a check. (Note that most json-parsing tools will refuse to parsepyrightconfig.stricter.json
because it's strictly-speaking invalid json.)if sys.version_info >= (3, 7)
guards anywhere. (3.7 is the minimum supported typeshed version, so such checks are redundant.)tests/check_new_syntax.py
might be a good place for such a check.Ideas for new tools
These ideas are necessarily going to be more work than the ideas above, as most of them involve writing new scripts from scratch, which would create its own maintenance burden. But here's some pie-in-the-sky ideas for brand new tests/tools we could add.
Note that new tools don't need to be part of typeshed or to run in CI to make a difference. For instance, for a long time, we didn't run stubtest in CI, but during that time we used stubtest to fix many, many errors in typeshed. Keeping a tool separate is a good way to avoid disagreement over things like the balance of false positives vs false negatives, to iterate quickly on the tool, and to prove the tool's value.
TypedDict
fromtyping
(for example) -- which doesn't work, since it needs to be imported fromtyping_extensions
in order to be compatible with Python 3.7. Is there an automated way of fixing that? (Canisort
already do that, if we have the right settings?)--check-untyped-defs
option?The text was updated successfully, but these errors were encountered: