[red-knot] Vendor typeshed's stdlib #11340

AlexWaygood · 2024-05-08T14:45:48Z

Summary

This PR vendors typeshed!

The first commit vendors the stdlib directory from typeshed into a new crates/red_knot/vendored_typeshed directory.
The second commit adjusts various linting config files to make sure that the vendored code is excluded from typo checks, formatting checks, etc.
The LICENSE and README.md files are also vendored, but all other directories and files (stubs, scripts, tests, test_cases, etc.) are excluded. We should have no need for them (except possibly stubs/, discussed in more depth below).
Similar to the way pyright has a commit.txt file in its vendored copy of typeshed, to indicate which typeshed commit the vendored code corresponds to, I've also added a crates/red_knot/vendored_typeshed/source_commit.txt file in the third commit of this PR.

One open question is: should we vendor the stdlib and stubs directories, or just the stdlib directory? The stubs/ directory contains stubs for 162 third-party packages outside the stdlib. Mypy and typeshed_client¹ only vendor the stdlib directory; pyright and pyre vendor both the stdlib and stubs directories; pytype vendors the entire typeshed repo (scripts/, tests/ and all).

In this PR, I've chosen to copy mypy and typeshed_client. Unlike vendoring the stdlib, which is unavoidable if we want to do typechecking of the stdlib, it's not strictly necessary to vendor the stubs directory: each subdirectory in stubs is published to PyPI as a standalone stubs distribution that can be (uv)-pip-installed into a virtual environment. It might be useful for our users if we vendored those stubs anyway, but there are costs as well as benefits to doing so (apart from just the sheer amount of vendored code in the ruff repository), so I'd rather consider it separately.

It will be necessary to add infrastructure to keep our vendored copy of typeshed up to date with the upstream repo. I will add this in a followup PR.

Test Plan

pre-commit run -a

typeshed_client is used by Quora's type checker pyanalyze. ↩

AlexWaygood · 2024-05-08T14:48:14Z

@carljm I'm expecting a line-by-line review

github-actions · 2024-05-08T15:03:41Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

MichaReiser · 2024-05-08T15:06:09Z

@carljm I'm expecting a line-by-line review

What would be helpful is if you could maybe add a few comments of the code that you want to get reviewed.

charliermarsh · 2024-05-08T15:08:30Z

I defer to you all, but I personally would wait to have a PR (with this PR as the base) that leverages these stubs before merging this into main. It ensures that we have everything wired up correctly, and also prevents against us merging these stubs only to have them never be used in the event that we reprioritize or make some other decision.

AlexWaygood · 2024-05-08T15:09:53Z

@carljm I'm expecting a line-by-line review

What would be helpful is if you could maybe add a few comments of the code that you want to get reviewed.

Sorry, that was meant to be a joke :(

I tried to provide a guide to this in the PR description. I don't expect any code in the first commit to be reviewed, as that code is all vendored. I would appreciate a review on the design decisions I outlined in the PR description, and the (very small) configuration changes made in the second and third commit. (One question with the third commit is: did I miss anything? Do any other configuration files need to be updated?)

AlexWaygood · 2024-05-08T15:10:57Z

I defer to you all, but I personally would wait to have a PR (with this PR as the base) that leverages these stubs before merging this into main. It ensures that we have everything wired up correctly, and also prevents against us merging these stubs only to have them never be used in the event that we reprioritize or make some other decision.

We can wait; this isn't urgent. I discussed with Carl on Monday and we thought this might be an easy first step, but we can hold off with this for now, I think.

MichaReiser · 2024-05-08T15:15:22Z

LGTM: I think we should add a READMe on how to update typeshed. For example, I wouldn't have known about the commit.txt file.

AlexWaygood · 2024-05-08T15:29:16Z

LGTM: I think we should add a READMe on how to update typeshed. For example, I wouldn't have known about the commit.txt file.

Sure... Where should I put those docs, do you think? Typeshed's README.md file is one of the few files that I'm vendoring as part of this PR that lies outside the stdlib/ directory. Would you prefer that I replace that with a README.md of our own explaining that this is vendored code and how to update it? Or should I put those docs somewhere else? (crates/red_knot/README.md, maybe?)

As I mentioned in the PR description, I want to add automation to update typeshed on a regular basis as a followup to this PR. For example, mypy gets an automated PR like this every two weeks: python/mypy#17201.

MichaReiser · 2024-05-08T16:00:28Z

crates/red_knot/README.md, maybe?)

Seems fine to me.

zanieb · 2024-05-08T16:05:47Z

Minor nit that I prefer vendor/typeshed to vendored_typeshed

carljm

This looks fine to me. I think the main thing that's missing that I would probably want to have before we merge this is at least documentation, and ideally automation, for updating the snapshot of typeshed.

Personally with that done, I'd just go ahead and merge, otherwise whenever we come back to this it'll probably require another round of updating our own lint configs etc that have changed in the meantime.

There's really no "wiring-up" of it in this PR yet that might be wrong, it's just the files themselves. And I think it's very unlikely we don't use this (and if we don't, it's easy to pull out again.)

charliermarsh · 2024-05-08T17:13:18Z

What if we decide that it needs to go somewhere else, e.g., in its own crate? Are these strong advantages to merging now vs. waiting to understand how this will actually get used and integrated?

AlexWaygood · 2024-05-08T17:35:17Z

Minor nit that I prefer vendor/typeshed to vendored_typeshed

Done

I think the main thing that's missing that I would probably want to have before we merge this is at least documentation, and ideally automation, for updating the snapshot of typeshed.

Added some docs (including instructions for how to update manually) in crates/red_knot/README.md

crates/red_knot/README.md

MichaReiser · 2024-05-09T08:52:49Z

What if we decide that it needs to go somewhere else, e.g., in its own crate? Are these strong advantages to merging now vs. waiting to understand how this will actually get used and integrated?

I think we should just merge it. I don't see any real disadvantages that would justify not merging now and stacked PRs are still a pain, even if it's just marginal:

What if we don't need it: I think that's extremely unlikely and is something that can be fixed easily, just delete the files ;)
What if we need to move the files. I'm sure Alex thought about where to place the files. But even if it turns out that we need to move the files because we learned something new, the fix is easy and a low effort PR.

AlexWaygood · 2024-05-09T11:43:38Z

What if we decide that it needs to go somewhere else, e.g., in its own crate?

It would surprise me quite a lot if code outside of red_knot needed to analyse typeshed's stdlib stubs. I think that would be a sign of bad architecture from us if they did: code from other crates should use high-level APIs provided by red-knot to query the resolved types that red-knot reaches by parsing and analysing the stdlib stubs, rather than analysing the stubs directly from outside red-knot. In any case, I think it will be very easy to move the stubs somewhere else if we decide at a later point in time that it's necessary.

Are these strong advantages to merging now

No strong advantages, no, other than being able to easily do work in followup PRs on the basis that this information is available to us inside the Ruff repository. And I'm also not sure I see any strong disadvantages. I don't think this locks us into having the stubs in one particular place, or precludes us from changing the way we vendor typeshed in the future. I'm happy to consider different ways of doing this (a PyPI package, or using a git submodule) in followups. But for now I think I'd rather land this and move onto followup work.

Skylion007 · 2024-05-09T13:42:22Z

Any reason we didn't vendor it as a git submodule?

zanieb · 2024-05-09T13:44:28Z

@Skylion007 we discussed this, but it's simpler to start without a submodule as it introduces changes to contributors' git workflows.

AlexWaygood · 2024-05-09T13:50:10Z

Any reason we didn't vendor it as a git submodule?

I'm open to it, but not sure it's worth the extra complication. As far as I can tell, the chief advantage seems to be that you'd be able to easily see typeshed's history from inside ruff, which might make it easier to bisect specific issues to specific typeshed commits. But my experience at mypy has been that it's rare that you need to do that -- it's usually pretty easy to figure out what typeshed change is to blame for a specific issue if one arises.

Of the four major Python type checkers currently, only one (pytype) uses a git submodule for their vendored typeshed stubs -- that doesn't mean that we shouldn't do something different, but it does suggest that it's not essential to use a git submodule.

charliermarsh · 2024-05-09T13:53:13Z

Just to respond: I still wouldn't have merged this personally (especially if it were my own change), since (as a rule of thumb) it's unused code without a defined integration point. Maybe the integration is straightforward, but the point is we can't really know until we've taken the next step. Similarly, maybe we change priorities, and then we have a vendored typeshed sitting around.

Regardless, correct for me to defer to others since I'm in the minority. I feel heard, no need to respond!

charliermarsh · 2024-05-09T13:54:19Z

Separately there are some unknowns to me that I'll be interested to watch, but that would not affect merging: How will we ensure that it gets included in the compiled binary? Will these be embedded in the binary, or somehow installed separately on-disk? How will we expose these to the resolver, if they are embedded in the binary? Etc.

This PR vendors typeshed! - The first commit vendors the stdlib directory from typeshed into a new crates/red_knot/vendored_typeshed directory. - The second commit adjusts various linting config files to make sure that the vendored code is excluded from typo checks, formatting checks, etc. - The LICENSE and README.md files are also vendored, but all other directories and files (stubs, scripts, tests, test_cases, etc.) are excluded. We should have no need for them (except possibly stubs/, discussed in more depth below). - Similar to the way pyright has a commit.txt file in its vendored copy of typeshed, to indicate which typeshed commit the vendored code corresponds to, I've also added a crates/red_knot/vendored_typeshed/source_commit.txt file in the third commit of this PR. One open question is: should we vendor the stdlib and stubs directories, or just the stdlib directory? The stubs/ directory contains stubs for 162 third-party packages outside the stdlib. Mypy and typeshed_client1 only vendor the stdlib directory; pyright and pyre vendor both the stdlib and stubs directories; pytype vendors the entire typeshed repo (scripts/, tests/ and all). In this PR, I've chosen to copy mypy and typeshed_client. Unlike vendoring the stdlib, which is unavoidable if we want to do typechecking of the stdlib, it's not strictly necessary to vendor the stubs directory: each subdirectory in stubs is published to PyPI as a standalone stubs distribution that can be (uv)-pip-installed into a virtual environment. It might be useful for our users if we vendored those stubs anyway, but there are costs as well as benefits to doing so (apart from just the sheer amount of vendored code in the ruff repository), so I'd rather consider it separately.

AlexWaygood requested a review from carljm May 8, 2024 14:45

MichaReiser added the internal An internal refactor or improvement label May 8, 2024

carljm approved these changes May 8, 2024

View reviewed changes

AlexWaygood added 3 commits May 8, 2024 17:18

vendor typeshed

fe739bb

exclude vendored files from linting checks

b02367c

add a file giving the source commit

363710b

AlexWaygood force-pushed the vendor-typeshed branch 2 times, most recently from a20aac5 to 93432f7 Compare May 8, 2024 17:11

Update snapshot

9568afa

AlexWaygood force-pushed the vendor-typeshed branch from 93432f7 to b766cce Compare May 8, 2024 17:15

carljm reviewed May 8, 2024

View reviewed changes

crates/red_knot/README.md Show resolved Hide resolved

Add docs on how to update the vendored stubs

b8901ee

AlexWaygood force-pushed the vendor-typeshed branch from b766cce to b8901ee Compare May 8, 2024 17:48

MichaReiser approved these changes May 9, 2024

View reviewed changes

AlexWaygood merged commit c80c171 into astral-sh:main May 9, 2024
19 checks passed

AlexWaygood deleted the vendor-typeshed branch May 9, 2024 11:44

AlexWaygood mentioned this pull request Jun 8, 2024

[red-knot] PEP 561 compliant module resolver #11653

Open

18 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] Vendor typeshed's stdlib #11340

[red-knot] Vendor typeshed's stdlib #11340

AlexWaygood commented May 8, 2024

AlexWaygood commented May 8, 2024

github-actions bot commented May 8, 2024 •

edited

Loading

MichaReiser commented May 8, 2024

charliermarsh commented May 8, 2024

AlexWaygood commented May 8, 2024

AlexWaygood commented May 8, 2024

MichaReiser commented May 8, 2024

AlexWaygood commented May 8, 2024

MichaReiser commented May 8, 2024

zanieb commented May 8, 2024

carljm left a comment

charliermarsh commented May 8, 2024

AlexWaygood commented May 8, 2024 •

edited

Loading

MichaReiser commented May 9, 2024

AlexWaygood commented May 9, 2024

Skylion007 commented May 9, 2024

zanieb commented May 9, 2024 •

edited

Loading

AlexWaygood commented May 9, 2024

charliermarsh commented May 9, 2024

charliermarsh commented May 9, 2024

[red-knot] Vendor typeshed's stdlib #11340

[red-knot] Vendor typeshed's stdlib #11340

Conversation

AlexWaygood commented May 8, 2024

Summary

Test Plan

Footnotes

AlexWaygood commented May 8, 2024

github-actions bot commented May 8, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

MichaReiser commented May 8, 2024

charliermarsh commented May 8, 2024

AlexWaygood commented May 8, 2024

AlexWaygood commented May 8, 2024

MichaReiser commented May 8, 2024

AlexWaygood commented May 8, 2024

MichaReiser commented May 8, 2024

zanieb commented May 8, 2024

carljm left a comment

Choose a reason for hiding this comment

charliermarsh commented May 8, 2024

AlexWaygood commented May 8, 2024 • edited Loading

MichaReiser commented May 9, 2024

AlexWaygood commented May 9, 2024

Skylion007 commented May 9, 2024

zanieb commented May 9, 2024 • edited Loading

AlexWaygood commented May 9, 2024

charliermarsh commented May 9, 2024

charliermarsh commented May 9, 2024

github-actions bot commented May 8, 2024 •

edited

Loading

`ruff-ecosystem` results

AlexWaygood commented May 8, 2024 •

edited

Loading

zanieb commented May 9, 2024 •

edited

Loading