-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bazel build integration #3912
Comments
I don't think this is a good idea. The issue sounds more to be with Bazel than Mypy, and I think it makes more sense to modify the build script until Bazel fixes this.
I really, really like this idea. The typeshed information should be immutable therefore this makes a lot of sense. If someone is using a custom typeshed directory, it might get a bit more complicated, but in the general case and for releases, this will be really nice. Mypy currently spends 1.5 seconds on an empty file, so I do hope there are ways to improve startup time. |
Some thoughts below. Note that this our internal team priority is to speed up mypy though a daemon mode instead of a build tool like bazel. Our review bandwidth may be lower than normal until the end of year, so there's a risk that bigger PRs in particular will be slow to review. It's better if we can first reach agreement on the design of new features at an abstract level before looking at PRs, as design reviews are much less work (and can save also implementation work in case the design gets iterated on a lot).
I have a few ideas that may be helpful:
The non-bazel answer is to run mypy as a daemon and to use a small client that starts up very quickly. However, this probably is not a good match for bazel (though I've heard that it's possible). A related idea is to somehow keep a fully initialized mypy process running and just fork it whenever a fresh run is needed. This would provide isolation between mypy runs.
This can't happen in mypy itself, since the cache files depend on mypy options such as platform and target Python version. However, it may make sense to have a build artifact in your bazel setup for typeshed with bundled cache files, and to use it instead of the standard typeshed though
Before adding a large number of command-line options, I'd like to see the output of
Something like this could be pretty useful. Can you create a separate issue for this? This would be useful outside bazel as well. |
Sorry for the slow response! Regarding I'm not so worried about the proliferating number of command line flags -- at least, I don't think you need to pay for it before you can add new flags. I think a I'm not sure what the benefits of your propose
The contents of such a file (IIUC) should be entirely predictable given the values of I'm not sure what |
Yeah, this should be up to the core team, since we are the ones who've added (or accepted) all the existing command-line options :-) I have some vague ideas about improving mypy's startup time and cache deserialization speed, but I'm not yet sure if these are practical and when these will happen. Lazily loading only necessary dependent cache files of indirectly imported modules might be some relatively low-hanging fruit? |
I said earlier:
This may not actually need to be the case. I'm cautiously optimistic. |
any plan to actually do any of this work? btw we've also seen the bazel-python problem around init.py and discussed it a bunch internally. My quick skim of the --namespace-packages proposal makes it sound like it'll solve our problem if we get py2 support for it. Anyhow I hadn't looked into the internals of mypy at all and this proposal seems to cover that pretty well. From a bazel POV though, what we need is a way to invoke mypy in two (new?) modes:
in addition to all the above stuff about making mypy hermetic and care less about existence of init files. |
and to clarify - if mypy has those two modes, I'd be willing to take a stab at writing the skylark to invoke mypy within bazel |
Would you be willing to also take a stab at implementing those two modes?
We're pretty booked up with other perf work already.
|
unfortunately I have no familiarity with the mypy codebase so I'm not
comfortable committing to anything like that. I'm also pretty busy with my
actual day job, but if you convince me it's not too hard then maybe I'd
look into it. But jumping into a new codebase I have no confidence this
would be quick.
…On Wed, Nov 1, 2017 at 7:37 PM Guido van Rossum ***@***.***> wrote:
Would you be willing to also take a stab at implementing those two modes?
We're pretty booked up with other perf work already.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3912 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABBnd8Ya4Adc3BJYCzug710MxIBdr0arks5sySrngaJpZM4PLGh4>
.
|
Using the existing JSON serialized cache files for this would be both faster and less work to implement, at the cost of not having an easily human-readable representation. The mypy startup performance was again brought up by @sixolet last week (offline). Here pruning dependencies (either by not loading some dependencies at all or by loading them lazily) would likely be the low-hanging fruit. Other possible optimizations are harder:
|
I made a proof of concept a couple months ago. I still plan to turn it into
real PRs as soon as I dig myself out of all my free time going to baby care
:P
…On Thu, Nov 2, 2017 at 4:58 AM, Jukka Lehtosalo ***@***.***> wrote:
give it a set of python and types files (perhaps .pyi) and generate a new
type file output - so we can implement a build step that summarizes the
types of a library, for it's dependencies to use
Using the existing JSON serialized cache files for this would be both
faster and less work to implement, at the cost of not having an easily
human-readable representation.
The mypy startup performance was again brought up by @sixolet
<https://github.com/sixolet> last week (offline). Here pruning
dependencies (either by not loading some dependencies at all or by loading
them lazily) would likely be the low-hanging fruit. Other possible
optimizations are harder:
- Speed up importing the mypy implementation and dependencies. Pruning
internal mypy dependencies might help a bit, and Python 3.7 might also
improve this a little. Larger improvements will likely require somewhat
drastic measures -- but these are still within the realm of possibility.
- Make deserialization of JSON cache files faster. Again, this is
non-trivial but conceptually it should be possible to make this much faster.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3912 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABjs4Xaygd4iq0Dh_FUt22dVRFbErhx1ks5sya58gaJpZM4PLGh4>
.
|
I am trying to wrap my head around this. I'm still reading up on bazel but I think this is going to be fun! |
A few thoughts from someone who has done something similar ... The mypy cache could be specified as one of the outputs in the mypy aspect's action (and input(s) to other rules). Then Bazel would take care of it for you and re-run only what's necessary. Bazel allows only one rule for creating an output. But your BUILD files probably have some rules that specify the same source files as other rules. There are a few solutions to this, but probably the easiest is to add the rule name to the output file (to make it unique) and have the Skylark create a mapping file that is input to mypy to map between the Bazel label and the actual file. You can see an example of this in pytype/imports_map_loader.py. @gvanrossum - trust me, this quickly becomes not-fun. However, the documentation is much better now than when I did this, and Skylark has far fewer bugs now (but be warned -- it is not Python). |
Here's a brief progress report. Things are not working yet. I have made a mypy branch that supports a I've also created a dropbox-specific Bazel aspect that runs mypy. I don't want to publish this yet since it's gross and doesn't work yet, but I've learned a lot. Perhaps the key learning (besides confirming Peter's observation :-) is that I can't seem to create the mypy cache files alongside the source files, because Bazel doesn't just require that outputs live under the current package. There's a cross-compilation feature that puts all outputs in a different directory, and this takes the form So suppose we have two deps |
I don't follow your discussion entirely, but usually I just make sure that
for my bazel rules, all paths that my tools have to deal with are passed
either in a data file (json or newline delimited or whatever) or as
explicit command line params. E.g. if I was building mypy support, I would
try to pass the paths of all the current files to typecheck to mypy that
way. And definitely I would try to avoid giving the mypy bazel action a
folder instead of an explicit list of files.
At least that's my 2cents.
…On Fri, Mar 16, 2018 at 6:32 PM Guido van Rossum ***@***.***> wrote:
Here's a brief progress report. Things are *not* working yet.
I have made a mypy branch
<https://github.com/gvanrossum/mypy/tree/bazel-flag> that supports a
--bazel flag that does various things for Bazel. For now I've simply
vendored this branch in the Dropbox repo where I'm experimenting with
Bazel. I'm still hacking on the code because I haven't solved all problems
yet.
I've also created a dropbox-specific Bazel aspect that runs mypy. I don't
want to publish this yet since it's gross and doesn't work yet, but I've
learned a lot.
Perhaps the key learning (besides confirming Peter's observation :-) is
that I can't seem to create the mypy cache files alongside the source
files, because Bazel doesn't just require that outputs live under the
current package. There's a cross-compilation feature that puts all outputs
in a different directory, and this takes the form
<root>/<prefix>/<package>/<target>/<file> where <root> is some
horrendously long Bazel directory (fortunately it's also the current
directory :-), <prefix> is something indicative of the target platform
(in our case it's bazel-out/k8-fastbuild/bin -- possibly this is a
Dropbox-specific thing), <package> is the package name (the directory
containing the BUILD file relative to the repo/workspace root), <target>
is the name attribute of the target in the BUILD file, and <file> is
whatever you want. Source files on the other hand live in
<root>/<package>/<file>. I presume this architecture maximizes
parallelism.
So suppose we have two deps <target1> and <target2> whose mypy cache
output files are inputs to <target3>, the cache files are distributed
across three directories, and there's no --cache-dir flag we can pass to
mypy that changes this. I guess Naomi's original idea of having a cache
file mapping is probably the simplest way to handle this. I haven't started
writing code for this solution yet (the branch above was a hopeful attempt
that just helped me debug the situation).
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3912 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABBndx-7fAD0rc779n6rQttr5KvbkDjjks5tfGeLgaJpZM4PLGh4>
.
|
Bazel has no concept of "folder". |
Well, I solved the cache problem by implementing a Doing this I managed to actually check some targets and their transitive dependencies. (I even found some questionable code.) I made less headway with the |
A similar solution for |
@gvanrossum If you want a review of the Bazel code, I can take a look. (Messiness seems to be a hallmark of Skylark code, so don't feel ashamed of it ... even copious documentation and design notes fail to help in my experience (I had my one of my integration designs reviewed by the Bazel and Skylark team and yet everyone missed a horrible flaw in it.) What do you mean by "similar as for |
Nah, I'm good, I have internal reviewers. |
Alas, it seems my pure Skylark solution for implicit |
I don't see why the |
That was exactly the pattern I was using. But glob uses the current package (i.e. |
I think that I recall having a lot of problems with the details of Python's import model, even when assuming that missing |
You might try shifting the test to the action ... you can get a list of all the files that Bazel thinks are needed, and then perhaps apply some heuristics (because some of the files might be from the source tree and others could be generated in one of the |
…ap) in support of Bazel (#4759) Addresses the mypy part for #3912. (The Skylark code I wrote for that is not yet open source.) The `--bazel` flag make cache files hermetic by using relative paths and setting mtime to zero; effectively the presence of a cache file prevents reading of the source file (though the source file must still exist). The `--cache-map` flag specifies a mapping from source files to cache files that overrides the usual way of finding the locations for cache files; e.g. ``` mypy --cache-map foo.py foo.meta.json foo.data.json bar.py bar.meta.json bar.data.json -- foo.py bar.py ``` The `--package-root` flag specifies a directory below which all subdirectories are considered packages (this accounts for Bazel's habit to imply empty `__init__.py` files everywhere).
Whew! #4759 has finally landed. I am hoping to open-source the Skylark code I wrote too, though I cannot support it (it's got some Dropboxisms). |
Any updates on this project? Do you have any example working build rules? |
There are actually plans to release a bunch of Bazel work done at Dropbox (e.g. Go and Python rules and a tool for auto-generating dependencies), and this would be part of that. We don't have a specific timeline for the release, but we're aiming to do it before the end of 2018. Hope that helps! Rest assured that it is in production use at Dropbox. |
I'm looking to get Python + Bazel + MyPy going in my team, so jumping back in to check if there's been updates and generally what the state of this integration is. |
Sorry, we haven't been able to prioritize the public release of the Skylark files that define our Python targets, so alas, there are no public updates. Internally we still use the |
Understandable. Thanks for the update. |
Jumping back in here to give a heads up that I've started an open-source implementation of this here → https://github.com/thundergolfer/bazel-mypy-integration It's still very much in the pre-Alpha stage, but I've got the very basics happening. I have a lot to do in terms of fixing bugs, smoothing sharp edges, and giving it a proper exercising in a real-world Bazel monorepo, but it's a start! Feedback and advice appreciated 🙏 |
Because of shifts in priorities, it seems unlikely that we'll be able to release our Bazel integration any time soon. Keeping this open as a low-priority issue since there is a lot of potentially useful discussion in case somebody wants to work on this. |
FWIW, I've come up with a way of running source analysis in parallel and caching the results, that gets a good speed-up and doesn't require a tool like Bazel. If there's any interest in this, I can try to write it up -- the code is straightforward but not well documented (I have some documentation if anyone wants to read it; but it needs editing by a good tech writer). |
If the main motivation is performance, remote caching together with mypy-daemon will often give much faster mypy runs than Bazel integration (https://mypy.readthedocs.io/en/stable/additional_features.html#using-a-remote-cache-to-speed-up-mypy-runs). |
(The level of performance obviously depends on various factors, and in some cases Bazel can be a performance win.) |
I have been working on it, and we've been using it at work for a few weeks now with no issues. https://github.com/thundergolfer/bazel-mypy-integration If you see any problems with my implementation please do let me know 🙂. |
I've been playing with getting Mypy to integrate with bazel for typechecking. I've got a basic version of it working. I intend to describe it here for discussion of the general approach, and then post a series of PRs for individual things it needs as I clean them up, if the general approach seems right.
The general approach: Make a
mypy_library
andmypy_binary
macro that puts a typechecking aspect on the python files it contains. The aspect calculates the mypy cache files; give mypy a mode where it stops considering itself responsible for calculating freshness of dependencies, and instead let bazel handle that.Specific PRs I expect to make:
--cache-hermetically
option to stop writing mtimes--cache-mapping
option to specify a json file containing.data.json
and.meta.json
file locations by module id, both for cache files to read and new cache files to write (bazel likes being super specific about things like this; it seems cleaner to specify to mypy than to rely on mypy's cache directory layout). This option would also short-circuit the process of looking for source and checking it for freshness for the covered modules; just believe the cache files bazel is handing you.--module-without-init
option (or some name, I don't know what it should be called) to allow mypy to believe things are modules without an__init__.py
file. Bazel for some reason doesn't care whether you have them in your python source when you build. I can possibly hack around this by touching that file myself in the build directory in the bazel rule, instead.--ignore-module-prefix
option to understand that certain parts of the module path are already accounted for, and all modules within the prefix to ignore should be treated as if they have empty__init__.py
.BUILD
file for mypy.That would be enough to get it working (and I have a prototype of that, which I need to split into separate PRs and clean up a lot if it seems an ok approach). To make it awesomer:
I'm submitting this description before I'm ready to submit PRs deliberately, so I am not super attached to any one approach. Suggestions?
The text was updated successfully, but these errors were encountered: