-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add several undocumented flags (--bazel, --package-root and --cache-map) in support of Bazel #4759
Conversation
- Force relative paths. - Force mtimes to zero. - Locate cache files alongside source files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks like about the tack I was thinking of lo so many months ago.
mypy/build.py
Outdated
Graph = Dict[str, 'State'] | ||
|
||
|
||
def getmtime(name: str) -> int: | ||
return int(os.path.getmtime(name)) | ||
|
||
|
||
def relpath(path: str) -> str: | ||
if not path or path[0] not in SEPARATORS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Returns the path relative to the current working directory."""
mypy/build.py
Outdated
cache_dir: cache directory | ||
pyversion: Python version (major, minor) | ||
|
||
Returns: | ||
A tuple with the file names to be used for the meta JSON and the | ||
data JSON, respectively. | ||
""" | ||
pair = manager.options.cache_map.get(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels naïvely like it'd be cleaner to always use the cache map in the presence of a cache map at all; I don't know that the hybrid mode of looking in a cache dir and a cache map file would be very meaningful, and it certainly sounds hard to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might well be the case but I haven't solved what to do about typeshed yet, so without keeping cache_dir as a fallback, typeshed will always read the source.
mypy/main.py
Outdated
@@ -381,6 +382,15 @@ def add_invertible_flag(flag: str, | |||
# --local-partial-types disallows partial types spanning module top level and a function | |||
# (implicitly defined in fine-grained incremental mode) | |||
parser.add_argument('--local-partial-types', action='store_true', help=argparse.SUPPRESS) | |||
# --bazel changes some behaviors for use with Bazel (https://bazel.build). | |||
parser.add_argument('--bazel', action='store_true', help=argparse.SUPPRESS) | |||
# --cache-map FILE ... gives a mapping from source files to cache files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am interested in what happened to the idea of dropping cache files next to source files. I remember choosing the same option you did here, but I forget why, because now-me had the idea that cache files next to source would be cleaner for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to write this up in my overly long status report in #3912 -- Dropbox's build environment puts the output files in a different directory than the source files. It seems outputs (and inputs) must be File
objects, the only way to create those is ctx.actions.declare_file()
, which always puts it it in the output directory. (The File
objects for the sources come from ctx.files.srcs
, but I don't think I can add new ones via that approach.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct -- you cannot add anything via ctx.files.srcs. You should assume that the source tree is read-only.
If you declare a file, you must create it within the action (Bazel checks this).
mypy/main.py
Outdated
assert n % 3 == 0, "--cache-map requires one or more triples (see source)" | ||
for i in range(0, n, 3): | ||
source, meta_file, data_file = special_opts.cache_map[i:i + 3] | ||
assert source not in options.cache_map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In final code probably wants to be something other than an assert
I have to revise this, see #3912 (comment) why. (I can't solve the implicit |
I probably have to go back here and add a new flag, |
I'm following along. I think my version of this a few months ago didn't fake having I also had to do something similar to your suggested |
This is now a bit more battle-tested (see latest commit) and I'm ready for a full-scale review to land it. |
Note that "gvanrossum added some commits 17 days ago" is a lie -- I merged today, fixing two (fortunately simple) conflicts. @msullivan it seems you're "it" for the review of this diff... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, mostly seems fine.
I don't love all the extra logic in the file system cache path, but there doesn't seem to be a much nicer way.
@@ -1272,12 +1287,12 @@ def validate_meta(meta: Optional[CacheMeta], id: str, path: Optional[str], | |||
fine_grained_cache = manager.use_fine_grained_cache() | |||
|
|||
size = st.st_size | |||
if size != meta.size and not fine_grained_cache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, such a comment would be useful
mypy/build.py
Outdated
else: | ||
# For Bazel, make all paths relative (else Bazel caching is thwarted). | ||
# And also patch tree.path, which might have been made absolute earlier. | ||
tree.path = path = os.path.relpath(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe factor a bunch of the abspath
/relpath
casing into a function?
Why does the bazel case need to update tree.path.
Also, could we instead just always get the relpath
? Do we need absolute paths outside bazel mode?
# Take a copy to get rid of associated traceback and frame objects. | ||
# Just assigning to __traceback__ doesn't free them. | ||
self.stat_error_cache[path] = copy_os_error(err) | ||
raise err | ||
self.stat_cache[path] = st | ||
return st | ||
|
||
def init_under_package_root(self, path: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use some documentation. I'm not sure what's the goal/what's going on here.
mypy/main.py
Outdated
@@ -633,6 +650,57 @@ def add_invertible_flag(flag: str, | |||
report_dir = val | |||
options.report_dirs[report_type] = report_dir | |||
|
|||
# Validate and normalize --package-root. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull this logic out to functions?
@@ -4378,3 +4378,14 @@ from d import k | |||
[file t.py.2] | |||
from d import k | |||
# dummy change | |||
|
|||
[case testBazelFlagIgnoresFileChanges] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other --bazel
behaviors we can check in tests without too much pain?
if old_interface_hash == interface_hash: | ||
# If the interface is unchanged, the cached data is guaranteed | ||
# to be equivalent, and we only need to update the metadata. | ||
data_mtime = getmtime(data_json) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^
mypy/build.py
Outdated
@@ -71,6 +71,12 @@ | |||
PYTHON_EXTENSIONS = ['.pyi', '.py'] | |||
|
|||
|
|||
if os.altsep: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead code?
mypy/fscache.py
Outdated
raise | ||
dirname, basename = os.path.split(path) | ||
dirname = os.path.normpath(dirname) | ||
if basename == '__init__.py' and dirname in self.fake_package_cache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment explaining what is going on and why?
- Remove dead code - Refactor getmtime() and abspath()/relpath() into BuildManager methods - Explain why we set tree.path (always setting it, I think that's fine) - Add docstrings to init_under_package_root() and _fake_init() - Add comments where fake_package_cache is checked - Refactor processing of package_root and cache_map options
I think I've done everything you asked for, and I also merged master.
Please re-review!
|
I'm unclear about the test failures. I can't repro them locally, except once, and after that they passed. So they seem to be flakes, or the result of imperfect test isolation. [UPDATE: Possibly I did a bad merge earlier, I don't see a passing test for this branch after 1f26bdc] |
The test flakes weren't due to a bad merge. In a recent commit I replaced an uncached |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looking pretty good now!
For posterity "For files not in the cache map, the --cache-dir is still used." isn't true as of b0319af. Just leaving it here because this cost me some headscratching 😄 |
The
--bazel
flag make cache files hermetic by using relative paths and setting mtime to zero; effectively the presence of a cache file prevents reading of the source file (though the source file must still exist).The
--cache-map
flag specifies a mapping from source files to cache files that overrides the usual way of finding the locations for cache files; e.g.There must be any number of triples (source, meta, data) with the additional constraints that the source must end in
.py
or.pyi
, meta must end in.meta.json
, and data must end in.data.json
. For files in the cache map, the default cache directory (--cache-dir
) is ignored, and if the cache files named in the--cache-map
don't exist they will be written. For files not in the cache map, the--cache-dir
is still used.(NOTE: You still need to pass the source files separately, and the list of
--cache-map
triples must be followed by another flag or by--
, as in the example above.)