Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add several undocumented flags (--bazel, --package-root and --cache-map) in support of Bazel #4759

Merged
merged 55 commits into from
Jun 5, 2018

Conversation

gvanrossum
Copy link
Member

@gvanrossum gvanrossum commented Mar 19, 2018

The --bazel flag make cache files hermetic by using relative paths and setting mtime to zero; effectively the presence of a cache file prevents reading of the source file (though the source file must still exist).

The --cache-map flag specifies a mapping from source files to cache files that overrides the usual way of finding the locations for cache files; e.g.

mypy --cache-map foo.py foo.meta.json foo.data.json bar.py bar.meta.json bar.data.json -- foo.py bar.py

There must be any number of triples (source, meta, data) with the additional constraints that the source must end in .py or .pyi, meta must end in .meta.json, and data must end in .data.json. For files in the cache map, the default cache directory (--cache-dir) is ignored, and if the cache files named in the --cache-map don't exist they will be written. For files not in the cache map, the --cache-dir is still used.

(NOTE: You still need to pass the source files separately, and the list of --cache-map triples must be followed by another flag or by --, as in the example above.)

Copy link
Collaborator

@sixolet sixolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks like about the tack I was thinking of lo so many months ago.

mypy/build.py Outdated
Graph = Dict[str, 'State']


def getmtime(name: str) -> int:
return int(os.path.getmtime(name))


def relpath(path: str) -> str:
if not path or path[0] not in SEPARATORS:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""Returns the path relative to the current working directory."""

mypy/build.py Outdated
cache_dir: cache directory
pyversion: Python version (major, minor)

Returns:
A tuple with the file names to be used for the meta JSON and the
data JSON, respectively.
"""
pair = manager.options.cache_map.get(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels naïvely like it'd be cleaner to always use the cache map in the presence of a cache map at all; I don't know that the hybrid mode of looking in a cache dir and a cache map file would be very meaningful, and it certainly sounds hard to debug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might well be the case but I haven't solved what to do about typeshed yet, so without keeping cache_dir as a fallback, typeshed will always read the source.

mypy/main.py Outdated
@@ -381,6 +382,15 @@ def add_invertible_flag(flag: str,
# --local-partial-types disallows partial types spanning module top level and a function
# (implicitly defined in fine-grained incremental mode)
parser.add_argument('--local-partial-types', action='store_true', help=argparse.SUPPRESS)
# --bazel changes some behaviors for use with Bazel (https://bazel.build).
parser.add_argument('--bazel', action='store_true', help=argparse.SUPPRESS)
# --cache-map FILE ... gives a mapping from source files to cache files.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interested in what happened to the idea of dropping cache files next to source files. I remember choosing the same option you did here, but I forget why, because now-me had the idea that cache files next to source would be cleaner for some reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to write this up in my overly long status report in #3912 -- Dropbox's build environment puts the output files in a different directory than the source files. It seems outputs (and inputs) must be File objects, the only way to create those is ctx.actions.declare_file(), which always puts it it in the output directory. (The File objects for the sources come from ctx.files.srcs, but I don't think I can add new ones via that approach.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct -- you cannot add anything via ctx.files.srcs. You should assume that the source tree is read-only.

If you declare a file, you must create it within the action (Bazel checks this).

mypy/main.py Outdated
assert n % 3 == 0, "--cache-map requires one or more triples (see source)"
for i in range(0, n, 3):
source, meta_file, data_file = special_opts.cache_map[i:i + 3]
assert source not in options.cache_map
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In final code probably wants to be something other than an assert

@gvanrossum
Copy link
Member Author

I have to revise this, see #3912 (comment) why. (I can't solve the implicit __init__.py problem in pure Skylark, I need to support this in mypy.)

@gvanrossum
Copy link
Member Author

I probably have to go back here and add a new flag, --package-roots DIR DIR .... I've finally hit a large enough sample of our codebase to hit a situation where a "source" file is actually generated by another rule/action, and in that case the file exists in a different tree. Fortunately in Skylark I can figure out when this is the case: File objects have a root attribute which is another File object pointing there, and it's empty for genuine sources but points to the output tree for generated sources. So I can write some Skylark that computes the necessary --package-roots arguments. I was somewhat anticipating this with the naming of the arguments in the last diff, but I'll have to switch it from a bool to List[str].

@sixolet
Copy link
Collaborator

sixolet commented Mar 30, 2018

I'm following along. I think my version of this a few months ago didn't fake having __init__.py, it allowed dirs to be treated as packages even in the absence of __init__.py. I think I like your version better.

I also had to do something similar to your suggested --package-roots change, though at the time I thought it was just because of the weird way my particular codebase is organized.

@gvanrossum
Copy link
Member Author

This is now a bit more battle-tested (see latest commit) and I'm ready for a full-scale review to land it.

@gvanrossum
Copy link
Member Author

Note that "gvanrossum added some commits 17 days ago" is a lie -- I merged today, fixing two (fortunately simple) conflicts. @msullivan it seems you're "it" for the review of this diff...

Copy link
Collaborator

@msullivan msullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, mostly seems fine.

I don't love all the extra logic in the file system cache path, but there doesn't seem to be a much nicer way.

@@ -1272,12 +1287,12 @@ def validate_meta(meta: Optional[CacheMeta], id: str, path: Optional[str],
fine_grained_cache = manager.use_fine_grained_cache()

size = st.st_size
if size != meta.size and not fine_grained_cache:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, such a comment would be useful

mypy/build.py Outdated
else:
# For Bazel, make all paths relative (else Bazel caching is thwarted).
# And also patch tree.path, which might have been made absolute earlier.
tree.path = path = os.path.relpath(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe factor a bunch of the abspath/relpath casing into a function?

Why does the bazel case need to update tree.path.

Also, could we instead just always get the relpath? Do we need absolute paths outside bazel mode?

# Take a copy to get rid of associated traceback and frame objects.
# Just assigning to __traceback__ doesn't free them.
self.stat_error_cache[path] = copy_os_error(err)
raise err
self.stat_cache[path] = st
return st

def init_under_package_root(self, path: str) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use some documentation. I'm not sure what's the goal/what's going on here.

mypy/main.py Outdated
@@ -633,6 +650,57 @@ def add_invertible_flag(flag: str,
report_dir = val
options.report_dirs[report_type] = report_dir

# Validate and normalize --package-root.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull this logic out to functions?

@@ -4378,3 +4378,14 @@ from d import k
[file t.py.2]
from d import k
# dummy change

[case testBazelFlagIgnoresFileChanges]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other --bazel behaviors we can check in tests without too much pain?

if old_interface_hash == interface_hash:
# If the interface is unchanged, the cached data is guaranteed
# to be equivalent, and we only need to update the metadata.
data_mtime = getmtime(data_json)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

mypy/build.py Outdated
@@ -71,6 +71,12 @@
PYTHON_EXTENSIONS = ['.pyi', '.py']


if os.altsep:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code?

mypy/fscache.py Outdated
raise
dirname, basename = os.path.split(path)
dirname = os.path.normpath(dirname)
if basename == '__init__.py' and dirname in self.fake_package_cache:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment explaining what is going on and why?

Guido van Rossum added 2 commits May 29, 2018 14:53
- Remove dead code
- Refactor getmtime() and abspath()/relpath() into BuildManager methods
- Explain why we set tree.path (always setting it, I think that's fine)
- Add docstrings to init_under_package_root() and _fake_init()
- Add comments where fake_package_cache is checked
- Refactor processing of package_root and cache_map options
@gvanrossum
Copy link
Member Author

gvanrossum commented May 29, 2018 via email

@gvanrossum
Copy link
Member Author

gvanrossum commented Jun 2, 2018

I'm unclear about the test failures. I can't repro them locally, except once, and after that they passed. So they seem to be flakes, or the result of imperfect test isolation.

[UPDATE: Possibly I did a bad merge earlier, I don't see a passing test for this branch after 1f26bdc]

@gvanrossum
Copy link
Member Author

The test flakes weren't due to a bad merge. In a recent commit I replaced an uncached os.path.getmtime() call with a call to fscache.stat() (and extracting the mtime). But that particular getmtime() call apparently needs to be uncached. This also (intuitively) explains why the failures were flakes -- they depended on filesystem state.

Copy link
Collaborator

@msullivan msullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looking pretty good now!

@gvanrossum gvanrossum merged commit b78b621 into python:master Jun 5, 2018
@gvanrossum gvanrossum deleted the bazel-flag branch June 5, 2018 22:19
@thejcannon
Copy link
Contributor

For posterity "For files not in the cache map, the --cache-dir is still used." isn't true as of b0319af.

Just leaving it here because this cost me some headscratching 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants