Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fine-grained incremental step does lots of spurious stat() calls #5747

Open
gvanrossum opened this issue Oct 7, 2018 · 3 comments
Open

Fine-grained incremental step does lots of spurious stat() calls #5747

gvanrossum opened this issue Oct 7, 2018 · 3 comments

Comments

@gvanrossum
Copy link
Member

gvanrossum commented Oct 7, 2018

While working on #5745 I instrumented mypy/fscache.py and discovered that a lot of spurious os.stat() calls are made in fine-grained-incremental runs on behalf of find_module() (in mypy/modulefinder.py). This is invoked from is_module() (in mypy/build.py) whose only call site is in all_imported_modules_in_file(). This is used by compute_dependencies() to disambiguate from X import Y -- it needs to know whether Y is a submodule of X or just some object (like a function or class) defined in X. This in turn happens during an incremental call to load_graph() made from update_module_isolated() (in mypy/server/update.py), which is intended to recreate the mypy.build.State object for a specific module that is known to be changed in a run.

For example, the line from typing import Dict ends up calling stat() for the following files in my setup:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/typing-stubs
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/typing/py.typed
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/typing/Dict/py.typed
/Library/Python/3.6/site-packages/typing-stubs
/Library/Python/3.6/site-packages/typing/py.typed
/Library/Python/3.6/site-packages/typing/Dict/py.typed
/Users/guido/Library/Python/3.6/lib/python/site-packages/typing-stubs
/Users/guido/Library/Python/3.6/lib/python/site-packages/typing/py.typed
/Users/guido/Library/Python/3.6/lib/python/site-packages/typing/Dict/py.typed
typing
/Users/guido/src/mypy/typing
/Users/guido/src/mypy/mypy/typeshed/stdlib/3.6/typing
/Users/guido/src/mypy/mypy/typeshed/stdlib/3.5/typing
/Users/guido/src/mypy/mypy/typeshed/third_party/3.5/typing
/Users/guido/src/mypy/mypy/typeshed/stdlib/3/typing
/Users/guido/src/mypy/mypy/typeshed/third_party/3/typing
/Users/guido/src/mypy/mypy/typeshed/stdlib/2and3/typing
/Users/guido/src/mypy/mypy/typeshed/third_party/2and3/typing
/usr/local/lib/mypy/typing

Note that all these stat() calls are cached by fscache.py, but the cache is flushed at the end of each incremental step, so each incremental step does each of these once. (Also note that the current directory seems to appear twice on the search path, once as '', once as its absolute path.)

I came up with a tentative fix, but it needs work (to account for typeshed) and since this is pre-existing behavior I decided to separate it from #5745.

It's not the end of the world, obviously, but I think this ends up doing ~20 stat() calls for each from import (though only in modified files), and since we're looking for a strategy to allow using Watchman instead of calling stat() for each source file, I think we might want to do something about this. (I also happen to know that from import is hugely popular in the Dropbox code base.) I'll assume this is low priority until I have determined how many stat() calls this does for the typical real Dropbox use case.

@ilevkivskyi
Copy link
Member

but it needs work (to account for typeshed)

I am not sure what do you mean by this. Any changes in typeshed files are currently ignored by the daemon (they are not propagated). This is a conscious decision, you can find several places where triggers are ignored if they are from typeshed. @JukkaL can probably comment on this (IIRC he originally proposed this as an optimisation).

@gvanrossum
Copy link
Member Author

I know, but I didn't feel comfortable to decide that hence it would be okay for is_module() to be wrong for typeshed.

Let's say the source has from os import path. This causes the inquiry is_module('os.path') to be made, and the implementation I had would incorrectly return False. There may well be a reason at a higher level in the code why this doesn't matter, but I didn't want to make assumptions about that.

(A lot of the code I tried to understand for #5745 looks rather hacky, and I worry that our technical debt related to the daemon is getting out of hand. I didn't want to add to this.)

@ilevkivskyi
Copy link
Member

OK, I see, then I wouldn't worry too much about several extra stat()s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants