-
-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable mypy in CI 1/2 #5328
Enable mypy in CI 1/2 #5328
Conversation
b3e14ab
to
62ded0f
Compare
The good: all files except scheduler.py are now passing mypy checks. |
8bda7e9
to
3de84a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @crusaderky!
As this is a relatively large change (though one I think there's interest in), we should be sure to get feedback from folks who regularly contribute to distributed
. cc @fjetter, @jakirkham, @gjoseph92, @mrocklin, @jacobtomlinson, @madsbk thoughts on adding type annotations? (also feel free to rope others into this discussion)
I think this is a great idea, thanks @crusaderky! |
As discussed in the latest coiled meeting, I have reverted all changes to scheduler.py and the pre-commit hook, to publish them in a second iterative PR. |
Generally ok with this. However I do have one concern. In particular I'd like us to be aware that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @crusaderky, this is awesome
@@ -135,7 +129,7 @@ def remote_magic(line, cell=None): | |||
|
|||
|
|||
# cache clients for re-use in remote magic | |||
remote_magic._clients = {} | |||
remote_magic._clients = {} # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote_magic._clients = {} # type: ignore | |
remote_magic._clients: Dict[str, BlockingKernelClient] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed. We should always use lowercase dict[...]
, list[...]
, etc. and put from __future__ import annotations
at the top of the module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction: can't.
distributed/_ipython_utils.py:133: error: Type cannot be declared in assignment to non-self attribute
distributed/_ipython_utils.py:133: error: "Callable[[Any, Any], Any]" has no attribute "_clients"
@@ -50,7 +50,7 @@ class NotThisMethod(Exception): | |||
"""Exception raised if a method is not valid for the current scenario.""" | |||
|
|||
|
|||
LONG_VERSION_PY = {} | |||
LONG_VERSION_PY: dict = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file is missing a from __future__ import __annotations
to make this work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's only necessary if you use bracket-stile dict[...]
or if the annotations create circular dependencies
@@ -105,7 +110,7 @@ | |||
NO_DEFAULT_PLACEHOLDER = "_no_default_" | |||
|
|||
|
|||
def _get_global_client(): | |||
def _get_global_client() -> Client | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to myself and others: 3.10 syntax, but backwards compatible with from __future__ import annotations
, AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. 3.10 syntax works on all python versions as long as your version of mypy is recent.
Notable exception: cast
, since it is executed at runtime, needs Python 3.10 style annotations wrapped in a string.
def __init__( | ||
self, | ||
minimum: int = 0, | ||
maximum: int = math.inf, | ||
maximum: int | float = math.inf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a huge deal, but I wonder if we could narrow further to int | Literal[math.inf]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, no: https://www.python.org/dev/peps/pep-0586/
The following are provisionally disallowed for simplicity. We can consider allowing them in future extensions of this PEP.
- Floats: e.g. Literal[3.14]. Representing Literals of infinity or NaN in a clean way is tricky; real-world APIs are unlikely to vary their behavior based on a float parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: python/mypy#11208
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shucks
import importlib | ||
import traceback | ||
from array import array | ||
from enum import Enum | ||
from functools import partial | ||
from types import ModuleType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL about this type
@@ -603,7 +607,7 @@ def key_split(s): | |||
return "Other" | |||
|
|||
|
|||
def key_split_group(x): | |||
def key_split_group(x) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work to narrow x to str | bytes | tuple[Hashable]
or similar? I'd love to have written down the real scope of a key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because (for some reason I didn't investigate) unexpected types are accepted and return a hardcoded "Other".
@@ -634,7 +638,7 @@ def key_split_group(x): | |||
elif typ is bytes: | |||
return key_split_group(x.decode()) | |||
else: | |||
return key_split(x) | |||
return "Other" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: this avoids some redundant checks in key_split
, which would itself just return "Other" here.
if isinstance(preload, str): | ||
preload = [preload] | ||
if preload_argv and isinstance(preload_argv[0], str): | ||
preload_argv = [cast("list[str]", preload_argv)] * len(preload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with this one as well to try to avoid the cast. Annoying that mypy
is unable to narrow based on the above check.
As noted above, this is going to cause issues with Cythonization in the scheduler. If we can't leave that file alone, then unfortunately I don't think we are prepared to adopt MyPy yet. |
Please move all conversation specific to scheduler.py to #5348 |
@@ -258,12 +256,3 @@ def _repr_html_(self, cluster_status=None): | |||
cluster_status=cluster_status, | |||
) | |||
return super()._repr_html_(cluster_status=cluster_status) | |||
|
|||
|
|||
clusters_to_close = weakref.WeakSet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never populated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I believe we now handled here
distributed/distributed/deploy/spec.py
Lines 654 to 660 in 43d3866
@atexit.register | |
def close_clusters(): | |
for cluster in list(SpecCluster._instances): | |
if cluster.shutdown_on_close: | |
with suppress(gen.TimeoutError, TimeoutError): | |
if cluster.status != Status.closed: | |
cluster.close(timeout=10) |
pass | ||
async with Scheduler(dashboard_address=":0") as s: | ||
s.add_plugin(plugin) | ||
await plugin.start(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#5348 removes the plugins= parameter from Scheduler and Worker. It was exclusively used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use Scheduler.register_scheduler_plugin
instead? Just a suggestion, not meant to be a blocking comment
#5348 removes the plugins= parameter from Scheduler and Worker. It was exclusively used here.
There may be user code which uses plugins=
@@ -15,12 +18,10 @@ | |||
from .compression import decompress, maybe_compress | |||
from .utils import frame_split_size, msgpack_opts, pack_frames_prelude, unpack_frames | |||
|
|||
lazy_registrations = {} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused
This for me is done - @ian-r-rose are there further comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @crusaderky! I left several small comments / questions, but overall this looks good to me
Also, just checking but there's an empty distributed/py.typed
file being added here, is that intentional?
setup.cfg
Outdated
[options] | ||
zip_safe = False # https://mypy.readthedocs.io/en/latest/installed_packages.html | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering how this is different from zip_safe=False
in setup.py
Line 110 in 43d3866
zip_safe=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I didn't notice it.
@@ -1058,15 +1062,15 @@ def reset_logger_locks(): | |||
|
|||
# TODO: Use tornado's AnyThreadEventLoopPolicy, instead of class below, | |||
# once tornado > 6.0.3 is available. | |||
if WINDOWS and hasattr(asyncio, "WindowsSelectorEventLoopPolicy"): | |||
if WINDOWS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting for future readers that asyncio.WindowsSelectorEventLoopPolicy
was added in Python 3.7 https://docs.python.org/3/whatsnew/3.7.html
@@ -75,7 +75,7 @@ def dask_teardown(worker): | |||
worker.foo = 'teardown' | |||
""" | |||
with dask.config.set( | |||
{"distributed.worker.preload": text, "distributed.nanny.preload": text} | |||
{"distributed.worker.preload": [text], "distributed.nanny.preload": [text]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It violated distributed-schema.yaml. It's purely conincidental that process_preloads() will accept a single string.
@@ -23,7 +27,13 @@ | |||
|
|||
from ..utils import ensure_bytes | |||
|
|||
compressions = {None: {"compress": identity, "decompress": identity}} | |||
if TYPE_CHECKING: | |||
from typing_extensions import Literal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we include typing_extensions
in our conda environment file(s)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because we are running everything through pre-commit. typing_extensions shouldn't be in the env files for the same reason flake8, black, isort and mypy aren't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: if you look at #5348, typing_extensions is installed there exclusively in the mypy virtualenv by .pre-conmmit-config.yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thanks for clarifying
@@ -258,12 +256,3 @@ def _repr_html_(self, cluster_status=None): | |||
cluster_status=cluster_status, | |||
) | |||
return super()._repr_html_(cluster_status=cluster_status) | |||
|
|||
|
|||
clusters_to_close = weakref.WeakSet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I believe we now handled here
distributed/distributed/deploy/spec.py
Lines 654 to 660 in 43d3866
@atexit.register | |
def close_clusters(): | |
for cluster in list(SpecCluster._instances): | |
if cluster.shutdown_on_close: | |
with suppress(gen.TimeoutError, TimeoutError): | |
if cluster.status != Status.closed: | |
cluster.close(timeout=10) |
if self.cluster: | ||
return self.cluster.loop | ||
else: | ||
return IOLoop.current() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this another "mypy
caught a bug" case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It caught an antipattern where the parent class expected properties that only exist in the children classes.
logging_names: dict[str | int, int | str] = {} | ||
logging_names.update(logging._levelToName) # type: ignore | ||
logging_names.update(logging._nameToLevel) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're loosing a copy()
here, just wanted to double check that's okay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're replacing a copy() with an update() for the sake of readability. There's no functional difference.
if TYPE_CHECKING: | ||
try: | ||
import ucp | ||
from ucp import create_endpoint as ucx_create_endpoint | ||
from ucp import create_listener as ucx_create_listener | ||
except ImportError: | ||
pass | ||
else: | ||
ucp = None # type: ignore | ||
ucx_create_endpoint = None # type: ignore | ||
ucx_create_listener = None # type: ignore | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pentschev @madsbk @quasiben could one of you review the changes to this module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping here @jrbourbeau . We can't import ucp
here, we have to do it from init_once
, otherwise UCX will be imported before we can set any of its configurations. However, it seems it's only being imported here when TYPE_CHECKING
is defined, when is that going to happen in Distributed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it seems it's only being imported here when TYPE_CHECKING is defined, when is that going to happen in Distributed?
Today we don't run any type checking as part of the development process, but after #5348 is merged we'll start running mypy
as a pre-commit
hook. Given that this ucp
import will only happen during type checking, do you think the proposed changes here are okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's guaranteed that TYPE_CHECKING is False
for all runtime situations, then I think this is no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be the case. From https://docs.python.org/3/library/typing.html#typing.TYPE_CHECKING:
It is False at runtime
cc @crusaderky for confirmation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we should be good for now. If there's any reason why TYPE_CHECKING is True
at runtime, we'll need to make changes to cover that.
_shutdown = _close | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting to future readers that there's a _shutdown
method defined a few lines below
distributed/preloading.py
Outdated
if TYPE_CHECKING: | ||
from .core import Server | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's usually the reason for this pattern, yes. However, as far as I can tell it's not actually needed here, since Server
doesn't depend on preloading
, and everything that imports preloading
has already imported Server
somewhere.
distributed/active_memory_manager.py
Outdated
@@ -215,7 +217,7 @@ def _find_recipient( | |||
candidates -= pending_repl | |||
if not candidates: | |||
return None | |||
return min(candidates, key=self.workers_memory.get) | |||
return min(candidates, key=self.workers_memory.get) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cringe a bit to suggest this, but self.workers_memory.__getitem__
passes muster here, since it will raise if the key doesn't exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
|
||
if not self.new_spec: | ||
raise ValueError("To scale by cores= you must specify cores per worker") | ||
assert False, "unreachable" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be triggered if none of the above keys are in new_spec
. Without this assertion, the proper return type of this function would be Optional[int]
. Perhaps a more helpful error would be something similar to the ValueError
above?
@status.setter | ||
def status(self, value): | ||
raise NotImplementedError() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was curious about this as well -- it's because this inherits from Worker
, where status
isn't read-only. Without this it complains that we are swapping a read-write property with a read-only one, which, fair enough.
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
All review comments have been incorporated |
Yes, this file is used by mypy when it validates third party code that imports distributed, and it signals that mypy should fetch type annotations from the library itself (as opposed to look for a separate stubs package). |
@jakirkham earlier you expressed some hesitation around adopting |
Yeah I'm ok with that James. Thanks for checking :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
Thanks @crusaderky for your work on this and addressing all the review comments from @ian-r-rose and me
Woo! |
Partial work towards #2803.
This PR covers everything but scheduler.py and actually running mypy in CI.
Continues in #5348.