Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable mypy in CI 1/2 #5328

Merged
merged 24 commits into from
Sep 30, 2021
Merged

Enable mypy in CI 1/2 #5328

merged 24 commits into from
Sep 30, 2021

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Sep 17, 2021

Partial work towards #2803.
This PR covers everything but scheduler.py and actually running mypy in CI.
Continues in #5348.

@crusaderky crusaderky marked this pull request as draft September 17, 2021 13:03
@crusaderky
Copy link
Collaborator Author

The good: all files except scheduler.py are now passing mypy checks.
The bad: putting # type: ignore on top of scheduler.py breaks everything else. So I'll have to go through it and it's probably going to end up in a heap of # type: ignore on individual lines.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @crusaderky!

As this is a relatively large change (though one I think there's interest in), we should be sure to get feedback from folks who regularly contribute to distributed. cc @fjetter, @jakirkham, @gjoseph92, @mrocklin, @jacobtomlinson, @madsbk thoughts on adding type annotations? (also feel free to rope others into this discussion)

@madsbk
Copy link
Contributor

madsbk commented Sep 24, 2021

I think this is a great idea, thanks @crusaderky!

@crusaderky crusaderky changed the title Enable mypy in CI Enable mypy in CI 1/2 Sep 24, 2021
@crusaderky
Copy link
Collaborator Author

crusaderky commented Sep 24, 2021

As discussed in the latest coiled meeting, I have reverted all changes to scheduler.py and the pre-commit hook, to publish them in a second iterative PR.
This PR is now ready for review.

@crusaderky crusaderky marked this pull request as ready for review September 24, 2021 15:57
@jakirkham
Copy link
Member

jakirkham commented Sep 24, 2021

Generally ok with this. However I do have one concern.

In particular I'd like us to be aware that scheduler.py has very particular annotations based on what Cython will accept and work with efficiently (it can take some other things, but run poorly instead, which make finding/fixing harder). My guess is Cython annotations do not map very well to how MyPy understands things. So would recommend we leave scheduler.py alone. If we opt to Cythonize more things in the future, we may need to deviate from MyPy there as well.

Copy link
Collaborator

@ian-r-rose ian-r-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @crusaderky, this is awesome

@@ -135,7 +129,7 @@ def remote_magic(line, cell=None):


# cache clients for re-use in remote magic
remote_magic._clients = {}
remote_magic._clients = {} # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
remote_magic._clients = {} # type: ignore
remote_magic._clients: Dict[str, BlockingKernelClient] = {}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed. We should always use lowercase dict[...], list[...], etc. and put from __future__ import annotations at the top of the module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: can't.

distributed/_ipython_utils.py:133: error: Type cannot be declared in assignment to non-self attribute
distributed/_ipython_utils.py:133: error: "Callable[[Any, Any], Any]" has no attribute "_clients"

@@ -50,7 +50,7 @@ class NotThisMethod(Exception):
"""Exception raised if a method is not valid for the current scenario."""


LONG_VERSION_PY = {}
LONG_VERSION_PY: dict = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file is missing a from __future__ import __annotations to make this work.

Copy link
Collaborator Author

@crusaderky crusaderky Sep 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's only necessary if you use bracket-stile dict[...] or if the annotations create circular dependencies

@@ -105,7 +110,7 @@
NO_DEFAULT_PLACEHOLDER = "_no_default_"


def _get_global_client():
def _get_global_client() -> Client | None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself and others: 3.10 syntax, but backwards compatible with from __future__ import annotations, AFAIK.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. 3.10 syntax works on all python versions as long as your version of mypy is recent.
Notable exception: cast, since it is executed at runtime, needs Python 3.10 style annotations wrapped in a string.

distributed/dashboard/components/scheduler.py Show resolved Hide resolved
distributed/dashboard/utils.py Show resolved Hide resolved
def __init__(
self,
minimum: int = 0,
maximum: int = math.inf,
maximum: int | float = math.inf,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge deal, but I wonder if we could narrow further to int | Literal[math.inf]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, no: https://www.python.org/dev/peps/pep-0586/

The following are provisionally disallowed for simplicity. We can consider allowing them in future extensions of this PEP.

  • Floats: e.g. Literal[3.14]. Representing Literals of infinity or NaN in a clean way is tricky; real-world APIs are unlikely to vary their behavior based on a float parameter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shucks

import importlib
import traceback
from array import array
from enum import Enum
from functools import partial
from types import ModuleType
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about this type

@@ -603,7 +607,7 @@ def key_split(s):
return "Other"


def key_split_group(x):
def key_split_group(x) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work to narrow x to str | bytes | tuple[Hashable] or similar? I'd love to have written down the real scope of a key.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because (for some reason I didn't investigate) unexpected types are accepted and return a hardcoded "Other".

@@ -634,7 +638,7 @@ def key_split_group(x):
elif typ is bytes:
return key_split_group(x.decode())
else:
return key_split(x)
return "Other"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: this avoids some redundant checks in key_split, which would itself just return "Other" here.

if isinstance(preload, str):
preload = [preload]
if preload_argv and isinstance(preload_argv[0], str):
preload_argv = [cast("list[str]", preload_argv)] * len(preload)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with this one as well to try to avoid the cast. Annoying that mypy is unable to narrow based on the above check.

@jakirkham
Copy link
Member

@jakirkham

So would recommend we leave scheduler.py alone.

Would like to, but can't. Adding # type: ignore at the top of the file breaks everything else. I could not make the --exclude parameter of mypy work either. So the only alternative to changing the actual annotations in scheduler.py is to add a wealth of # type: ignore on its individual lines.

As noted above, this is going to cause issues with Cythonization in the scheduler. If we can't leave that file alone, then unfortunately I don't think we are prepared to adopt MyPy yet.

cc @mrocklin @quasiben (for awareness)

@crusaderky
Copy link
Collaborator Author

Please move all conversation specific to scheduler.py to #5348

@@ -258,12 +256,3 @@ def _repr_html_(self, cluster_status=None):
cluster_status=cluster_status,
)
return super()._repr_html_(cluster_status=cluster_status)


clusters_to_close = weakref.WeakSet()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never populated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I believe we now handled here

@atexit.register
def close_clusters():
for cluster in list(SpecCluster._instances):
if cluster.shutdown_on_close:
with suppress(gen.TimeoutError, TimeoutError):
if cluster.status != Status.closed:
cluster.close(timeout=10)

pass
async with Scheduler(dashboard_address=":0") as s:
s.add_plugin(plugin)
await plugin.start(s)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#5348 removes the plugins= parameter from Scheduler and Worker. It was exclusively used here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use Scheduler.register_scheduler_plugin instead? Just a suggestion, not meant to be a blocking comment

#5348 removes the plugins= parameter from Scheduler and Worker. It was exclusively used here.

There may be user code which uses plugins=

@@ -15,12 +18,10 @@
from .compression import decompress, maybe_compress
from .utils import frame_split_size, msgpack_opts, pack_frames_prelude, unpack_frames

lazy_registrations = {}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

@crusaderky
Copy link
Collaborator Author

This for me is done - @ian-r-rose are there further comments?
Any objections to merging?

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @crusaderky! I left several small comments / questions, but overall this looks good to me

Also, just checking but there's an empty distributed/py.typed file being added here, is that intentional?

setup.cfg Outdated
Comment on lines 33 to 35
[options]
zip_safe = False # https://mypy.readthedocs.io/en/latest/installed_packages.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering how this is different from zip_safe=False in setup.py

zip_safe=False,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I didn't notice it.

@@ -1058,15 +1062,15 @@ def reset_logger_locks():

# TODO: Use tornado's AnyThreadEventLoopPolicy, instead of class below,
# once tornado > 6.0.3 is available.
if WINDOWS and hasattr(asyncio, "WindowsSelectorEventLoopPolicy"):
if WINDOWS:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting for future readers that asyncio.WindowsSelectorEventLoopPolicy was added in Python 3.7 https://docs.python.org/3/whatsnew/3.7.html

@@ -75,7 +75,7 @@ def dask_teardown(worker):
worker.foo = 'teardown'
"""
with dask.config.set(
{"distributed.worker.preload": text, "distributed.nanny.preload": text}
{"distributed.worker.preload": [text], "distributed.nanny.preload": [text]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It violated distributed-schema.yaml. It's purely conincidental that process_preloads() will accept a single string.

@@ -23,7 +27,13 @@

from ..utils import ensure_bytes

compressions = {None: {"compress": identity, "decompress": identity}}
if TYPE_CHECKING:
from typing_extensions import Literal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include typing_extensions in our conda environment file(s)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because we are running everything through pre-commit. typing_extensions shouldn't be in the env files for the same reason flake8, black, isort and mypy aren't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: if you look at #5348, typing_extensions is installed there exclusively in the mypy virtualenv by .pre-conmmit-config.yaml.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks for clarifying

distributed/preloading.py Outdated Show resolved Hide resolved
@@ -258,12 +256,3 @@ def _repr_html_(self, cluster_status=None):
cluster_status=cluster_status,
)
return super()._repr_html_(cluster_status=cluster_status)


clusters_to_close = weakref.WeakSet()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I believe we now handled here

@atexit.register
def close_clusters():
for cluster in list(SpecCluster._instances):
if cluster.shutdown_on_close:
with suppress(gen.TimeoutError, TimeoutError):
if cluster.status != Status.closed:
cluster.close(timeout=10)

Comment on lines +211 to +214
if self.cluster:
return self.cluster.loop
else:
return IOLoop.current()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this another "mypy caught a bug" case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It caught an antipattern where the parent class expected properties that only exist in the children classes.

Comment on lines +9 to +11
logging_names: dict[str | int, int | str] = {}
logging_names.update(logging._levelToName) # type: ignore
logging_names.update(logging._nameToLevel) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're loosing a copy() here, just wanted to double check that's okay

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're replacing a copy() with an update() for the sake of readability. There's no functional difference.

Comment on lines +31 to +42
if TYPE_CHECKING:
try:
import ucp
from ucp import create_endpoint as ucx_create_endpoint
from ucp import create_listener as ucx_create_listener
except ImportError:
pass
else:
ucp = None # type: ignore
ucx_create_endpoint = None # type: ignore
ucx_create_listener = None # type: ignore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pentschev @madsbk @quasiben could one of you review the changes to this module?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping here @jrbourbeau . We can't import ucp here, we have to do it from init_once, otherwise UCX will be imported before we can set any of its configurations. However, it seems it's only being imported here when TYPE_CHECKING is defined, when is that going to happen in Distributed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, it seems it's only being imported here when TYPE_CHECKING is defined, when is that going to happen in Distributed?

Today we don't run any type checking as part of the development process, but after #5348 is merged we'll start running mypy as a pre-commit hook. Given that this ucp import will only happen during type checking, do you think the proposed changes here are okay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's guaranteed that TYPE_CHECKING is False for all runtime situations, then I think this is no problem.

Copy link
Member

@jrbourbeau jrbourbeau Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be the case. From https://docs.python.org/3/library/typing.html#typing.TYPE_CHECKING:

It is False at runtime

cc @crusaderky for confirmation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we should be good for now. If there's any reason why TYPE_CHECKING is True at runtime, we'll need to make changes to cover that.

Comment on lines -1380 to -1381
_shutdown = _close

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting to future readers that there's a _shutdown method defined a few lines below

Comment on lines 21 to 23
if TYPE_CHECKING:
from .core import Server

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's usually the reason for this pattern, yes. However, as far as I can tell it's not actually needed here, since Server doesn't depend on preloading, and everything that imports preloading has already imported Server somewhere.

@@ -215,7 +217,7 @@ def _find_recipient(
candidates -= pending_repl
if not candidates:
return None
return min(candidates, key=self.workers_memory.get)
return min(candidates, key=self.workers_memory.get) # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cringe a bit to suggest this, but self.workers_memory.__getitem__ passes muster here, since it will raise if the key doesn't exist

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch


if not self.new_spec:
raise ValueError("To scale by cores= you must specify cores per worker")
assert False, "unreachable"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be triggered if none of the above keys are in new_spec. Without this assertion, the proper return type of this function would be Optional[int]. Perhaps a more helpful error would be something similar to the ValueError above?

Comment on lines +399 to +402
@status.setter
def status(self, value):
raise NotImplementedError()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was curious about this as well -- it's because this inherits from Worker, where status isn't read-only. Without this it complains that we are swapping a read-write property with a read-only one, which, fair enough.

@crusaderky
Copy link
Collaborator Author

All review comments have been incorporated

@crusaderky
Copy link
Collaborator Author

Also, just checking but there's an empty distributed/py.typed file being added here, is that intentional?

Yes, this file is used by mypy when it validates third party code that imports distributed, and it signals that mypy should fetch type annotations from the library itself (as opposed to look for a separate stubs package).

@jrbourbeau
Copy link
Member

@jakirkham earlier you expressed some hesitation around adopting mypy due to its potential impact on the Cythonized scheduler (#5328 (comment)). @crusaderky has isolated the changes to scheduler.py to #5348 where he is, I think, taking a conservative approach with modifications. The current plan is to merge this PR (which doesn't touch scheduler.py) and then have someone review the scheduler.py changes in #5348 for performance regressions (you're clearly the most qualified person to do this). Just wanted to check to see if you're okay with this plan before moving forward.

@jakirkham
Copy link
Member

Yeah I'm ok with that James. Thanks for checking :)

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

Thanks @crusaderky for your work on this and addressing all the review comments from @ian-r-rose and me

@jrbourbeau jrbourbeau merged commit 7a3ea4c into dask:main Sep 30, 2021
@ian-r-rose
Copy link
Collaborator

Woo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants