Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable mypy in CI 2/2 #5348

Merged
merged 8 commits into from
Oct 13, 2021
Merged

Enable mypy in CI 2/2 #5348

merged 8 commits into from
Oct 13, 2021

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Sep 24, 2021

Closes #2803.
Iterative follow-up to #5328.

scheduler.py should be reviewed thoroughly for performance regressions in the cythonised code.

Comment on lines +545 to +547
services: "dict | None" = None,
versions: "dict | None" = None,
extra: "dict | None" = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the Cython code around the usage of these variables look like? In the past I've seen Cython handle these "" types as just objects. If that's still true, that would mean Cython does not use the faster C level APIs associated with these objects, which would lose some of the performance gains we got with Cythonizing the scheduler.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, they degrade performance. Reverted.

@crusaderky
Copy link
Collaborator Author

Moving the conversation from #5328

@crusaderky
Copy link
Collaborator Author

@jakirkham

So would recommend we leave scheduler.py alone.

Would like to, but can't. Adding # type: ignore at the top of the file breaks everything else. I could not make the --exclude parameter of mypy work either. So the only alternative to changing the actual annotations in scheduler.py is to add a wealth of # type: ignore on its individual lines.

As noted above, this is going to cause issues with Cythonization in the scheduler. If we can't leave that file alone, then unfortunately I don't think we are prepared to adopt MyPy yet.

Could you articulate how # type: ignore harms cythonization? For example, what harm does changing this

@cclass
class TaskGroup:
    _last_worker: WorkerState

    def __init__(self, name: str):
        self._last_worker = None

to this?

@cclass
class TaskGroup:
    _last_worker: WorkerState

    def __init__(self, name: str):
        self._last_worker = None  # type: ignore

note that I'm not advocating for changing _last_worker: WorkerState to _last_worker: "WorkerState | None", which as you pointed out does introduce degradation.

@crusaderky
Copy link
Collaborator Author

Notes from today's maintainers meeting:

At the moment, there's the problem that developers can easily introduce performance degradation in the Cythonized code every time they touch scheduler.py. Comparing side by side the HTML reports before/after the change is feasible only when one has a specific doubt, not as a general approach going forward.
A few solutions were proposed:

  • Have every PR that substantially alters scheduler.py be reviewed by @jakirkham (this is obviously rhetorical and not sustainable)
  • Ignore Cython performance to reduce the attrition on pure-Python development. A notable exception to this would be PRs like this one that primarily focus on type annotations, that should be as conservative as possible. Leave @jakirkham to patch it up in his own time at a later date.
  • Thoroughly cover individual scheduler methods with ASV. Whole-cluster performance measures can be trickier to write because the benefits of Cython will only become visible when the scheduler is the bottleneck, which is challenging to reproduce particularly on single-host CI environments.
  • Investigate and appropriate the performance reports that were produced for the Rust scheduler rewrite. Again running those measures on CI environments and getting meaningful insights out of them is likely going to be challenging.

@crusaderky
Copy link
Collaborator Author

Work in scheduler.py is done. All unit test failures are unrelated. Waiting for merge of #5328.

_transition_counter: Py_ssize_t
_plugins: dict # dict[str, SchedulerPlugin]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled up from Scheduler since SchedulerState was invoking self.plugins on many occasions

unrunnable: set,
validate: bint,
plugins: "Iterable[SchedulerPlugin]" = (),
**kwargs, # Passed verbatim to Server.__init__()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed a wealth of needless default values for init params

@crusaderky
Copy link
Collaborator Author

All test failures are unrelated.
To my understanding none of the changes should cause a slowdown to critical sections of the code.
Ready for final review.

Copy link
Collaborator

@ian-r-rose ian-r-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an upstream cython issue for why using stringy type annotations makes performance worse? Is it just because they haven't implemented it, or is there a deeper reason?

@@ -5461,7 +5480,7 @@ def add_plugin(self, plugin=None, idempotent=False, name=None, **kwargs):

"""
if isinstance(plugin, type):
plugin = plugin(self, **kwargs)
plugin = plugin(self, **kwargs) # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could probably be fixed by capturing scheduler plugin kwargs in a ParamSpec and making a TypeVar for the add_plugin signature. I don't think we should actually do that here, though, since ParamSpec is in typing_extensions~=3.10, which is not universally supported yet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy is right complaining here. This is genuinely wrong; most plugins DO NOT accept a scheduler positional arg.
See #5394 for proper fix

@crusaderky
Copy link
Collaborator Author

@jakirkham are you happy to merge this?

@crusaderky
Copy link
Collaborator Author

Merging on Wednesday if there are no further comments

@@ -2517,6 +2517,8 @@ async def gather_dep(

if not to_gather_keys:
return
assert cause
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverts #5160


from the root of the distributed repository. Now the code linters will be run each time
you commit changes. You can skip these checks with ``git commit --no-verify`` or with
the short version ``git commit -n``.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align to dask/dask

@crusaderky crusaderky merged commit efbf888 into dask:main Oct 13, 2021
@crusaderky crusaderky deleted the mypy2 branch October 13, 2021 09:16
@crusaderky
Copy link
Collaborator Author

In it goes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add mild type annotations
4 participants