-
-
Notifications
You must be signed in to change notification settings - Fork 750
Remove protocol="ucx" support in favor of distributed-ucxx #9105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 27 files ± 0 27 suites ±0 11h 26m 34s ⏱️ + 22m 59s For more details on these failures and errors, see this check. Results for commit 52a17b3. ± Comparison against base commit 0f0adef. This pull request removes 7 and adds 1 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
jacobtomlinson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pentschev. Looking forward to this landing.
|
Test failures appear unrelated. Merging |
|
Thanks @jacobtomlinson ! 😄 |
Rewrite the UCXX/UCX-Py proxy backends added in #116 to deprecate UCX-Py. We have already removed the UCX-Py backend from Distributed in dask/distributed#9105, but while there's no release we cannot rely on that, so the changes in the PR work in the same way we expect when the migration is completed and Distributed has a new release. If distributed-ucxx is not installed a warning and exception are raised telling the user to install it. The state after this PR is as follows: 1. If distributed-ucxx is installed: 1. `protocol="ucx"` / `protocol="ucxx"` should continue working as before, both were already using distributed-ucxx 2. `protocol="ucx-old"` (only exists in rapids-dask-dependency added in #116) will raise the same `FutureWarning` as `protocol="ucx"` without RAPIDS 2. If distributed-ucxx is NOT installed: 1. `protocol="ucx"` / `protocol="ucxx"` / `protocol="ucx-old"` all raise a `FutureWarning` and don’t let users create processes. The final state is expected to be achieved when a new Distributed release is cut and the `__rdd_patch_ucx.py` file is removed from rapids-dask-dependency, which will be: 1. If distributed-ucxx is installed: 1. `protocol="ucx"` (now transparently uses distributed-ucxx) / `protocol="ucxx"` (always used distributed-ucxx) should continue working as before 2. If distributed-ucxx is NOT installed: 1. `protocol="ucx"` raises a `FutureWarning` and doesn’t let users create processes 2. `protocol="ucxx"` never existed before distributed-ucxx so it will just fail with a wrong protocol name Specifically I've tested the following to confirm what's behavior matches what is described above as the state of this PR (using all `ucx`/`ucxx`/`ucx-old` combinations): 1. `dask scheduler --protocol=ucx` 2. `dask worker ucx://127.0.0.1:8786` 3. `LocalCluster(protocol="ucx", silence_logs=False)` 4. `LocalCUDACluster(protocol="ucx", silence_logs=False)` Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Tom Augspurger (https://github.com/TomAugspurger) URL: #120
) Now that Distributed deprecated `ucx://` in dask/distributed#9105, we should now ensure `distributed-ucxx` registers both backends. This is currently handled by rapids-dask-dependency, but it will ultimately be removed from there when everything is in place. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Mads R. B. Kristensen (https://github.com/madsbk) URL: #504
Some changes were needed to make rapids-dask-dependency patches enable distributed-ucxx as the handler for Distributed comms protocols. Once that is completely removed from rapids-dask-dependency and a new Distributed release containing dask/distributed#9105 is out, changes from this PR can be merged. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #505
The UCX-Py project is deprecated and going to be archived very soon. To continue using
protocol="ucx"users will need to install the newdistributed-ucxxpackage, that contains the communication plugin for UCXX. This PR removes the existing communicator implementation based on UCX-Py, registering a new backend using thedistributed_ucxxmodule (when installed), otherwise raise a deprecation warning and exception back to the user whenprotocol="ucx"is used.pre-commit run --all-files