Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runtime env] Make plugin setup process that has not been refactor run in threads. #22588

Merged
merged 6 commits into from
Feb 28, 2022

Conversation

Catch-Bull
Copy link
Contributor

@Catch-Bull Catch-Bull commented Feb 23, 2022

I recently realized that during a runtime_env creation process, a plugin/manager that is very slow to setup may block the creation of other runtime_env, so I make plugin/manager setup run in threads.

The refactor of PipManager is about to be completed, so I ignore it in this PR.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(



def test_plugin_hang(ray_start_regular):
@ray.remote(num_cpus=0.1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before this PR, this case will be failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for adding this!

Copy link
Contributor

@SongGuyang SongGuyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think this fix is very important for stability.

@edoakes
Copy link
Contributor

edoakes commented Feb 23, 2022

Thanks @Catch-Bull. Could you please also leave a comment for why we're doing this and a TODO to remove it once refactored?

@fyrestone
Copy link
Contributor

We only make the create method to async in the PR #22311, the logic in the create method are sync.

The problem is that running sync logic in an async method blocks the asyncio loop. This PR makes the setup process (sync logic) in threads before they are fully async.

# Currently create method is still a sync process, to avoid blocking
# the loop, need to run this function in another thread.
# TODO(Catch-Bull): Refactor method create into an async process, and
# make this method running in current loop.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Catch-Bull. Could you please also leave a comment for why we're doing this and a TODO to remove it once refactored?

@edoakes I add annotations and TODO, and copy those to conda.py, py_modules.py, and working_dir.py

@edoakes
Copy link
Contributor

edoakes commented Feb 24, 2022

test_runtime_env_plugin failing.

@Catch-Bull
Copy link
Contributor Author

test_runtime_env_plugin failing.

hmmm, It seems flaky:

image

@architkulkarni
Copy link
Contributor

test_runtime_env_plugin failing.

hmmm, It seems flaky:

According to https://flaky-tests.ray.io/ test_runtime_env_plugin isn't flaky on master, which means the flakiness is probably introduced by this PR somehow--any ideas what's causing it?

@Catch-Bull
Copy link
Contributor Author

test_runtime_env_plugin failing.

hmmm, It seems flaky:

According to https://flaky-tests.ray.io/ test_runtime_env_plugin isn't flaky on master, which means the flakiness is probably introduced by this PR somehow--any ideas what's causing it?

It will raise Import error, I have no idea why this happens on windows
image

but, I found that after I did not call RuntimeEnvPlugin in remote_function, the test basically passed every time, change in here: 37698a6

@raulchen raulchen merged commit aa1885a into ray-project:master Feb 28, 2022
@raulchen raulchen deleted the await_fixed branch February 28, 2022 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants