-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[runtime env] Async pip runtime env #22381
[runtime env] Async pip runtime env #22381
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great for me! Only left some little comments.
str_list = [super().__str__().strip(",.")] | ||
out = { | ||
"stdout": self.stdout, | ||
"stderr": self.stderr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"stderr" will be empty because we redirect it to stdout https://github.com/ray-project/ray/pull/22381/files#diff-3968b5fc7121d304a253988b1bed9fdd58c910dbae0eb33bc0405e2a9a1bfec3R346 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very similar to this part, maybe you can modify the implementation in conda_utils.py to avoid splitting the implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part
Yes. But we can't assume that the stderr is always redirected to the stdout. The implementation of SubprocessCalledProcessError
should cover the interface of the parent subprocess.CalledProcessError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very similar to this part, maybe you can modify the implementation in conda_utils.py to avoid splitting the implementation
The exec_cmd
and exec_cmd_stream_to_logger
in the conda_utils.py
are used by many runtime envs, not only by the conda. I thinks it's better to put the exec utils into the utils.py
instead of conda_utils.py
.
But, we are going to make all the runtime env async, so exec_cmd
and exec_cmd_stream_to_logger
will be removed after they are not referenced.
@architkulkarni @edoakes @rkooo567 Hi, can you review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice change btw :)! I will leave the further review to @architkulkarni who should have more context in this code path
Deferring to @architkulkarni to review. Let's make sure there's no regression in logging/exception messages due to the change (hopefully our existing tests are enough here) |
from contextlib import asynccontextmanager | ||
except ImportError: | ||
# Copy from https://github.com/python-trio/async_generator | ||
# for compatible with Python 3.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkooo567 The Python 3.6 does not support asynccontextmanager
. This code is copied from async_generator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! Just have one question:
Are we sure that it's safe to remove the lock for concurrent pip installs? I think concurrently installing pip environments can cause some unexpected errors, e.g. #17086. Or is it safe now because we added --no-cache-dir
?
If we can add a test to cover the case of concurrent installs, I'd feel better about this. Something like installing two runtime envs, pip: ["A", "B"]
concurrently with pip: ["A", "C"]
. Is it possible to add such a test to this PR?
Concurrent installs in different virtualenvs is OK because,
I will add a test to cover the current installs. |
I noticed that the |
@architkulkarni Hi, all the comments have been fixed. Can this PR be merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! There were a handful of failed tests in CI but I don't think they're related to this PR; I restarted them just to have more confidence.
Thanks. |
In order to initialize runtime env concurrently, this PR makes pip runtime env asynchronous. It includes, - [x] New `check_output_cmd` in runtime env utils. - [x] Async PipProcessor. - [x] The `asynccontextmanager` from `https://github.com/python-trio/async_generator` for Python 3.6 - [x] Remove pip runtime env lock. - [x] Disable pip cache. Co-authored-by: 刘宝 <po.lb@antfin.com>
…n in threads. (#22588) I recently realized that during a runtime_env creation process, a plugin/manager that is very slow to setup may block the creation of other runtime_env, so I make plugin/manager setup run in threads. [The refactor of `PipManager`](#22381) is about to be completed, so I ignore it in this PR.
Why are these changes needed?
In order to initialize runtime env concurrently, this PR makes pip runtime env asynchronous. It includes,
check_output_cmd
in runtime env utils.asynccontextmanager
fromhttps://github.com/python-trio/async_generator
for Python 3.6Related issue number
#21950
Checks
scripts/format.sh
to lint the changes in this PR.