-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[runtime env] runtime env inheritance refactor #22244
Conversation
@architkulkarni @edoakes @rkooo567 Current PR is ready to review. |
Please update the description! |
Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for this PR! Just had a few minor questions.
{"pip": ["torch", "ray[serve]"], | ||
"env_vars": {"B": "new", "C", "c"}} | ||
# Child updates `runtime_env` | ||
Actor.options(runtime_env=ray.get_current_runtime_env().update({"env_vars": {"A": "a", "B": "b"}})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this example! I think we should add a full entry for get_current_runtime_env
in an API reference somewhere. It could be on this page, or in the API reference for https://docs.ray.io/en/latest/package-ref.html#runtime-context-apis, not sure which is best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -745,7 +745,7 @@ def decorate_actor(): | |||
def test_init_requires_no_resources(call_ray_start, use_client): | |||
import ray | |||
|
|||
if use_client: | |||
if not use_client: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch :)
@@ -277,6 +277,14 @@ def get_runtime_context(self): | |||
""" | |||
return ClientWorkerPropertyAPI(self.worker).build_runtime_context() | |||
|
|||
def get_current_runtime_env(self): | |||
"""Get the runtime env of the current client/driver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this differ from the get_current_runtime_env
in runtime_env.py
? Does the other one return the runtime env of the current task/actor/job, while this one only returns the runtime env installed by the Ray client server (the job-level runtime env)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think you are right!
src/ray/core_worker/core_worker.cc
Outdated
RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is " | ||
<< parent_serialized_runtime_env << ". It will be overrode by " | ||
<< serialized_runtime_env << "."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is " | |
<< parent_serialized_runtime_env << ". It will be overrode by " | |
<< serialized_runtime_env << "."; | |
RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is " | |
<< parent_serialized_runtime_env << ". It will be overridden by " | |
<< serialized_runtime_env << "."; |
Why is this a WARNING? Isn't it expected normal behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add this WARNING log because current PR already changed the API behavior. It is from the advise of Edward #21818 (comment). Maybe we can change this log level After a period of time? I can add a TODO here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah makes sense! Yeah, we'll need some way of remembering to change the log level in the future. A TODO sounds like a good solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, when you merge this PR could you make a quick issue for this and add it to the runtime_env backlog milestone? I think that will help prevent it from being lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is pretty close to merge. Just a few comments about documentation .
@@ -349,27 +349,21 @@ Inheritance | |||
|
|||
The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor once set, unless it is overridden. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this sentence still applicable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, actor/tasks with unspecified runtime envs will still inherit the env of their parent. Maybe we can clarify by changing "unless it is overriden." -> "unless it is overridden by explicitly specifying a runtime environment for the child task/actor."
@@ -349,27 +349,21 @@ Inheritance | |||
|
|||
The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor once set, unless it is overridden. | |||
|
|||
If an actor or task specifies a new ``runtime_env``, it will override the parent’s ``runtime_env`` (i.e., the parent actor/task's ``runtime_env``, or the job's ``runtime_env`` if there is no parent actor or task) as follows: | |||
If an actor or task specifies a new ``runtime_env``, it will override the parent’s ``runtime_env`` (i.e., the parent actor/task's ``runtime_env``, or the job's ``runtime_env`` if there is no parent actor or task). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the doc is confusing. Can we rewrite in this format?
By default, all actors and tasks inherit the parent's runtime_env.
-- here, show an example code --
However, if you specify runtime_env for task/actor, it will override the parents' runtime env
-- show an example --
If you'd like to still use parent's runtime environment,
-- show an example --
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a lot easier to understand!
Btw, I don't like much about parent's runtime env
. Is it possible to just show with ray.init as an example? Or do we explain the meaning of "parent" before this section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add ray.init
and rename to "current runtime"
python/ray/runtime_env.py
Outdated
if _runtime_env is None: | ||
_runtime_env = dict(ray.get_runtime_context().runtime_env) | ||
|
||
return _runtime_env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this global thing? Can't we just return dict(ray.get_runtime_context().runtime_env)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a little optimization as runtime context. Current runtime env is immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the optimization? Can we just not do this? I want to avoid doing unnecessary optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
["ray start --head --ray-client-server-port 25553"], | ||
indirect=True, | ||
) | ||
@pytest.mark.parametrize("use_client", [False, True]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests to verify the inheritance now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_e2e_complex
can cover this.
src/ray/core_worker/core_worker.cc
Outdated
std::string CoreWorker::OverrideTaskOrActorRuntimeEnv( | ||
const std::string &serialized_runtime_env, | ||
std::vector<std::string> *runtime_env_uris) { | ||
std::shared_ptr<rpc::RuntimeEnv> parent_runtime_env; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it a shared pointer? Can we just make it rpc::RuntimeEnv
? Also, this API seems dangerous (worker_context_.GetCurrentRuntimeEnv()). The return value should be immutable, but now we can mutate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to avoid object copy.
I can modify the return type to std::shared_ptr<const rpc::RuntimeEnv>
, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modified return type sounds good to me!
// TODO(SongGuyang): We add this warning log because of the change of API behavior. | ||
// Refer to https://github.com/ray-project/ray/issues/21818. | ||
// Modify this log level to `INFO` or `DEBUG` after a few release versions. | ||
RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning is not user-facing, so I am not sure if this will do anything... Maybe just removing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by not user-facing? Do you mean it won't be streamed to the driver and will only show up in the log files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! This log will just spam core log files and won’t be delivered to users in a clear way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks! Is there a recommended way to stream a warning to the driver from C++?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use log level error to do that. But i feel like it is not a good idea cuz it can become extremely spammy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the WARNING
log can't be streamed to driver in my test. Maybe we can keep this log for debug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with merging the PR, but I'd like to avoid having global stuff.
Lmk when it is ready to merge! |
@rkooo567 Ready to merge! |
Hey @SongGuyang I just noticed this while reviewing the other PR, but why do we need to add a new |
@edoakes We will support a strong-typed API for runtime env and we will add new APIs to "ray.runtime_env" anyway. So, I want to unify all the APIs about runtime env to one code path. In addition, we also need to add a new API like |
Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc): - The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment. - Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent. We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run. Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior: - **If there is no runtime env option when we create actor, inherit the parent runtime env.** - **Otherwise, use the optional runtime env directly and don't do the merging.** Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like: ```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))``` This new API also can be used in ray client.
…2244)" (ray-project#22626) Breaks train_torch_linear_test.py.
Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc): - The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment. - Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent. We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run. Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior: - **If there is no runtime env option when we create actor, inherit the parent runtime env.** - **Otherwise, use the optional runtime env directly and don't do the merging.** Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like: ```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))``` This new API also can be used in ray client.
* [runtime env] runtime env inheritance refactor (#22244) Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc): - The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment. - Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent. We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run. Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior: - **If there is no runtime env option when we create actor, inherit the parent runtime env.** - **Otherwise, use the optional runtime env directly and don't do the merging.** Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like: ```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))``` This new API also can be used in ray client.
Why are these changes needed?
Runtime Environments is already GA in Ray 1.6.0. The latest doc is here. And now, we already supported a inheritance behavior as follows (copied from the doc):
We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.
Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:
Add a new API named
ray.runtime_env.get_current_runtime_env()
to get the parent runtime env and modify this dict by yourself. Like:Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))
This new API also can be used in ray client.
Related issue number
#21818
Checks
scripts/format.sh
to lint the changes in this PR.