Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runtime env] runtime env inheritance refactor #22244

Merged
merged 12 commits into from
Feb 21, 2022

Conversation

SongGuyang
Copy link
Contributor

@SongGuyang SongGuyang commented Feb 9, 2022

Why are these changes needed?

Runtime Environments is already GA in Ray 1.6.0. The latest doc is here. And now, we already supported a inheritance behavior as follows (copied from the doc):

  • The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
  • Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.

We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.

Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:

  • If there is no runtime env option when we create actor, inherit the parent runtime env.
  • Otherwise, use the optional runtime env directly and don't do the merging.

Add a new API named ray.runtime_env.get_current_runtime_env() to get the parent runtime env and modify this dict by yourself. Like:
Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))
This new API also can be used in ray client.

Related issue number

#21818

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@SongGuyang SongGuyang changed the title [WIP][runtime env] runtime env inheritance refactor [runtime env] runtime env inheritance refactor Feb 9, 2022
@rkooo567 rkooo567 self-assigned this Feb 9, 2022
@SongGuyang SongGuyang changed the title [runtime env] runtime env inheritance refactor [WIP][runtime env] runtime env inheritance refactor Feb 9, 2022
@SongGuyang SongGuyang changed the title [WIP][runtime env] runtime env inheritance refactor [runtime env] runtime env inheritance refactor Feb 15, 2022
@SongGuyang
Copy link
Contributor Author

@architkulkarni @edoakes @rkooo567 Current PR is ready to review.

@rkooo567
Copy link
Contributor

Please update the description!

@SongGuyang
Copy link
Contributor Author

Please update the description!

Done!

Copy link
Contributor

@architkulkarni architkulkarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for this PR! Just had a few minor questions.

{"pip": ["torch", "ray[serve]"],
"env_vars": {"B": "new", "C", "c"}}
# Child updates `runtime_env`
Actor.options(runtime_env=ray.get_current_runtime_env().update({"env_vars": {"A": "a", "B": "b"}}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this example! I think we should add a full entry for get_current_runtime_env in an API reference somewhere. It could be on this page, or in the API reference for https://docs.ray.io/en/latest/package-ref.html#runtime-context-apis, not sure which is best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advise. I have added a API doc:
image

And add the link ref here:
image

@@ -745,7 +745,7 @@ def decorate_actor():
def test_init_requires_no_resources(call_ray_start, use_client):
import ray

if use_client:
if not use_client:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch :)

@@ -277,6 +277,14 @@ def get_runtime_context(self):
"""
return ClientWorkerPropertyAPI(self.worker).build_runtime_context()

def get_current_runtime_env(self):
"""Get the runtime env of the current client/driver.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this differ from the get_current_runtime_env in runtime_env.py? Does the other one return the runtime env of the current task/actor/job, while this one only returns the runtime env installed by the Ray client server (the job-level runtime env)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you are right!

Comment on lines 1427 to 1429
RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is "
<< parent_serialized_runtime_env << ". It will be overrode by "
<< serialized_runtime_env << ".";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is "
<< parent_serialized_runtime_env << ". It will be overrode by "
<< serialized_runtime_env << ".";
RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is "
<< parent_serialized_runtime_env << ". It will be overridden by "
<< serialized_runtime_env << ".";

Why is this a WARNING? Isn't it expected normal behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add this WARNING log because current PR already changed the API behavior. It is from the advise of Edward #21818 (comment). Maybe we can change this log level After a period of time? I can add a TODO here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense! Yeah, we'll need some way of remembering to change the log level in the future. A TODO sounds like a good solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, when you merge this PR could you make a quick issue for this and add it to the runtime_env backlog milestone? I think that will help prevent it from being lost.

Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is pretty close to merge. Just a few comments about documentation .

@@ -349,27 +349,21 @@ Inheritance

The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor once set, unless it is overridden.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sentence still applicable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, actor/tasks with unspecified runtime envs will still inherit the env of their parent. Maybe we can clarify by changing "unless it is overriden." -> "unless it is overridden by explicitly specifying a runtime environment for the child task/actor."

@@ -349,27 +349,21 @@ Inheritance

The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor once set, unless it is overridden.

If an actor or task specifies a new ``runtime_env``, it will override the parent’s ``runtime_env`` (i.e., the parent actor/task's ``runtime_env``, or the job's ``runtime_env`` if there is no parent actor or task) as follows:
If an actor or task specifies a new ``runtime_env``, it will override the parent’s ``runtime_env`` (i.e., the parent actor/task's ``runtime_env``, or the job's ``runtime_env`` if there is no parent actor or task).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the doc is confusing. Can we rewrite in this format?

By default, all actors and tasks inherit the parent's runtime_env.

-- here, show an example code --

However, if you specify runtime_env for task/actor, it will override the parents' runtime env

-- show an example --

If you'd like to still use parent's runtime environment, 

-- show an example --

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done:

image
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a lot easier to understand!

Btw, I don't like much about parent's runtime env. Is it possible to just show with ray.init as an example? Or do we explain the meaning of "parent" before this section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add ray.init and rename to "current runtime"

if _runtime_env is None:
_runtime_env = dict(ray.get_runtime_context().runtime_env)

return _runtime_env
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this global thing? Can't we just return dict(ray.get_runtime_context().runtime_env)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a little optimization as runtime context. Current runtime env is immutable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the optimization? Can we just not do this? I want to avoid doing unnecessary optimization

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted.

["ray start --head --ray-client-server-port 25553"],
indirect=True,
)
@pytest.mark.parametrize("use_client", [False, True])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests to verify the inheritance now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_e2e_complex can cover this.

std::string CoreWorker::OverrideTaskOrActorRuntimeEnv(
const std::string &serialized_runtime_env,
std::vector<std::string> *runtime_env_uris) {
std::shared_ptr<rpc::RuntimeEnv> parent_runtime_env;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it a shared pointer? Can we just make it rpc::RuntimeEnv? Also, this API seems dangerous (worker_context_.GetCurrentRuntimeEnv()). The return value should be immutable, but now we can mutate it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid object copy.
I can modify the return type to std::shared_ptr<const rpc::RuntimeEnv>, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified return type sounds good to me!

// TODO(SongGuyang): We add this warning log because of the change of API behavior.
// Refer to https://github.com/ray-project/ray/issues/21818.
// Modify this log level to `INFO` or `DEBUG` after a few release versions.
RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning is not user-facing, so I am not sure if this will do anything... Maybe just removing it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by not user-facing? Do you mean it won't be streamed to the driver and will only show up in the log files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! This log will just spam core log files and won’t be delivered to users in a clear way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks! Is there a recommended way to stream a warning to the driver from C++?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use log level error to do that. But i feel like it is not a good idea cuz it can become extremely spammy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the WARNING log can't be streamed to driver in my test. Maybe we can keep this log for debug?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug sounds good

Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with merging the PR, but I'd like to avoid having global stuff.

@rkooo567
Copy link
Contributor

Lmk when it is ready to merge!

@SongGuyang
Copy link
Contributor Author

@rkooo567 Ready to merge!

@raulchen raulchen merged commit 5783cdb into ray-project:master Feb 21, 2022
@raulchen raulchen deleted the dev_runtime_env_inherit branch February 21, 2022 10:13
@edoakes
Copy link
Contributor

edoakes commented Feb 23, 2022

Hey @SongGuyang I just noticed this while reviewing the other PR, but why do we need to add a new get_current_runtime_env function instead of just using the get_runtime_context? I think we should remove this and only support getting it through runtime context to standardize the API.

@SongGuyang
Copy link
Contributor Author

@edoakes We will support a strong-typed API for runtime env and we will add new APIs to "ray.runtime_env" anyway. So, I want to unify all the APIs about runtime env to one code path. In addition, we also need to add a new API like get_current_runtime_env_class(not sure if it's the best name) for the Strong-typed one. This is my thought. But if you insist on using runtime context, I will move this.

xwjiang2010 added a commit to xwjiang2010/ray that referenced this pull request Feb 24, 2022
xwjiang2010 added a commit to xwjiang2010/ray that referenced this pull request Feb 24, 2022
xwjiang2010 added a commit to xwjiang2010/ray that referenced this pull request Feb 24, 2022
edoakes pushed a commit that referenced this pull request Feb 25, 2022
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc):
- The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
- Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.

We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.

Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:
- **If there is no runtime env option when we create actor, inherit the parent runtime env.**
- **Otherwise, use the optional runtime env directly and don't do the merging.**

Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like:
```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))```
This new API also can be used in ray client.
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
SongGuyang added a commit to alipay/ant-ray that referenced this pull request May 6, 2022
Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc):
- The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
- Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.

We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.

Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:
- **If there is no runtime env option when we create actor, inherit the parent runtime env.**
- **Otherwise, use the optional runtime env directly and don't do the merging.**

Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like:
```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))```
This new API also can be used in ray client.
raulchen pushed a commit that referenced this pull request May 20, 2022
* [runtime env] runtime env inheritance refactor (#22244)

Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc):
- The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
- Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.

We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.

Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:
- **If there is no runtime env option when we create actor, inherit the parent runtime env.**
- **Otherwise, use the optional runtime env directly and don't do the merging.**

Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like:
```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))```
This new API also can be used in ray client.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants