-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Java] Simplify Java worker configuration #2938
Conversation
@chuxi cc |
Test PASSed. |
// Redis server, Raylet and object store. | ||
address: "" | ||
// If `redis.server` isn't provided, which port we should use to start redis server. | ||
head-port: 32100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe 6379 since that is standard?
|
||
// Path of your Ray home dir, | ||
// can be either an absolute path, or a relative path to current work dir. | ||
home = "/path/to/your/ray/home" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the ray/
directory where Ray is installed? E.g., if I clone the Ray repository to /home/ubuntu/ray
and then build it from source in place, then this should be /home/ubuntu/ray
?
"current working directory" sounds like something else, e.g., that sounds more like where my application code lives. So maybe this comment can be clarified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the ray/ directory where Ray is installed?
yes.
"current working directory" sounds like something else, e.g., that sounds more like where my application code lives. So maybe this comment can be clarified?
The motivation of allowing 'relative path to current work dir' is the following: when we run java tests under ray/java/test
folder, ray home will be ../..
of current work dir, see TestListener.java
. If we only allow absolute path, everyone needs to change it before running tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, then can we rephrase the comment as
// This is the path to the directory where Ray is installed, e.g.,
// something like /home/ubuntu/ray. This can be an absolute path
// or a relative path from the current working directory.
To clarify, is it a relative path from the directory that the user is in? Or it a relative path from where the application lives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the relative path from where the program is started, i.e., result of pwd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jovan-wong could you update the comment according to @robertnishihara's suggestion?
java/example.conf
Outdated
// Worker's mode, available options are: | ||
// `WORKER`: indicates that this worker is a normal worker. | ||
// `DRIVER`: indicates that this worker is a driver. | ||
worker.mode = DRIVER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what circumstances should this be set to WORKER
or DRIVER
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WORKER
should only be set in worker process (the process forked from worker_pool.cc). Actually I think this shouldn't be exposed to user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about this. To run an application, does the user need to make one conf file for the cluster, one for the application, or one for every worker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I think we should add an API to specify the worker mode:
For a normal worker, we can invoke this:
Ray.init(WorkerMode.WORKER);
And call ray.init()
without argument for a driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear to me what Ray.init(WorkerMode.WORKER);
means. Under what circumstances would someone do that? Just for testing/debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User should call Ray.init()
instead of Ray.init(WorkerMode.WORKER)
.
Because Ray.init()
will specify the worker mode as DRIVER.
Ray.init(WorkerMode.WORKER)
just for the DefaultWorker.java
.
And then, we can remove the worker mode item from config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about this. To run an application, does the user need to make one conf file for the cluster, one for the application, or one for every worker?
there should be only one file for the application. And worker processes should use the same file, except overriding ray.worker.mode=WORKER
(see DefaultWorker.java
).
Thus, users always call Ray.init()
, and don't need to worry about which mode they should set.
java/example.conf
Outdated
// `SINGLE_PROCESS`: Ray will be running in a single Java process. | ||
// `SINGLE_BOX`: All ray processes will be running on one node. | ||
// `CLUSTER`: Ray will be running in a cluster. | ||
run-mode = SINGLE_BOX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level, is the conf
file for the "cluster" or do you need one conf
file for each application?
Why do you need to specify SINGLE_BOX
versus CLUSTER
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- this file defines the default values, each application should provide their own file to fit their scenario.
- this is due to a legacy reason. Because
SINGLE_BOX
andCLUSTER
use different mechanism to manage functions. Once we finish [java] put function meta in task spec and load functions with function meta #2881,SINGLE_BOX
can be removed.
Test PASSed. |
Test FAILed. |
Test FAILed. |
java/example.conf
Outdated
# For config file format, see 'https://github.com/lightbend/config/blob/master/HOCON.md'. | ||
|
||
ray { | ||
// This is the path to the directory where Ray in installed, e.g., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: in
-> is
Test PASSed. |
Test PASSed. |
Test PASSed. |
@@ -13,7 +13,6 @@ public void testCreateRayConfig() { | |||
System.setProperty("ray.home", "/path/to/ray"); | |||
RayConfig rayConfig = RayConfig.create(); | |||
|
|||
Assert.assertEquals(rayConfig.rayHome, "/path/to/ray"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert
can't pass in my machine.
It always be the path setted here.
Maybe we can remove this lineSystem.setProperty("ray.home", "/path/to/ray");
,
and rephrase the assert toassert(rayConfig.rayHome, "../..")
;
@@ -37,7 +37,7 @@ public Integer echo(Integer number) { | |||
} | |||
} | |||
|
|||
@RayRemote(resources = {@ResourceItem(name = "CPU", value = 8), | |||
@RayRemote(resources = {@ResourceItem(name = "CPU", value = 1000), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In different machines, the number of CPUs is not determined, We should make sure that the machine doesn't have enough CPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then, we should set a FIXED number for this test.
Test PASSed. |
I think I introduced conflicts by merging #2881. |
Test PASSed. |
191cda4
to
241cf83
Compare
Remove useless fields in config file. Remove useless fields in PathConfig. Remove all fields and methods in PathConfig. Remove PathConfig. Remove all fields of config file. Remove Parameters. Add overwrite field. Pass iniconfig. Make DefaultWorker using RayAPI to init itself. Remove useless code. Clean code. Remove TODO. Change some thing Add RayConfig. Read config files. Change a little. Add ray.home directory. Update WorkerMode. Update RunMode. Update node ip. Update redis address configuration item. Update object store name configuration item. Update object store name index configuration item. Update some configuration items. Remove useless params. Remove RayParameters and ConfigReader classes. Remove Aconfig and refactor init Rename a func. Remove useless files. Clean code. Clean code. [VERY IMPORTANT] Change module and package dependencies. In this commit, I move some dependencies of RayConfig into api module from runtime module. Remove RayInitConfig. Finish TODOs and address comments. Fix lint. Fix lint. Move DefaultRayRuntimeFactory into runtime module. Refactor WorkerContext In this commit, I removed all the static variables and methods and make it non-static. Fix lint. Address comments. refine RunManager fix fix logger fix test refine update doc and set library at runtime Fix lint:Add a new line. Update README.rst Update the format. Fix: code style. Address comments. Fix Worker Mode. Update ray.conf Fix typo. Fix lint. Update resource management test.
bd4645a
to
e06755f
Compare
Test FAILed. |
Jenkins, retest this please. |
Test FAILed. |
Test FAILed. |
jenkins, retest this please |
Test PASSed. |
Test FAILed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
What do these changes do?
Previously, Java worker configuration is complicated, because it requires setting environment variables as well as command-line arguments.
This PR aims to simplify Java worker's configuration.
example.conf
file, so users can get started quickly.ray.default.conf
file.This PR also simplifies and refines the following code:
Ray.init()
.RunManager
.WorkerContext
.How to use this configuration?
example.conf
into your classpath and rename it toray.conf
.ray.default.conf
.Note: configuration is read in this priority:
System properties >
ray.conf
>ray.default.conf
Related issue number
N/A