Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Simplify Java worker configuration #2938

Merged
merged 10 commits into from
Sep 26, 2018

Conversation

jovany-wang
Copy link
Contributor

@jovany-wang jovany-wang commented Sep 25, 2018

What do these changes do?

Previously, Java worker configuration is complicated, because it requires setting environment variables as well as command-line arguments.

This PR aims to simplify Java worker's configuration.

  1. Configuration management is now migrated to lightbend config, thus doesn't require setting environment variables.
  2. Many unused config items are removed.
  3. Provide a simple example.conf file, so users can get started quickly.
  4. All possible options and their default values are declared and documented in ray.default.conf file.

This PR also simplifies and refines the following code:

  1. The process of Ray.init().
  2. RunManager.
  3. WorkerContext.

How to use this configuration?

  1. Copy example.conf into your classpath and rename it to ray.conf.
  2. Modify/add your configuration items. The all items are declared in ray.default.conf.
  3. You can also set the items in java system prosperities.

Note: configuration is read in this priority:
System properties > ray.conf > ray.default.conf

Related issue number

N/A

@jovany-wang
Copy link
Contributor Author

@chuxi cc

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8338/
Test PASSed.

// Redis server, Raylet and object store.
address: ""
// If `redis.server` isn't provided, which port we should use to start redis server.
head-port: 32100
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe 6379 since that is standard?


// Path of your Ray home dir,
// can be either an absolute path, or a relative path to current work dir.
home = "/path/to/your/ray/home"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the ray/ directory where Ray is installed? E.g., if I clone the Ray repository to /home/ubuntu/ray and then build it from source in place, then this should be /home/ubuntu/ray?

"current working directory" sounds like something else, e.g., that sounds more like where my application code lives. So maybe this comment can be clarified?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the ray/ directory where Ray is installed?

yes.

"current working directory" sounds like something else, e.g., that sounds more like where my application code lives. So maybe this comment can be clarified?

The motivation of allowing 'relative path to current work dir' is the following: when we run java tests under ray/java/test folder, ray home will be ../.. of current work dir, see TestListener.java. If we only allow absolute path, everyone needs to change it before running tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then can we rephrase the comment as

// This is the path to the directory where Ray is installed, e.g.,
// something like /home/ubuntu/ray. This can be an absolute path
// or a relative path from the current working directory.

To clarify, is it a relative path from the directory that the user is in? Or it a relative path from where the application lives?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the relative path from where the program is started, i.e., result of pwd.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jovan-wong could you update the comment according to @robertnishihara's suggestion?

// Worker's mode, available options are:
// `WORKER`: indicates that this worker is a normal worker.
// `DRIVER`: indicates that this worker is a driver.
worker.mode = DRIVER
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances should this be set to WORKER or DRIVER?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WORKER should only be set in worker process (the process forked from worker_pool.cc). Actually I think this shouldn't be exposed to user.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about this. To run an application, does the user need to make one conf file for the cluster, one for the application, or one for every worker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I think we should add an API to specify the worker mode:
For a normal worker, we can invoke this:

Ray.init(WorkerMode.WORKER);

And call ray.init() without argument for a driver.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me what Ray.init(WorkerMode.WORKER); means. Under what circumstances would someone do that? Just for testing/debugging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User should call Ray.init() instead of Ray.init(WorkerMode.WORKER).
Because Ray.init() will specify the worker mode as DRIVER.
Ray.init(WorkerMode.WORKER) just for the DefaultWorker.java.

And then, we can remove the worker mode item from config file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about this. To run an application, does the user need to make one conf file for the cluster, one for the application, or one for every worker?

there should be only one file for the application. And worker processes should use the same file, except overriding ray.worker.mode=WORKER (see DefaultWorker.java).

Thus, users always call Ray.init(), and don't need to worry about which mode they should set.

// `SINGLE_PROCESS`: Ray will be running in a single Java process.
// `SINGLE_BOX`: All ray processes will be running on one node.
// `CLUSTER`: Ray will be running in a cluster.
run-mode = SINGLE_BOX
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level, is the conf file for the "cluster" or do you need one conf file for each application?

Why do you need to specify SINGLE_BOX versus CLUSTER?

Copy link
Contributor

@raulchen raulchen Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. this file defines the default values, each application should provide their own file to fit their scenario.
  2. this is due to a legacy reason. Because SINGLE_BOX and CLUSTER use different mechanism to manage functions. Once we finish [java] put function meta in task spec and load functions with function meta #2881, SINGLE_BOX can be removed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8339/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8340/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8342/
Test FAILed.

@raulchen raulchen changed the title [Java] Refactor RayConfig and improve Java worker. [Java] Simplify configuration usage of Java worker Sep 25, 2018
@raulchen raulchen changed the title [Java] Simplify configuration usage of Java worker [Java] Simplify Java worker configuration Sep 25, 2018
# For config file format, see 'https://github.com/lightbend/config/blob/master/HOCON.md'.

ray {
// This is the path to the directory where Ray in installed, e.g.,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: in -> is

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8347/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8348/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8351/
Test PASSed.

@@ -13,7 +13,6 @@ public void testCreateRayConfig() {
System.setProperty("ray.home", "/path/to/ray");
RayConfig rayConfig = RayConfig.create();

Assert.assertEquals(rayConfig.rayHome, "/path/to/ray");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert can't pass in my machine.
It always be the path setted here.

Maybe we can remove this lineSystem.setProperty("ray.home", "/path/to/ray");,
and rephrase the assert toassert(rayConfig.rayHome, "../..");

@@ -37,7 +37,7 @@ public Integer echo(Integer number) {
}
}

@RayRemote(resources = {@ResourceItem(name = "CPU", value = 8),
@RayRemote(resources = {@ResourceItem(name = "CPU", value = 1000),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In different machines, the number of CPUs is not determined, We should make sure that the machine doesn't have enough CPUs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then, we should set a FIXED number for this test.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8354/
Test PASSed.

@robertnishihara
Copy link
Collaborator

I think I introduced conflicts by merging #2881.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8357/
Test PASSed.

jovany-wang and others added 7 commits September 26, 2018 18:14
Remove useless fields in config file.

Remove useless fields in PathConfig.

Remove all fields and methods in PathConfig.

Remove PathConfig.

Remove all fields of config file.

Remove Parameters.

Add overwrite field.

Pass iniconfig.

Make DefaultWorker using RayAPI to init itself.

Remove useless code.

Clean code.

Remove TODO.

Change some thing

Add RayConfig.

Read config files.

Change a little.

Add ray.home directory.

Update WorkerMode.

Update RunMode.

Update node ip.

Update redis address configuration item.

Update object store name configuration item.

Update object store name index configuration item.

Update some configuration items.

Remove useless params.

Remove RayParameters and ConfigReader classes.

Remove Aconfig and refactor init

Rename a func.

Remove useless files.

Clean code.

Clean code.

[VERY IMPORTANT] Change module and package dependencies.

In this commit, I move some dependencies of RayConfig into api module
from runtime module.

Remove RayInitConfig.

Finish TODOs and address comments.

Fix lint.

Fix lint.

Move DefaultRayRuntimeFactory into runtime module.

Refactor WorkerContext
In this commit, I removed all the static variables and methods and make
it non-static.

Fix lint.

Address comments.

refine RunManager

fix

fix logger

fix test

refine

update doc and set library at runtime

Fix lint:Add a new line.

Update README.rst

Update the format.

Fix: code style.

Address comments.

Fix Worker Mode.

Update ray.conf

Fix typo.

Fix lint.

Update resource management test.
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8363/
Test FAILed.

@raulchen
Copy link
Contributor

Jenkins, retest this please.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8364/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8362/
Test FAILed.

@raulchen
Copy link
Contributor

jenkins, retest this please

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8366/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8368/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8367/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8369/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8372/
Test PASSed.

@raulchen raulchen merged commit 8e8e123 into ray-project:master Sep 26, 2018
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/8370/
Test PASSed.

@raulchen raulchen deleted the simplify_config branch September 26, 2018 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants