
Plasma manager performance: speed up wait with a wait request object map #427

Merged 12 commits into ray-project:master on Apr 7, 2017

Conversation

Contributor

@atumanov atumanov commented Apr 4, 2017

No description provided.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/488/

bool operator==(const ObjectID &x, const ObjectID &y) {
  if ((*reinterpret_cast<const uint32_t *>(&x.id[0])) !=
      (*reinterpret_cast<const uint32_t *>(&y.id[0])))
    return false;
  return UNIQUE_ID_EQ(x, y);
}
Collaborator

This seems dangerous and could lead to false positives, right? Why not use the original version

bool operator==(UniqueID a, UniqueID b) {
  return UNIQUE_ID_EQ(a, b);
}

Collaborator

Oh, I missed the if statement. If we profile this and it's faster, then let's keep it this way; otherwise let's just use UNIQUE_ID_EQ(x, y).

Contributor Author

The early-return version appears marginally faster.

@@ -29,6 +30,23 @@ UniqueID globally_unique_id(void) {
return result;
}

/* ObjectID hashing function. */
size_t hashObjectID(const ObjectID &key) {
Collaborator

Now we have two different ID hashing functions (the other is defined in plasma_store.h); we should probably define it just once.

@@ -158,7 +161,8 @@ typedef struct {
/** The object requests for this wait request. Each object request has a
* status field which is either PLASMA_QUERY_LOCAL or PLASMA_QUERY_ANYWHERE.
*/
ObjectRequest *object_requests;
std::unordered_map<ObjectID, ObjectRequest, decltype(&hashObjectID)>
*object_requests;
Collaborator

It wasn't necessary to pass in the third argument when we did this in the plasma store https://github.com/ray-project/ray/pull/324/files#diff-450748b4523710897a16506dabc79950R119, so perhaps it can be avoided here?

Contributor Author

We address all of this discussion by using a functor for the hash function.

@@ -1137,11 +1139,10 @@ void process_wait_request(ClientConnection *client_conn,
wait_req->timer = -1;
wait_req->num_object_requests = num_object_requests;
wait_req->object_requests =
(ObjectRequest *) malloc(num_object_requests * sizeof(ObjectRequest));
new std::unordered_map<ObjectID, ObjectRequest, decltype(&hashObjectID)>(
Collaborator

There's probably a way to get rid of the third argument here.

* tables if it is present there. */
for (int i = 0; i < wait_req->num_object_requests; ++i) {
for (const auto &objreq_pair : *(wait_req->object_requests)) {
Collaborator

Let's expand the objreq in the variable names to object_request or something like that.

@@ -158,7 +161,8 @@ typedef struct {
/** The object requests for this wait request. Each object request has a
* status field which is either PLASMA_QUERY_LOCAL or PLASMA_QUERY_ANYWHERE.
*/
ObjectRequest *object_requests;
std::unordered_map<ObjectID, ObjectRequest, decltype(&hashObjectID)>
*object_requests;
Contributor

Let's make this a "std::unordered_map" instead of a "pointer to a std::unordered_map" (and replace the malloc for WaitRequest by a new); this should work because we are only ever storing WaitRequest* in uthash data structures (as opposed to a full WaitRequest).

Contributor Author

Good call, done (needed to add a constructor to the WaitRequest struct).

@robertnishihara
Collaborator

I tried out the following experiment on two nodes (these happen to be m4.16xlarge).

Start one node with

./scripts/start_ray.sh --head --num-cpus=0 --redis-port=6379

Start another with

./scripts/start_ray.sh --redis-address=<head-node-ip>:6379 --num-cpus=100 --num-workers=100

Then on the first node, do

import ray
ray.init(redis_address="<head-node-ip>:6379")

@ray.remote
def f():
  return 1

%time l = ray.wait([f.remote() for _ in range(10 ** 5)], num_returns=(10 ** 5))

Before this PR, it printed

CPU times: user 2.46 s, sys: 444 ms, total: 2.91 s
Wall time: 1min 20s

After this PR, it printed

CPU times: user 2.13 s, sys: 548 ms, total: 2.68 s
Wall time: 8.58 s

@@ -152,6 +152,11 @@ UniqueID globally_unique_id(void);

typedef UniqueID ObjectID;

#ifdef __cplusplus
size_t hashObjectID(const ObjectID &key);
bool operator==(const ObjectID& x, const ObjectID& y);

Does inlining these in release builds have an impact?

Contributor Author

Wes, I don't think it will have an effect (with -O3 optimizations turned on), but we'll test it out.

Contributor

@pcmoritz pcmoritz Apr 4, 2017

As a quick test, I tried all combinations of inlining and the speed of ray.wait on 10^5 objects is unaffected (120ms on my laptop). We can analyze this more when we put together the benchmarks.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/494/

@atumanov atumanov force-pushed the plasma-manager-wait branch from 90de8e3 to 441b3e2 on April 7, 2017 08:26
@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/518/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/519/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/520/

@pcmoritz pcmoritz changed the title plasma manager perf: speed up wait with a wait request object map Plasma manager performance: speed up wait with a wait request object map Apr 7, 2017
@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/521/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/522/

@pcmoritz pcmoritz merged commit 6f92254 into ray-project:master Apr 7, 2017
@pcmoritz pcmoritz deleted the plasma-manager-wait branch April 7, 2017 19:32