-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plasma manager performance: speed up wait with a wait request object map #427
Conversation
Test PASSed. |
src/common/common.cc
Outdated
(*reinterpret_cast<const uint32_t *>(&y.id[0]))) | ||
return false; | ||
return UNIQUE_ID_EQ(x, y); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems dangerous and could lead to false positives, right? Why not use the original version
bool operator==(UniqueID a, UniqueID b) {
return UNIQUE_ID_EQ(a, b);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed the if
statement. If we profile this and it's faster then let's keep it this way, otherwise let's just use UNIQUE_ID_EQ(x, y)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the early return version appears marginally faster.
src/common/common.cc
Outdated
@@ -29,6 +30,23 @@ UniqueID globally_unique_id(void) { | |||
return result; | |||
} | |||
|
|||
/* ObjectID hashing function. */ | |||
size_t hashObjectID(const ObjectID &key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we have two separate different ID hashing functions (the other is defined in plasma_store.h
), we should probably just define it once.
src/plasma/plasma_manager.cc
Outdated
@@ -158,7 +161,8 @@ typedef struct { | |||
/** The object requests for this wait request. Each object request has a | |||
* status field which is either PLASMA_QUERY_LOCAL or PLASMA_QUERY_ANYWHERE. | |||
*/ | |||
ObjectRequest *object_requests; | |||
std::unordered_map<ObjectID, ObjectRequest, decltype(&hashObjectID)> | |||
*object_requests; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't necessary to pass in the third argument when we did this in the plasma store https://github.com/ray-project/ray/pull/324/files#diff-450748b4523710897a16506dabc79950R119, so perhaps it can be avoided here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we address all this discussion by using a functor for the hash function.
src/plasma/plasma_manager.cc
Outdated
@@ -1137,11 +1139,10 @@ void process_wait_request(ClientConnection *client_conn, | |||
wait_req->timer = -1; | |||
wait_req->num_object_requests = num_object_requests; | |||
wait_req->object_requests = | |||
(ObjectRequest *) malloc(num_object_requests * sizeof(ObjectRequest)); | |||
new std::unordered_map<ObjectID, ObjectRequest, decltype(&hashObjectID)>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's probably a way to get rid of the third argument here
src/plasma/plasma_manager.cc
Outdated
* tables if it is present there. */ | ||
for (int i = 0; i < wait_req->num_object_requests; ++i) { | ||
for (const auto &objreq_pair : *(wait_req->object_requests)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's expand the objreq
in the variable names to object_request
or something like that
src/plasma/plasma_manager.cc
Outdated
@@ -158,7 +161,8 @@ typedef struct { | |||
/** The object requests for this wait request. Each object request has a | |||
* status field which is either PLASMA_QUERY_LOCAL or PLASMA_QUERY_ANYWHERE. | |||
*/ | |||
ObjectRequest *object_requests; | |||
std::unordered_map<ObjectID, ObjectRequest, decltype(&hashObjectID)> | |||
*object_requests; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this a "std::unordered_map" instead of a "pointer to a std::unordered_map" (and replace the malloc for WaitRequest by a new); this should work because we are only ever storing WaitRequest* in uthash data structures (as opposed to a full WaitRequest).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, done (needed to add a ctor to the WaitRequest struct)
I tried out the following experiment on two nodes (these happen to be m4.16xlarge). Start one node with ./scripts/start_ray.sh --head --num-cpus=0 --redis-port=6379 Start another with ./scripts/start_ray.sh --redis-address=<head-node-ip>:6379 --num-cpus=100 --num-workers=100 Then on the first node, do import ray
ray.init(redis_address="<head-node-ip>:6379")
@ray.remote
def f():
return 1
%time l = ray.wait([f.remote() for _ in range(10 ** 5)], num_returns=(10 ** 5)) Before this PR, it printed
After this PR, it printed
|
src/common/common.h
Outdated
@@ -152,6 +152,11 @@ UniqueID globally_unique_id(void); | |||
|
|||
typedef UniqueID ObjectID; | |||
|
|||
#ifdef __cplusplus | |||
size_t hashObjectID(const ObjectID &key); | |||
bool operator==(const ObjectID& x, const ObjectID& y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does inlining these in release builds have an impact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wes, I don't think it will have an effect (with O3 optimizations turned on), but we'll test it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a quick test, I tried all combinations of inlining and the speed of ray.wait
on 10^5 objects is unaffected (120ms on my laptop). We can analyze this more when we put together the benchmarks.
Test PASSed. |
90de8e3
to
441b3e2
Compare
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
No description provided.