Plasma manager performance: speed up wait with a wait request object map #427

atumanov · 2017-04-04T05:34:26Z

No description provided.

AmplabJenkins · 2017-04-04T05:51:46Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/488/
robertnishihara · 2017-04-04T06:38:31Z

src/common/common.cc

+      (*reinterpret_cast<const uint32_t *>(&y.id[0])))
+      return false;
+  return UNIQUE_ID_EQ(x, y);
+}


This seems dangerous and could lead to false positives, right? Why not use the original version:

bool operator==(UniqueID a, UniqueID b) { return UNIQUE_ID_EQ(a, b); }

Oh, I missed the if statement. If we profile this and it's faster then let's keep it this way, otherwise let's just use UNIQUE_ID_EQ(x, y).

The early-return version appears marginally faster.
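
For readers following the thread, the early-return idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the code merged in this PR: `SketchID`, `kSketchIdSize` (set to 20 bytes here), and `sketch_id_equal` are hypothetical stand-ins for `UniqueID` and `UNIQUE_ID_EQ`, and the sketch uses `memcpy` for the prefix read instead of the `reinterpret_cast` shown in the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for UniqueID; the 20-byte size is an assumption.
constexpr std::size_t kSketchIdSize = 20;

struct SketchID {
  unsigned char id[kSketchIdSize];
};

// Compare a 4-byte prefix first and fall back to the full comparison only
// when the prefixes match.
bool sketch_id_equal(const SketchID &x, const SketchID &y) {
  std::uint32_t x_prefix;
  std::uint32_t y_prefix;
  // memcpy sidesteps alignment and strict-aliasing concerns of a pointer cast.
  std::memcpy(&x_prefix, x.id, sizeof(x_prefix));
  std::memcpy(&y_prefix, y.id, sizeof(y_prefix));
  if (x_prefix != y_prefix) {
    return false;  // Early return: the IDs already differ in the prefix.
  }
  // A matching prefix is not enough, so compare every byte.
  return std::memcmp(x.id, y.id, kSketchIdSize) == 0;
}
```

The prefix check can only produce an early `false`; a `true` result always goes through the full comparison, so it cannot introduce false positives.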

robertnishihara · 2017-04-04T06:40:52Z

src/common/common.cc

@@ -29,6 +30,23 @@ UniqueID globally_unique_id(void) {
  return result;
 }

+/* ObjectID hashing function. */
+size_t hashObjectID(const ObjectID &key) {


We now have two separate ID hashing functions (the other is defined in plasma_store.h); we should probably define it just once.

robertnishihara · 2017-04-04T06:44:18Z

src/plasma/plasma_manager.cc

@@ -158,7 +161,8 @@ typedef struct {
  /** The object requests for this wait request. Each object request has a
   *  status field which is either PLASMA_QUERY_LOCAL or PLASMA_QUERY_ANYWHERE.
   */
-  ObjectRequest *object_requests;
+  std::unordered_map<ObjectID, ObjectRequest, decltype(&hashObjectID)>
+      *object_requests;


It wasn't necessary to pass in the third argument when we did this in the plasma store https://github.com/ray-project/ray/pull/324/files#diff-450748b4523710897a16506dabc79950R119, so perhaps it can be avoided here?

We addressed this whole discussion by using a functor for the hash function.
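
A minimal sketch of what a functor-based hash could look like, for illustration only. `ExampleID`, `ExampleIDHasher`, `ExampleRequest`, and `ExampleRequestMap` are hypothetical names, and the hash (the first `sizeof(size_t)` bytes of the ID) is an assumption rather than the function used in the PR. With `decltype(&hashObjectID)` the hasher type is a function pointer, so the pointer has to be supplied when the map is constructed; a functor type is default-constructible, so the map can be default-constructed.

```cpp
#include <cstddef>
#include <cstring>
#include <unordered_map>

// Hypothetical stand-in for ObjectID; the 20-byte size is an assumption.
struct ExampleID {
  unsigned char id[20];
};

bool operator==(const ExampleID &x, const ExampleID &y) {
  return std::memcmp(x.id, y.id, sizeof(x.id)) == 0;
}

// Hash functor: a default-constructible type, unlike a function pointer.
struct ExampleIDHasher {
  std::size_t operator()(const ExampleID &key) const {
    std::size_t result;
    // Assumption: IDs are random enough that a byte prefix hashes well.
    std::memcpy(&result, key.id, sizeof(result));
    return result;
  }
};

// Hypothetical stand-in for ObjectRequest.
struct ExampleRequest {
  int status;
};

// The hasher is named once in the type; no constructor argument is needed.
using ExampleRequestMap =
    std::unordered_map<ExampleID, ExampleRequest, ExampleIDHasher>;
```

An alias such as `ExampleRequestMap` also keeps the hasher out of every declaration that uses the map.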

robertnishihara · 2017-04-04T06:48:21Z

src/plasma/plasma_manager.cc

@@ -1137,11 +1139,10 @@ void process_wait_request(ClientConnection *client_conn,
  wait_req->timer = -1;
  wait_req->num_object_requests = num_object_requests;
  wait_req->object_requests =
-      (ObjectRequest *) malloc(num_object_requests * sizeof(ObjectRequest));
+      new std::unordered_map<ObjectID, ObjectRequest, decltype(&hashObjectID)>(


There's probably a way to get rid of the third argument here.

robertnishihara · 2017-04-04T06:54:47Z

src/plasma/plasma_manager.cc

   * tables if it is present there. */
-  for (int i = 0; i < wait_req->num_object_requests; ++i) {
+  for (const auto &objreq_pair : *(wait_req->object_requests)) {


Let's expand the objreq in the variable names to object_request or something like that.

pcmoritz · 2017-04-04T07:26:25Z

src/plasma/plasma_manager.cc

@@ -158,7 +161,8 @@ typedef struct {
  /** The object requests for this wait request. Each object request has a
   *  status field which is either PLASMA_QUERY_LOCAL or PLASMA_QUERY_ANYWHERE.
   */
-  ObjectRequest *object_requests;
+  std::unordered_map<ObjectID, ObjectRequest, decltype(&hashObjectID)>
+      *object_requests;


Let's make this a "std::unordered_map" instead of a "pointer to a std::unordered_map" (and replace the malloc for WaitRequest by a new); this should work because we are only ever storing WaitRequest* in uthash data structures (as opposed to a full WaitRequest).

Good call, done (this required adding a constructor to the WaitRequest struct).
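
A rough sketch of why the switch from malloc to new matters once the struct holds a map by value. Everything here is hypothetical: `SketchWaitRequest`, `SketchObjectRequest`, and the helper functions are illustrative names, and `std::string` stands in for `ObjectID` to keep the example short. A struct with a `std::unordered_map` member needs its constructor to run, which `new` does and `malloc` does not.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for ObjectRequest.
struct SketchObjectRequest {
  int status;
};

// Hypothetical stand-in for WaitRequest, holding the map by value.
struct SketchWaitRequest {
  // The constructor initializes the map with an initial bucket count.
  explicit SketchWaitRequest(std::size_t num_object_requests)
      : timer(-1), object_requests(num_object_requests) {}

  std::int64_t timer;
  // std::string is used as the key only to keep the sketch self-contained.
  std::unordered_map<std::string, SketchObjectRequest> object_requests;
};

// new runs the constructor; malloc would leave the map uninitialized.
SketchWaitRequest *make_wait_request(std::size_t num_object_requests) {
  return new SketchWaitRequest(num_object_requests);
}

// delete runs the destructor, which releases the map's storage.
void free_wait_request(SketchWaitRequest *wait_request) {
  delete wait_request;
}
```

Only the pointer to the struct is stored in other containers, so the struct itself can contain non-trivial C++ members.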

robertnishihara · 2017-04-04T07:27:15Z

I tried out the following experiment on two nodes (these happen to be m4.16xlarge).

Start one node with

./scripts/start_ray.sh --head --num-cpus=0 --redis-port=6379

Start another with

./scripts/start_ray.sh --redis-address=<head-node-ip>:6379 --num-cpus=100 --num-workers=100

Then on the first node, do

import ray
ray.init(redis_address="<head-node-ip>:6379")

@ray.remote
def f():
  return 1

%time l = ray.wait([f.remote() for _ in range(10 ** 5)], num_returns=(10 ** 5))

Before this PR, it printed

CPU times: user 2.46 s, sys: 444 ms, total: 2.91 s
Wall time: 1min 20s

After this PR, it printed

CPU times: user 2.13 s, sys: 548 ms, total: 2.68 s
Wall time: 8.58 s
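
One plausible reading of this speedup, sketched under assumptions: finding an object request by ID in an array means scanning it, while a hash map finds it in constant time on average. The names below (`SketchEntry`, `find_by_scan`, `find_by_map`) are hypothetical, `std::string` stands in for `ObjectID`, and this is not the plasma manager's code.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an object request.
struct SketchEntry {
  std::string object_id;
  int status;
};

// Array-style lookup: the cost grows with the number of entries.
SketchEntry *find_by_scan(std::vector<SketchEntry> &entries,
                          const std::string &object_id) {
  for (auto &entry : entries) {
    if (entry.object_id == object_id) {
      return &entry;
    }
  }
  return nullptr;
}

// Map-style lookup: constant time on average, independent of the map size.
SketchEntry *find_by_map(std::unordered_map<std::string, SketchEntry> &entries,
                         const std::string &object_id) {
  auto it = entries.find(object_id);
  return it == entries.end() ? nullptr : &it->second;
}
```

With 10^5 objects in a single wait request, doing a scan for each object that becomes available would add up to roughly 10^10 comparisons in the worst case, which would be consistent with the difference measured above.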

wesm · 2017-04-04T20:28:59Z

src/common/common.h

@@ -152,6 +152,11 @@ UniqueID globally_unique_id(void);

 typedef UniqueID ObjectID;

+#ifdef __cplusplus
+size_t hashObjectID(const ObjectID &key);
+bool operator==(const ObjectID& x, const ObjectID& y);


Does inlining these in release builds have an impact?

Wes, I don't think it will have an effect (with O3 optimizations turned on), but we'll test it out.

As a quick test, I tried all combinations of inlining, and the speed of ray.wait on 10^5 objects is unaffected (120 ms on my laptop). We can analyze this more when we put together the benchmarks.

AmplabJenkins · 2017-04-04T21:46:52Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/494/
AmplabJenkins · 2017-04-07T08:46:27Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-04-07T08:46:27Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/518/
AmplabJenkins · 2017-04-07T09:18:14Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-04-07T09:18:14Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/519/
AmplabJenkins · 2017-04-07T09:45:55Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/520/
AmplabJenkins · 2017-04-07T15:10:41Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-04-07T15:10:45Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/521/
AmplabJenkins · 2017-04-07T15:29:57Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-04-07T15:29:57Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/522/
robertnishihara reviewed Apr 4, 2017

pcmoritz reviewed Apr 4, 2017

wesm reviewed Apr 4, 2017

atumanov added 8 commits April 6, 2017 20:01

plasma manager perf: speedup wait with a wait request object map

e7fc296

removing duplicate == operator in plasma store

46a284e

fix serialization test

9306139

code cleanup

84b5fe8

minor cleanup

6bcc6a6

factoring out uniqueid hash and equality operators into common

039ae61

plasma manager: c++ify the WaitRequest struct

e141b13

plasma manager: get rid of the initial object request malloc

441b3e2

atumanov force-pushed the plasma-manager-wait branch from 90de8e3 to 441b3e2 Compare April 7, 2017 08:26

cleanup

7ca492f

linting

1a61fd6

cleanups and fix compiler warnings

2489c7e

pcmoritz changed the title ~~plasma manager perf: speed up wait with a wait request object map~~ Plasma manager performance: speed up wait with a wait request object map Apr 7, 2017

compiler warnings and linting

765999a

pcmoritz merged commit 6f92254 into ray-project:master Apr 7, 2017

pcmoritz deleted the plasma-manager-wait branch April 7, 2017 19:32
