Convert actor dummy objects to task execution edges. #1281

stephanie-wang · 2017-12-01T00:03:09Z

This PR converts the actor "dummy" objects, which were used to indicate actor execution order rather than data dependencies explicitly specified by the user, into a separate, mutable field in the task spec.

What do these changes do?

This replaces all instances of the TaskSpec with the new TaskExecutionSpec, which comprises the immutable TaskSpec and an additional vector of execution dependencies. Execution dependencies are currently only used by actors, to determine what tasks should have executed on the actor before the new task can be scheduled. Currently, these execution dependencies are never changed in the GCS task table, but in the future, they may be used for deterministic actor replay by recording the exact order of initial execution.
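
As a rough illustration of the idea (not the code in this PR), the sketch below pairs an immutable spec with a mutable list of execution dependencies and treats a task as schedulable only once every dependency is available. The names `TaskExecutionSpecSketch`, `ObjectIDSketch`, and `ReadyToExecute` are hypothetical, and object IDs are modeled as strings for brevity.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Assumption: object IDs are modeled as strings to keep the sketch short.
using ObjectIDSketch = std::string;

// Hypothetical stand-in for the new structure: the spec itself never changes,
// while the execution dependencies may be rewritten later.
struct TaskExecutionSpecSketch {
  const std::string spec;                              // immutable task spec
  std::vector<ObjectIDSketch> execution_dependencies;  // mutable ordering edges
};

// A task may run only after everything it must execute after is available.
bool ReadyToExecute(const TaskExecutionSpecSketch &task,
                    const std::set<ObjectIDSketch> &available) {
  for (const auto &dependency : task.execution_dependencies) {
    if (available.count(dependency) == 0) {
      return false;
    }
  }
  return true;
}
```

In this model an actor task would list the marker produced by the previous task on the same actor, so tasks become ready one at a time in submission order.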

AmplabJenkins · 2017-12-01T00:16:12Z

Build finished. Test FAILed.

AmplabJenkins · 2017-12-01T00:16:13Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2628/
Test FAILed.

AmplabJenkins · 2017-12-01T02:55:00Z

Build finished. Test PASSed.

AmplabJenkins · 2017-12-01T02:55:01Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2629/
Test PASSed.

robertnishihara

Leaving a couple comments. I'm only part way through the PR.

robertnishihara · 2017-12-01T20:00:10Z

src/common/common_protocol.cc

@@ -13,6 +13,20 @@ ObjectID from_flatbuf(const flatbuffers::String *string) {
  return object_id;
 }

+std::vector<ObjectID> from_flatbuf(


should the return be const?

robertnishihara · 2017-12-01T20:00:25Z

src/common/common_protocol.cc

@@ -13,6 +13,20 @@ ObjectID from_flatbuf(const flatbuffers::String *string) {
  return object_id;
 }

+std::vector<ObjectID> from_flatbuf(
+    const flatbuffers::Vector<flatbuffers::Offset<flatbuffers::String>>
+        *vector) {


can we make this a ref instead of pointer? In #1236 (which unfortunately introduces a bunch of conflicts) I got rid of the pointers in this file.

robertnishihara · 2017-12-01T20:07:39Z

src/common/lib/python/common_extension.cc

+  size = PyList_Size(execution_arguments);
+  for (Py_ssize_t i = 0; i < size; ++i) {
+    PyObject *execution_arg = PyList_GetItem(execution_arguments, i);
+    CHECK(PyObject_IsInstance(execution_arg, (PyObject *) &PyObjectIDType));


We haven't been super consistent about this, but this should probably raise a TypeError instead of dying.

robertnishihara · 2017-12-01T20:13:34Z

src/common/lib/python/common_extension.h

@@ -2,6 +2,7 @@
 #define COMMON_EXTENSION_H

 #include <Python.h>
+#include <vector>


We should follow the Google C++ style guide here for include orders.

https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes

robertnishihara · 2017-12-01T20:19:41Z

src/common/task.cc

@@ -289,6 +289,17 @@ int64_t TaskSpec_num_args(TaskSpec *spec) {
  return message->args()->size();
 }

+int64_t TaskSpec_num_args_by_ref(TaskSpec *spec) {


is this used anywhere?

robertnishihara · 2017-12-01T20:24:33Z

src/common/task.cc

+
+TaskExecutionSpec *TaskExecutionSpec_alloc(std::vector<ObjectID> execution_dependencies, TaskSpec *spec, int64_t task_spec_size) {
+  int64_t size = sizeof(TaskExecutionSpec) - sizeof(TaskSpec) + task_spec_size;
+  TaskExecutionSpec *copy = (TaskExecutionSpec *) malloc(size);


Shouldn't we use new/delete instead of malloc/free? Especially given that this has a std::vector inside which presumably needs to have its constructor called?

Yeah, I wanted this too, but ended up just copying the current Task_alloc to deal with the variable-sized TaskSpec. I think we can fix this with a unique_ptr.
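
One way the `unique_ptr` idea could look, as a sketch under assumptions rather than the class that was eventually merged: the variable-sized spec is held as a byte buffer owned by a `std::unique_ptr<uint8_t[]>`, so the `std::vector` member is constructed and destroyed normally and no `malloc`/`free` pair is needed. `ExecutionSpecSketch` and the 20-byte `ObjectID` stand-in are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for the fixed-size object ID (the 20-byte size is an assumption).
struct ObjectID {
  unsigned char id[20];
};

// Hypothetical sketch: owns a copy of a variable-sized spec plus a vector.
class ExecutionSpecSketch {
 public:
  ExecutionSpecSketch(const std::vector<ObjectID> &execution_dependencies,
                      const uint8_t *spec,
                      int64_t spec_size)
      : execution_dependencies_(execution_dependencies),
        spec_size_(spec_size) {
    assert(spec != nullptr && spec_size >= 0);
    // The buffer is released automatically when the object is destroyed.
    spec_.reset(new uint8_t[spec_size]);
    std::copy_n(spec, spec_size, spec_.get());
  }

  const std::vector<ObjectID> &ExecutionDependencies() const {
    return execution_dependencies_;
  }
  int64_t SpecSize() const { return spec_size_; }
  const uint8_t *Spec() const { return spec_.get(); }

 private:
  std::vector<ObjectID> execution_dependencies_;
  int64_t spec_size_;
  std::unique_ptr<uint8_t[]> spec_;
};
```

Because of the `unique_ptr` member the class is move-only by default; a copy constructor would have to be written explicitly if copies are needed.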

robertnishihara · 2017-12-01T22:47:18Z

src/common/task.cc

+      TaskExecutionSpec_task_spec_size(spec));
+}
+
+TaskExecutionSpec *TaskExecutionSpec_alloc(std::vector<ObjectID> execution_dependencies, TaskSpec *spec, int64_t task_spec_size) {


pass execution_dependencies as const ref

robertnishihara · 2017-12-01T22:53:26Z

src/common/task.cc

+int64_t TaskExecutionSpec_task_spec_size(TaskExecutionSpec *spec) {
+  return spec->task_spec_size;
+}
+TaskSpec *TaskExecutionSpec_task_spec(TaskExecutionSpec *spec) {


would probably be good to be consistent about using execution_spec as the variable name for TaskExecutionSpecs

robertnishihara · 2017-12-01T22:57:02Z

src/common/task.cc

 }

 /* TASK INSTANCES */

 Task *Task_alloc(TaskSpec *spec,
                 int64_t task_spec_size,
                 int state,
-                 DBClientID local_scheduler_id) {
+                 DBClientID local_scheduler_id,
+                 std::vector<ObjectID> execution_dependencies) {


AmplabJenkins · 2017-12-01T23:54:52Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-12-01T23:54:52Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2634/
Test FAILed.

AmplabJenkins · 2017-12-07T03:24:19Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-12-07T03:24:19Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2661/
Test FAILed.

AmplabJenkins · 2017-12-07T21:49:18Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-12-07T21:49:19Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2665/
Test FAILed.

AmplabJenkins · 2017-12-08T02:46:23Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-12-08T02:46:23Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2682/
Test PASSed.

AmplabJenkins · 2017-12-08T21:54:18Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-12-13T19:19:19Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2763/
Test FAILed.

AmplabJenkins · 2017-12-13T20:19:49Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-12-13T20:19:50Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2765/
Test PASSed.

AmplabJenkins · 2017-12-14T01:59:54Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-12-14T01:59:55Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2777/
Test PASSed.

robertnishihara · 2017-12-14T02:12:03Z

src/common/test/example_task.h

-  return TaskSpec_finish_construct(g_task_builder, task_spec_size);
+  int64_t task_spec_size;
+  TaskSpec *spec = TaskSpec_finish_construct(g_task_builder, &task_spec_size);
+  std::vector<ObjectID> execution_dependencies = std::vector<ObjectID>();


This is unnecessarily verbose, I think. You can just do

std::vector<ObjectID> execution_dependencies;

robertnishihara · 2017-12-14T02:23:48Z

src/common/lib/python/common_extension.cc

@@ -104,6 +108,8 @@ PyObject *PyTask_from_string(PyObject *self, PyObject *args) {
  result = (PyTask *) PyObject_Init((PyObject *) result, &PyTaskType);
  result->size = size;
  result->spec = TaskSpec_copy((TaskSpec *) data, size);
+  /* The created task does not include any execution dependencies. */
+  result->execution_dependencies = empty_execution_dependencies;


Interesting, so in this case we'd probably need to store ptr as a field inside of the PyTask struct, which is pretty weird, but maybe necessary. Another workaround would be to just have the PyTask contain a std::vector<ObjectID> * instead of std::vector<ObjectID>. Not sure if there would be a performance hit there.

AmplabJenkins · 2017-12-14T02:30:50Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-12-14T02:30:50Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2782/
Test PASSed.

robertnishihara · 2017-12-14T02:31:38Z

src/common/task.h

+  ///
+  /// @return A vector of object IDs representing this task's execution
+  ///         dependencies.
+  std::vector<ObjectID> ExecutionDependencies();


You've obviously given this more thought, but it seems cleaner to me to expose two methods ObjectIDDependencies and ExecutionDependencies.

There are some places in the local scheduler where we would have to iterate over both lists (e.g., when issuing reconstruction commands).

However, there are some places where we only want to iterate over a single list (such as when issuing fetch requests).

@pcmoritz thoughts about this?

E.g., in the future it could make sense to change the dummy object ID implementation and have execution dependencies actually be a vector of TaskIDs instead of ObjectIDs.

Hmm, I'm not convinced that it would be cleaner to have a vector of TaskIDs instead of ObjectIDs. It would be nice to keep reconstruction and fetching on relatively the same path. If you're worried about the latency of having to go to the result table, I think it makes more sense to cache information for actor tasks at the local scheduler.

I agree it would be cleaner to separate ObjectIDDependencies from ExecutionDependencies. Probably also having TaskIDs instead of ObjectIDs for the latter but we can think more about that and do it in a followup PR.

mehrdadn · 2017-12-14T02:47:20Z

src/common/task.cc

+  execution_dependencies_ = execution_dependencies;
+  task_spec_size_ = task_spec_size;
+  TaskSpec *spec_copy = new TaskSpec[task_spec_size_];
+  memcpy(spec_copy, spec, task_spec_size);


By the way, I would suggest switching to std::copy or std::copy_n, since that's the way to copy memory in C++. This code will break if at any time the data you are copying becomes nontrivial to copy (such as if the data type gains any members with constructors or destructors).
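
A small sketch of the suggestion, with a hypothetical helper name `CopyArray`: `std::copy_n` copies element by element through assignment, so the same code stays correct if the element type later gains a nontrivial copy, and for trivially copyable types implementations typically lower it to a bulk memory copy anyway.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical helper: allocate a new array and copy `count` elements into it.
// Works for byte buffers and for types with nontrivial copy semantics alike.
template <typename T>
std::unique_ptr<T[]> CopyArray(const T *source, std::size_t count) {
  assert(source != nullptr || count == 0);
  std::unique_ptr<T[]> copy(new T[count]);
  std::copy_n(source, count, copy.get());
  return copy;
}
```

Calling `CopyArray` on an array of `std::string` is well-defined, whereas a `memcpy` over the same array would duplicate internal pointers and lead to undefined behavior.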

AmplabJenkins · 2017-12-14T05:28:04Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-12-14T05:28:05Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2790/
Test PASSed.

pcmoritz · 2017-12-14T06:09:54Z

src/common/common_protocol.cc

+    auto string = vector.Get(i);
+    CHECK(string->size() == sizeof(object_id.id));
+    memcpy(&object_id.id[0], string->data(), sizeof(object_id.id));
+    object_ids.push_back(object_id);


can we just do object_ids.push_back(from_flatbuf(vector.Get(i)) here?
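
The shape of that simplification, sketched without the flatbuffers headers: `std::string` and `std::vector<std::string>` stand in for `flatbuffers::String` and `flatbuffers::Vector`, and `from_bytes` is a hypothetical name for the overload pair. The vector overload delegates to the single-element one, so the size check and the copy live in one place.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for the fixed-size object ID (the 20-byte size is an assumption).
struct ObjectID {
  unsigned char id[20];
};

// Single-element conversion: the only place that validates and copies bytes.
ObjectID from_bytes(const std::string &bytes) {
  assert(bytes.size() == sizeof(ObjectID::id));
  ObjectID object_id;
  std::memcpy(&object_id.id[0], bytes.data(), sizeof(object_id.id));
  return object_id;
}

// Vector conversion: reuses the single-element overload for each entry.
std::vector<ObjectID> from_bytes(const std::vector<std::string> &strings) {
  std::vector<ObjectID> object_ids;
  object_ids.reserve(strings.size());
  for (const auto &bytes : strings) {
    object_ids.push_back(from_bytes(bytes));
  }
  return object_ids;
}
```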

pcmoritz · 2017-12-14T07:10:26Z

src/common/format/common.fbs

@@ -89,6 +97,8 @@ table TaskReply {
  state: long;
  // A local scheduler ID.
  local_scheduler_id: string;
+  // A string of bytes representing the task's TaskExecutionDependencies.
+  execution_dependencies: string;


wouldn't it be cleaner to store the list of strings here directly instead of wrapping and serializing them? Any reason this is not possible?

AmplabJenkins · 2017-12-15T00:05:10Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-12-15T00:05:11Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2796/
Test PASSed.

robertnishihara · 2017-12-15T04:59:13Z

There are a couple of remaining cleanups, but we can fix those in follow-up PRs.

atumanov · 2017-12-20T08:09:02Z

src/common/task.cc

+}
+
+std::vector<ObjectID> TaskExecutionSpec::ExecutionDependencies() {
+  return execution_dependencies_;


I believe this makes a copy of a vector of ObjectIDs each time you call ExecutionDependencies(). For readonly callers, why not return a const ref? Callers who need their own copy of the vector can make one.
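
A minimal sketch of the const-reference getter being proposed. `ExecutionSpecSketch` and `CountDependencies` are hypothetical names, and the 20-byte `ObjectID` is a stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the fixed-size object ID (the 20-byte size is an assumption).
struct ObjectID {
  unsigned char id[20];
};

class ExecutionSpecSketch {
 public:
  explicit ExecutionSpecSketch(std::vector<ObjectID> execution_dependencies)
      : execution_dependencies_(std::move(execution_dependencies)) {}

  // Read-only access: returns a reference, so no vector is copied per call.
  const std::vector<ObjectID> &ExecutionDependencies() const {
    return execution_dependencies_;
  }

 private:
  std::vector<ObjectID> execution_dependencies_;
};

// A read-only caller binds the result to a const reference.
std::size_t CountDependencies(const ExecutionSpecSketch &spec) {
  const std::vector<ObjectID> &dependencies = spec.ExecutionDependencies();
  return dependencies.size();
}
```

A caller that does need its own copy can still write `std::vector<ObjectID> copy = spec.ExecutionDependencies();`, which makes the cost visible at the call site. The returned reference is only valid while the spec object is alive.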

atumanov · 2017-12-20T08:31:12Z

src/common/task.cc

@@ -359,45 +456,60 @@ bool TaskSpec_is_dependent_on(TaskSpec *spec, ObjectID object_id) {
      }
    }
  }
+  // Iterate through the execution dependencies to see if it contains object_id.
+  for (auto dependency_id : execution_dependencies_) {


this makes a copy of each ObjectID element of the execution dependencies vector. Why not

for (const auto &dep_id : exe_deps) {

ObjectID_equal is a readonly consumer (which, incidentally, we should be enforcing in the function prototype -- I can create a PR for that separately).

a side note (less important for now), if we ever envision large numbers of execution dependencies, we should probably make it a hashmap. We had to eliminate linear searches in the past for performance reasons.
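
Both points can be sketched together. The code below is illustrative only: the 20-byte `ObjectID`, the `operator==` definition, and the FNV-1a based `ObjectIDHasher` are stand-ins chosen for this example, not the project's own definitions. `DependsOnLinear` iterates by const reference, and `DependsOnHashed` shows what a hash-based membership test could look like if the dependency list ever grows large.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_set>
#include <vector>

// Stand-in for the fixed-size object ID (the 20-byte size is an assumption).
struct ObjectID {
  unsigned char id[20];
};

bool operator==(const ObjectID &lhs, const ObjectID &rhs) {
  return std::memcmp(lhs.id, rhs.id, sizeof(lhs.id)) == 0;
}

// Hypothetical hasher: 64-bit FNV-1a over the ID bytes.
struct ObjectIDHasher {
  std::size_t operator()(const ObjectID &object_id) const {
    std::uint64_t hash = 0xcbf29ce484222325ULL;
    for (unsigned char byte : object_id.id) {
      hash ^= byte;
      hash *= 0x100000001b3ULL;
    }
    return static_cast<std::size_t>(hash);
  }
};

// Linear scan; `const auto &` avoids copying each element.
bool DependsOnLinear(const std::vector<ObjectID> &dependencies,
                     const ObjectID &object_id) {
  for (const auto &dependency_id : dependencies) {
    if (dependency_id == object_id) {
      return true;
    }
  }
  return false;
}

using ObjectIDSet = std::unordered_set<ObjectID, ObjectIDHasher>;

// Average constant-time membership test.
bool DependsOnHashed(const ObjectIDSet &dependencies,
                     const ObjectID &object_id) {
  return dependencies.count(object_id) > 0;
}
```

For the short dependency lists actors use today the linear scan is likely fine; the set only pays off once lookups dominate.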

atumanov · 2017-12-20T08:40:55Z

src/common/task.cc

+  }
+}
+
+bool TaskExecutionSpec::DependsOn(ObjectID object_id) {


another instance where we should really be passing a const ref to the object id. Does a few useful things:

establishes a contract with the caller that the object_id is not going to be mutated

eliminates a 20 byte copy when calling this function

atumanov · 2017-12-20T08:45:08Z

src/common/task.cc

@@ -359,45 +456,60 @@ bool TaskSpec_is_dependent_on(TaskSpec *spec, ObjectID object_id) {
      }
    }
  }
+  // Iterate through the execution dependencies to see if it contains object_id.
+  for (auto dependency_id : execution_dependencies_) {
+    if (ObjectID_equal(dependency_id, object_id)) {


we have operator== overloaded for object ID equality testing. I think we should deprecate the use of ObjectID_equal (with pass by value ObjectIDs) in favor of the equality operator. Perhaps all new code should be using the new C++ equality operator for object equality testing.
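
A sketch of how such a transition might look. The `operator==` here is a stand-in definition for the one the comment refers to, and the forwarding `ObjectID_equal` shim and `Contains` helper are hypothetical. With the operator available, standard algorithms such as `std::find` work on vectors of IDs directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Stand-in for the fixed-size object ID (the 20-byte size is an assumption).
struct ObjectID {
  unsigned char id[20];
};

// Stand-in for the existing equality operator.
bool operator==(const ObjectID &lhs, const ObjectID &rhs) {
  return std::memcmp(lhs.id, rhs.id, sizeof(lhs.id)) == 0;
}

// Hypothetical shim: the legacy name takes const references and forwards to
// operator==, so old call sites keep compiling while new code uses `==`.
bool ObjectID_equal(const ObjectID &first_id, const ObjectID &second_id) {
  return first_id == second_id;
}

// std::find relies on operator== for the comparison.
bool Contains(const std::vector<ObjectID> &object_ids,
              const ObjectID &object_id) {
  return std::find(object_ids.begin(), object_ids.end(), object_id) !=
         object_ids.end();
}
```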

atumanov · 2017-12-20T10:04:16Z

src/common/state/redis.cc

      task_id.id, sizeof(task_id.id), state, local_scheduler_id.id,
-      sizeof(local_scheduler_id.id), spec, Task_task_spec_size(task));
+      sizeof(local_scheduler_id.id), fbb.GetBufferPointer(),
+      (size_t) fbb.GetSize(), spec, execution_spec->SpecSize());


Why not send the whole TaskExecutionSpec, replacing TaskSpec and its size? That way, any expansion of the TaskExecutionSpec class doesn't/won't affect this redis logic (provided we generate the flatbuffers serialization logic for it). It seems odd that only one field of the TaskExecutionSpec is extracted and flatbuffered. For any new fields (like spillback) we need to update this redis function and its callback counterpart.

That would make it more expensive to implement task_table_update, where we don't want to rewrite the whole TaskSpec.

stephanie-wang force-pushed the actor-nondeterminism branch from aae6bf0 to 4b62ebe Compare December 1, 2017 23:32

robertnishihara reviewed Dec 1, 2017

stephanie-wang force-pushed the actor-nondeterminism branch from 4b62ebe to 36b98da Compare December 7, 2017 01:19

stephanie-wang changed the title ~~[WIP] Convert actor dummy objects to task execution edges.~~ Convert actor dummy objects to task execution edges. Dec 7, 2017

stephanie-wang force-pushed the actor-nondeterminism branch from b7b6ab2 to 42bf669 Compare December 8, 2017 02:03

stephanie-wang added 11 commits December 8, 2017 11:49

Define execution dependencies flatbuffer and add to Redis commands

c882f7c

Convert TaskSpec to TaskExecutionSpec

7891437

Add execution dependencies to Python bindings

dbf68dd

Submitting actor tasks uses execution dependency API instead of dummy…

a5517d6

… argument

Fix dependency getters and some cleanup for fetching missing dependen…

a164f27

…cies

C++ convention

f378cc9

Make TaskExecutionSpec a C++ class

639b8c1

Convert local scheduler to use TaskExecutionSpec class

b0c63b4

Convert some pointers to references

787c711

Finish conversion to TaskExecutionSpec class

bf65f95

fix

7bd4d08

stephanie-wang force-pushed the actor-nondeterminism branch from 42bf669 to c512c83 Compare December 8, 2017 19:53

add more retries in global scheduler unit test

93d8a7d

fix linting and cast fbb.GetSize to size_t

c14391c

robertnishihara reviewed Dec 14, 2017

mehrdadn reviewed Dec 14, 2017

Style and doc

df92acb

stephanie-wang force-pushed the actor-nondeterminism branch from dfe49ad to df92acb Compare December 14, 2017 04:46

pcmoritz reviewed Dec 14, 2017

Fix linting and simplify from_flatbuf.

ae4da2a

robertnishihara approved these changes Dec 15, 2017

robertnishihara merged commit 12fdb3f into ray-project:master Dec 15, 2017

robertnishihara deleted the actor-nondeterminism branch December 15, 2017 04:47

atumanov reviewed Dec 20, 2017
