Feature/update sparse parameter #10351
Conversation
}
} else {
#ifdef PADDLE_WITH_CUDA
PADDLE_ENFORCE(platform::is_gpu_place(in_tensor.place()));
this line seems redundant?
Removed.
Variable *out_var = var_scopes.at(out_var_handle->scope_idx_)
    ->FindVar(out_var_handle->name_);

if (*out_var_handle != *in_var_handle) {
I recommend, as a TODO, that we use a named method (e.g. IsSameNameAndVersion()) for VarHandle comparison. Normally, comparison operator overloads should be used with care.
Thanks, done.
}

int type = platform::ToNCCLDataType(in_tensor.type());
broadcast_calls.emplace_back([=] {
Normally we don't use "=", which might accidentally copy something big. Explicitly name the vars you need in the lambda?
Thanks, done.
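As a side note, here is a minimal standalone sketch (not the PR's actual code; the names are only illustrative) of why an explicit capture list is usually preferable to [=] when large objects are in scope:

#include <functional>
#include <vector>

int main() {
  std::vector<float> big_buffer(1 << 20);  // something expensive to copy
  int type = 0;

  std::vector<std::function<void()>> broadcast_calls;

  // [=] copies every variable the lambda body names, including big_buffer.
  broadcast_calls.emplace_back([=] { (void)type; (void)big_buffer.size(); });

  // Explicit captures make the cost visible: copy only the small `type` and
  // take the big buffer by reference (valid while it outlives the calls).
  broadcast_calls.emplace_back(
      [type, &big_buffer] { (void)type; (void)big_buffer.size(); });

  for (auto &call : broadcast_calls) call();
  return 0;
}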
&VariableVisitor::GetMutableTensor(out_var));
});
}
} else {
this else is very long...
the else code has been refactored.
}
});
#else
PADDLE_THROW("CUDA is not support.");
CUDA is not enabled.
Done
std::unordered_map<std::string, std::vector<std::unique_ptr<VarHandle>>>>(
    places_.size());

// size_t cur_device_id = 0;
delete?
Done
} else {
  CreateComputationalOps(&result, *op, places_.size());
  if (!is_forwarding) {
    int op_dev_id = GetOpDeviceID(var_name_on_devices, *op);
This logic doesn't belong to this PR?
No. If this op's inputs don't include a sparse gradient, GetOpDeviceID will return -1, which means this op should be executed on all devices.
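For context, a rough standalone sketch of that dispatch idea (illustrative only; the real GetOpDeviceID has a different signature): each device records the sparse-gradient variable names placed on it, and an op is pinned to a device only when one of its inputs appears in that device's set.

#include <string>
#include <unordered_set>
#include <vector>

// Illustrative sketch: returns the device an op should run on, or -1 when the
// op has no sparse-gradient input and should be created on every device.
int GetOpDeviceIDSketch(
    const std::vector<std::unordered_set<std::string>> &var_name_on_devices,
    const std::vector<std::string> &op_input_names) {
  for (size_t dev = 0; dev < var_name_on_devices.size(); ++dev) {
    for (const std::string &in : op_input_names) {
      if (var_name_on_devices[dev].count(in) != 0) {
        return static_cast<int>(dev);
      }
    }
  }
  return -1;  // execute on all devices
}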
// size_t cur_device_id = 0;
size_t update_sparse_gp_device_id = 0;
std::vector<std::unordered_set<std::string>> var_name_on_devices;
std::vector<std::unordered_set<std::string>> bcast_var_name_set;
Naming suggestion: sparse_var_xxx and bcast_sparse_var_xxx.
Done
class MultiDevSSAGraphBuilder : public SSAGraphBuilder {
 public:
#ifdef PADDLE_WITH_CUDA
Let's discuss tomorrow. Does TensorFlow use GOOGLE_CUDA in this many places? I feel PADDLE_WITH_CUDA is everywhere...
TensorFlow uses GOOGLE_CUDA in many places too.
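For what it's worth, a minimal sketch of the guard pattern being discussed (illustrative only, not Paddle's actual broadcast code): the CUDA-specific body is compiled only when PADDLE_WITH_CUDA is defined, so the same source still builds in CPU-only configurations.

#include <stdexcept>

void BroadcastWithNCCL() {
#ifdef PADDLE_WITH_CUDA
  // ... issue the NCCL broadcast calls here ...
#else
  // CPU-only build: fail loudly at runtime instead of at compile time.
  throw std::runtime_error("CUDA is not enabled.");
#endif
}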
}
}

VarHandle *SSAGraphBuilder::GetLatestVarHandle(SSAGraph *graph,
Is this used somewhere?
No, this function doesn't belong to this PR.
Branch force-pushed from d8ead0f to 7722baa
… feature/update_sparse_parameter
Branch force-pushed from cff313a to f9c680c
delete this line if it's not used?
Done
call();
}
}
// TODO(zcd): Maybe the unequal operator is not appropriate here.
be more specific?
I have refined the logic of broadcast_op: the places of the input and output tensors must be all on GPU or all on CPU.
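A minimal standalone sketch of that constraint (not the PR's actual code; Place is reduced to a simple enum here):

#include <stdexcept>
#include <vector>

enum class PlaceKind { kCPU, kGPU };

// Illustrative only: the input and all output tensors of broadcast must live
// on the same kind of place, all GPU or all CPU, never a mixture.
void EnforceSamePlaceKind(PlaceKind in_place,
                          const std::vector<PlaceKind> &out_places) {
  for (PlaceKind p : out_places) {
    if (p != in_place) {
      throw std::runtime_error(
          "The input and output of broadcast must be all on GPU or all on CPU.");
    }
  }
}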
// Variable may have many different var_handles, the version_ of these
// var_handles is different. So I don't take care of version_ temporarily
// when overloading equal.
Please add a TODO and rename this == operator to something like IsNameAndScopeSame()?
Done
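A small standalone sketch of that suggestion (the struct is illustrative, not the real VarHandle): a named method makes it obvious that version_ is intentionally ignored, which an overloaded == would hide.

#include <string>

struct VarHandleSketch {
  std::string name_;
  size_t scope_idx_ = 0;
  size_t version_ = 0;  // deliberately not part of the comparison

  // Named comparison (per the review suggestion) instead of operator==/!=.
  bool IsNameAndScopeSame(const VarHandleSketch &other) const {
    return name_ == other.name_ && scope_idx_ == other.scope_idx_;
  }
};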
if (op_dev_id == -1) {  // var on all device
  CreateComputationalOps(&result, *op, places_.size());
} else {
  CreateComputationalOp(&result, *op, op_dev_id);
Is it possible to have a var that is not on all devices now?
Of course, just by removing this var from the var's scope.
Branch force-pushed from 9da2bbf to 88be79d
Branch force-pushed from 88be79d to 0441c2c
Branch force-pushed from 7b1c794 to e8ebb91
Branch force-pushed from e8ebb91 to 881e063
Enabling parallel_exe to support updating sparse parameters.
This differs from #10096, which enables parallel_exe to support updating sparse parameters and assigns parameter gradients evenly to different cards for updates.