
Conversation

@kuke kuke (Contributor) commented Sep 13, 2017

Resolve #4010

@kuke kuke added the OpPorting label Sep 13, 2017
@qingqing01 qingqing01 requested review from QiJune and pkuyym and removed request for dzhwinter, qingqing01 and reyoung September 18, 2017 08:31
auto num_ins = ins.size();
PADDLE_ENFORCE(num_ins > 2,
"multiplex operator should have more than 2 inputs.");
PADDLE_ENFORCE_EQ(ins[0]->dims().size(), 1,
Member:

We also have to check the index values in ins[0]; each index in ins[0] must be less than ins[0]->dims().

Contributor Author:

Done. Added the index check in the forward compute function.
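For reference, a minimal sketch of such a bounds check in the same PADDLE_ENFORCE style as the shape checks above (index_t_cpu, rows, and num_ins are taken from the diff fragments in this PR; the exact placement inside the forward compute function is assumed):

// Illustrative only: make sure every index selects a valid candidate input.
auto index = index_t_cpu.data<T>();
for (auto i = 0; i < rows; i++) {
  int k = static_cast<int>(index[i]) + 1;
  PADDLE_ENFORCE(k >= 1 && k < static_cast<int>(num_ins),
                 "index exceeds the number of candidate tensors.");
}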

"Input(Out@GRAD) shouldn't be null.");
auto d_ins = ctx.MultiOutput<LoDTensor>(framework::GradVarName("X"));
auto ins = ctx.MultiInput<Tensor>("X");
// don;t compute gradient for index
Member:

don;t --> don't

Contributor Author:

Done

auto index = index_t_cpu.data<T>();
for (auto i = 0; i < rows; i++) {
int k = (int)index[i] + 1;
cudaMemcpy(out->data<T>() + i * cols, ins[k]->data<T>() + i * cols,
Member:

Please use cuda stream.

auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
                  ctx.device_context())
                  .stream();
platform::GPUPlace place = boost::get<platform::GPUPlace>(ctx.GetPlace());
memory::Copy(place, out->data<T>() + i * cols, place,
             ins[k]->data<T>() + i * cols, cols * sizeof(T), stream);

Contributor Author:

Done
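Putting the suggestion together, the per-row copy loop might look roughly like this (a sketch only, reusing index, rows, cols, and ins from the diff above):

auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
                  ctx.device_context())
                  .stream();
auto place = boost::get<platform::GPUPlace>(ctx.GetPlace());
for (auto i = 0; i < rows; i++) {
  int k = static_cast<int>(index[i]) + 1;
  // device-to-device copy of one row, enqueued on the kernel's stream
  memory::Copy(place, out->data<T>() + i * cols, place,
               ins[k]->data<T>() + i * cols, cols * sizeof(T), stream);
}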

auto cols = ins[1]->dims()[1];
for (auto i = 0; i < rows; i++) {
int k = (int)index[i] + 1;
memcpy(out->data<T>() + i * cols, ins[k]->data<T>() + i * cols,
Member:

Maybe we can combine the CPU code and CUDA code in one file:

template <typename Place, typename T>
class MultiplexKernel : public framework::OpKernel

We can use

t.device(context.GetEigenDevice<Place>()) = t.constant(static_cast<T>(0));

to set a CPU/GPU tensor to zero.

And we can use

memory::Copy

for both the CPU and GPU copies.
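For illustration, a minimal sketch of the zero-initialization part of such a shared kernel (class and variable names are assumptions, not the code merged in this PR):

// Sketch: zero the gradient tensors on either CPU or GPU through Eigen.
template <typename Place, typename T>
class MultiplexGradKernel : public framework::OpKernel {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto d_ins =
        ctx.MultiOutput<framework::LoDTensor>(framework::GradVarName("X"));
    for (size_t i = 1; i < d_ins.size(); i++) {
      d_ins[i]->mutable_data<T>(ctx.GetPlace());
      auto t = framework::EigenVector<T>::Flatten(*d_ins[i]);
      t.device(ctx.GetEigenDevice<Place>()) = t.constant(static_cast<T>(0));
    }
    // the selected rows would then be copied back via memory::Copy
  }
};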

Contributor Author:

Done

Member:

It seems that merging the CPU/GPU code together is not a good idea here; I made a mistake.
If CPU and GPU both use Eigen, we can reuse code easily. But if not, it's actually better to split the CPU and GPU implementations.

Contributor Author:

Done. Split the CPU/GPU code again.


class MultiplexOp : public framework::OperatorWithKernel {
public:
MultiplexOp(const std::string &type, const framework::VariableNameMap &inputs,
Contributor:

Why not use using framework::OperatorWithKernel::OperatorWithKernel; to inherit the base constructor?

Contributor Author:

Modified
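The suggested form inherits the base constructor directly; roughly (a sketch, not the exact merged code):

class MultiplexOp : public framework::OperatorWithKernel {
 public:
  // Inherit OperatorWithKernel's constructor instead of redefining it.
  using framework::OperatorWithKernel::OperatorWithKernel;

 protected:
  void InferShape(const framework::InferShapeContext &ctx) const override {
    // input/shape checks as in the diff above
  }
};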

@kuke kuke (Contributor Author) left a comment

Thanks for the valuable comments. Please review the changes.


@QiJune QiJune (Member) left a comment

LGTM

@kuke kuke merged commit 47fbc96 into PaddlePaddle:develop Sep 25, 2017