
Conversation

@kuke kuke (Contributor) commented Sep 13, 2017

Resolve #4010

@kuke kuke added the OpPorting label Sep 13, 2017
@qingqing01 qingqing01 requested review from QiJune and pkuyym and removed request for dzhwinter, qingqing01 and reyoung September 18, 2017 08:31
auto num_ins = ins.size();
PADDLE_ENFORCE(num_ins > 2,
"multiplex operator should have more than 2 inputs.");
PADDLE_ENFORCE_EQ(ins[0]->dims().size(), 1,
Member:

We also have to check the index values in ins[0]; each index in ins[0] must be less than ins[0]->dims().

Contributor Author:

Done. Added the index check in the forward compute function.
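For reference, a minimal sketch of such a bounds check in the same PADDLE_ENFORCE style as the shape checks above (index_t_cpu, rows, and num_ins are taken from the diff fragments in this PR; the exact placement inside the forward compute function is assumed):

// Illustrative only: make sure every index selects a valid candidate input.
auto index = index_t_cpu.data<T>();
for (auto i = 0; i < rows; i++) {
  int k = static_cast<int>(index[i]) + 1;
  PADDLE_ENFORCE(k >= 1 && k < static_cast<int>(num_ins),
                 "index exceeds the number of candidate tensors.");
}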

"Input(Out@GRAD) shouldn't be null.");
auto d_ins = ctx.MultiOutput<LoDTensor>(framework::GradVarName("X"));
auto ins = ctx.MultiInput<Tensor>("X");
// don;t compute gradient for index
Member:

don;t --> don't

Contributor Author:

Done

auto index = index_t_cpu.data<T>();
for (auto i = 0; i < rows; i++) {
int k = (int)index[i] + 1;
cudaMemcpy(out->data<T>() + i * cols, ins[k]->data<T>() + i * cols,
Member:

Please use cuda stream.

auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
                  ctx.device_context())
                  .stream();
platform::GPUPlace place = boost::get<platform::GPUPlace>(ctx.GetPlace());
memory::Copy(place, out->data<T>() + i * cols, place,
             ins[k]->data<T>() + i * cols, cols * sizeof(T), stream);

Contributor Author:

Done
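Putting the suggestion together, the per-row copy loop might look roughly like this (a sketch only, reusing index, rows, cols, and ins from the diff above):

auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
                  ctx.device_context())
                  .stream();
auto place = boost::get<platform::GPUPlace>(ctx.GetPlace());
for (auto i = 0; i < rows; i++) {
  int k = static_cast<int>(index[i]) + 1;
  // device-to-device copy of one row, enqueued on the kernel's stream
  memory::Copy(place, out->data<T>() + i * cols, place,
               ins[k]->data<T>() + i * cols, cols * sizeof(T), stream);
}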

auto cols = ins[1]->dims()[1];
for (auto i = 0; i < rows; i++) {
int k = (int)index[i] + 1;
memcpy(out->data<T>() + i * cols, ins[k]->data<T>() + i * cols,
Member:

Maybe we can combine the CPU code and CUDA code in one file:

template <typename Place, typename T>
class MultiplexKernel : public framework::OpKernel

We can use

t.device(context.GetEigenDevice<Place>()) = t.constant(static_cast<T>(0));

to set a CPU/GPU tensor to zero.

And we can use

memory::Copy

for both the CPU and GPU copies.
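For illustration, a minimal sketch of the zero-initialization part of such a shared kernel (class and variable names are assumptions, not the code merged in this PR):

// Sketch: zero the gradient tensors on either CPU or GPU through Eigen.
template <typename Place, typename T>
class MultiplexGradKernel : public framework::OpKernel {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto d_ins =
        ctx.MultiOutput<framework::LoDTensor>(framework::GradVarName("X"));
    for (size_t i = 1; i < d_ins.size(); i++) {
      d_ins[i]->mutable_data<T>(ctx.GetPlace());
      auto t = framework::EigenVector<T>::Flatten(*d_ins[i]);
      t.device(ctx.GetEigenDevice<Place>()) = t.constant(static_cast<T>(0));
    }
    // the selected rows would then be copied back via memory::Copy
  }
};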

Contributor Author:

Done

Member:

It seems that merging the CPU/GPU code together is not a good idea here; I made a mistake.
If CPU and GPU both use Eigen, we can reuse code easily. But if not, it's actually better to split the CPU and GPU implementations.

Contributor Author:

Done. Split the CPU/GPU code again.


class MultiplexOp : public framework::OperatorWithKernel {
public:
MultiplexOp(const std::string &type, const framework::VariableNameMap &inputs,
Contributor:

Why not use using framework::OperatorWithKernel::OperatorWithKernel; to inherit the base constructor?

Contributor Author:

Modified
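The suggested form inherits the base constructor directly; roughly (a sketch, not the exact merged code):

class MultiplexOp : public framework::OperatorWithKernel {
 public:
  // Inherit OperatorWithKernel's constructor instead of redefining it.
  using framework::OperatorWithKernel::OperatorWithKernel;

 protected:
  void InferShape(const framework::InferShapeContext &ctx) const override {
    // input/shape checks as in the diff above
  }
};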

@kuke kuke (Contributor Author) left a comment

Thanks for the valuable comments. Please review the changes.


@QiJune QiJune (Member) left a comment

LGTM

@kuke kuke merged commit 47fbc96 into PaddlePaddle:develop Sep 25, 2017