Contributor

@Xreki Xreki commented Nov 28, 2017

Fix #5997

#include <mutex>
#include "hl_gpu.h"
#include "paddle/utils/Logging.h"
#ifdef PADDLE_WITH_CUDA
Contributor

When compiling with WITH_GPU=OFF, hl_cuda_stub.h is included; shouldn't the code below compile correctly even without adding these macros?

Contributor Author

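I added this because EigenGemm.cpp includes "paddle/math/MemoryHandle.h", which in turn indirectly includes "hl_base.h". That header defines real as an alias for float ("using real float"), which causes compilation problems with Eigen.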

sizeC[1] = N;
CHECK_EQ(N, ldc);
T* gemmC = C;
if (N != ldc) {
Contributor

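Fixing it this way introduces a performance risk down the line (and people unfamiliar with this code will not easily realize that there is a performance problem here). Could you check whether Eigen has another way of expressing this that supports the stride parameter directly?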

Contributor Author

@Xreki Xreki Nov 28, 2017

I checked; Eigen's stride mechanism is not suitable for this case.

Contributor Author

Done.
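
For readers following the stride discussion in this thread, here is a minimal sketch, written with plain C++ loops rather than Eigen, of what a leading dimension ldc > N means for a row-major output and why a temporary-buffer workaround carries a hidden cost. The names gemm_strided and gemm_copy_back are hypothetical, and the copy-back variant is only an assumption about the kind of workaround an `if (N != ldc)` branch suggests; it is not the PR's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference GEMM, all matrices row-major: C[0:M, 0:N] = A (MxK) * B (KxN).
// C has leading dimension ldc >= N; columns N..ldc-1 of each row are padding
// and are never written.
void gemm_strided(const float* A, const float* B, float* C,
                  std::size_t M, std::size_t N, std::size_t K,
                  std::size_t ldc) {
  assert(ldc >= N);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
      C[i * ldc + j] = acc;  // row stride is ldc, not N
    }
  }
}

// Hypothetical copy-back workaround: compute into a contiguous MxN temporary,
// then copy each row into the strided destination.
void gemm_copy_back(const float* A, const float* B, float* C,
                    std::size_t M, std::size_t N, std::size_t K,
                    std::size_t ldc) {
  std::vector<float> tmp(M * N);               // extra allocation
  gemm_strided(A, B, tmp.data(), M, N, K, N);  // contiguous result (ldc == N)
  for (std::size_t i = 0; i < M; ++i) {        // extra pass over the output
    std::copy(tmp.data() + i * N, tmp.data() + (i + 1) * N, C + i * ldc);
  }
}
```

Both variants write the same values into the first N columns of each row and leave the padding columns untouched; the copy-back variant additionally allocates M * N elements and makes one more pass over the result, which is the kind of cost that is easy to overlook when reading the call site.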

Eigen::DefaultDevice device;
if (alpha == T(1) && beta == T(0)) {
c.device(device) = a.contract(b, dims);
c.slice(offsetC, extentC).device(device) = a.contract(b, dims);
Contributor

Consider two cases:

  1. ldc_1 > N, using the operation c.slice(offsetC, extentC).device(device) = a.contract(b, dims);
  2. ldc_2 == N, using the operation c.device(device) = a.contract(b, dims);

M, N, and K are the same in cases 1 and 2, but ldc > N in case 1 and ldc == N in case 2. Do the two computation paths deliver the same performance (GFLOPS) in these two cases?

Contributor Author

Done. This does make the code a bit long (repetitive), though. I will measure the running time later.
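
A timing comparison of the two cases raised above (same M, N, K; ldc == N versus ldc > N) could be set up roughly as follows. This is only a sketch under stated assumptions: time_gemm_ms and gflops are hypothetical helpers, and the kernel is a naive triple loop standing in for the Eigen contraction, so it exercises the strided write pattern but says nothing about Eigen's actual slice() performance. A real measurement would replace the loop with the two Eigen expressions from the diff.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs one row-major GEMM (A filled with 1, B with 2)
// into an output with leading dimension ldc and returns the elapsed
// wall-clock time in milliseconds. The result is left in *out.
double time_gemm_ms(std::size_t M, std::size_t N, std::size_t K,
                    std::size_t ldc, std::vector<float>* out) {
  assert(ldc >= N);
  std::vector<float> A(M * K, 1.0f), B(K * N, 2.0f);
  out->assign(M * ldc, 0.0f);
  float* C = out->data();

  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
      C[i * ldc + j] = acc;  // padding columns N..ldc-1 stay at 0
    }
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// GEMM performs 2*M*N*K floating-point operations; convert a time in
// milliseconds to GFLOPS.
double gflops(std::size_t M, std::size_t N, std::size_t K, double ms) {
  return 2.0 * M * N * K / (ms * 1e6);
}
```

Calling time_gemm_ms twice with identical M, N, K, once with ldc == N and once with ldc > N, and comparing the two gflops values (ideally averaged over many repetitions after a warm-up run) gives a like-for-like number for the question in the review.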

@Xreki Xreki merged commit 42708de into PaddlePaddle:develop Dec 1, 2017
@Xreki Xreki deleted the fix_stride_eigen branch November 14, 2018 02:38