Repair nccl op test #8575
Conversation
@@ -121,27 +134,11 @@ class NCCLTester : public ::testing::Test {
  std::vector<p::DeviceContext *> dev_ctxs;
  f::Scope g_scope;
  std::mutex mu;
  std::vector<int> gpu_list;
Data members of classes should have a trailing underscore.
Refer to: Variable Names
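For illustration, a minimal sketch of the suggested renaming (applied to the members in the hunk above; this is not the actual patch):

class NCCLTester : public ::testing::Test {
  // ...
 protected:
  // Trailing underscores mark class data members, per the referenced naming rule.
  std::vector<p::DeviceContext *> dev_ctxs_;
  f::Scope g_scope_;
  std::mutex mu_;
  std::vector<int> gpu_list_;
};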
    auto op = f::OpRegistry::CreateOp(*op1);
    VLOG(1) << "invoke NCCLInitOp.";
    op->Run(g_scope, cpu_place);
    VLOG(1) << "NCCLInitOp finished.";
  }

  int GetGPUData(int gpu_id) { return gpu_id + 42; }
This function is necessary because simply setting the GPU data to gpu_id won't expose the incorrect-linking error. More specifically, Paddle might be compiled against the NCCL 1.3 header while dynamically linked with the NCCL2 shared library.
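As a hedged sketch of why the offset helps (variable names are illustrative, not the test's actual checking code): with plain gpu_id, GPU 0 contributes all zeros, so a corrupted reduction can still look plausible, whereas GetGPUData makes every contribution non-trivial.

// Expected allreduce sum when every rank sends GetGPUData(gpu_id):
int expected_sum = 0;
for (int gpu_id : gpu_list) {
  expected_sum += GetGPUData(gpu_id);  // gpu_id + 42, never all zeros
}
// A header/library mismatch that corrupts the reduction now produces a
// visibly wrong sum instead of a coincidentally "correct" one.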
@@ -97,7 +108,7 @@ class NCCLTester : public ::testing::Test {
    send_tensor->Resize(kDims);
    send_tensor->mutable_data<T>(kDims, place);
Maybe line 108 is unnecessary.
@@ -97,7 +108,7 @@ class NCCLTester : public ::testing::Test {
     send_tensor->Resize(kDims);
     send_tensor->mutable_data<T>(kDims, place);

-    std::vector<T> send_vector(f::product(kDims), gpu_id);
+    std::vector<T> send_vector(f::product(kDims), GetGPUData(gpu_id));
     paddle::framework::TensorFromVector<T>(send_vector, *ctx, send_tensor);
     ctx->Wait();
Is it necessary to synchronize here? I think the copy at line 179 will already synchronize the GPU and CPU.
The copy is a cudaMemcpyAsync. Looks like we need to add a wait at line 179...

Paddle/paddle/fluid/memory/memcpy.cc, lines 30 to 36 at 0d49b92:
template <>
void Copy<platform::CPUPlace, platform::CUDAPlace>(
    platform::CPUPlace dst_place, void* dst, platform::CUDAPlace src_place,
    const void* src, size_t num, cudaStream_t stream) {
  platform::SetDeviceId(src_place.device);
  platform::GpuMemcpyAsync(dst, src, num, cudaMemcpyDeviceToHost, stream);
}
I don't think so, because the memory is pageable on the CPU side: the copy doesn't return until it has completed. The current CUDA Runtime documentation states:

Asynchronous (Memcpy):
- For transfers from device memory to pageable host memory, the function will return only once the copy has completed.
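A self-contained CUDA sketch of the quoted behavior (buffer names and sizes are illustrative): a device-to-host cudaMemcpyAsync into pageable memory blocks until the copy finishes, while the same call into pinned memory is truly asynchronous and needs an explicit synchronize.

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
  const size_t n = 1 << 20;
  float *dev = nullptr;
  cudaMalloc((void **)&dev, n * sizeof(float));
  cudaMemset(dev, 0, n * sizeof(float));

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Pageable host memory: per the documentation quoted above, this
  // device-to-host cudaMemcpyAsync returns only once the copy has
  // completed, so reading `pageable` right away is safe.
  std::vector<float> pageable(n);
  cudaMemcpyAsync(pageable.data(), dev, n * sizeof(float),
                  cudaMemcpyDeviceToHost, stream);
  printf("pageable[0] = %f (valid without a sync)\n", pageable[0]);

  // Pinned (page-locked) host memory: the same call is truly asynchronous,
  // so the buffer must not be read before the stream is synchronized.
  float *pinned = nullptr;
  cudaMallocHost((void **)&pinned, n * sizeof(float));
  cudaMemcpyAsync(pinned, dev, n * sizeof(float), cudaMemcpyDeviceToHost,
                  stream);
  cudaStreamSynchronize(stream);  // required before touching `pinned`
  printf("pinned[0] = %f (valid only after the sync)\n", pinned[0]);

  cudaFreeHost(pinned);
  cudaFree(dev);
  cudaStreamDestroy(stream);
  return 0;
}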
👍
LGTM!
fix #8582