Correct the install command, static library name and typo in nccl.cmake. #5048

Merged: 2 commits, Oct 25, 2017

@Xreki Xreki commented Oct 24, 2017

I tried building nccl manually with make CUDA_HOME=$CUDA_ROOT and got the following output:

$ make CUDA_HOME=$CUDA_ROOT
Compiling src/libwrap.cu                      > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/libwrap.o
Compiling src/core.cu                         > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/core.o
Compiling src/all_gather.cu                   > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/all_gather.o
Compiling src/all_reduce.cu                   > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/all_reduce.o
Compiling src/broadcast.cu                    > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/broadcast.o
Compiling src/reduce.cu                       > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/reduce.o
Compiling src/reduce_scatter.cu               > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/reduce_scatter.o
Linking   libnccl.so.1.3.4                    > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1.3.4
Archiving libnccl_static.a                    > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl_static.a

Note that the static library is named libnccl_static.a.
I then ran make install to install the library and got these errors:

$ make install
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so' -> `/usr/local/lib/libnccl.so'
cp: cannot create symbolic link `/usr/local/lib/libnccl.so': Permission denied
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1' -> `/usr/local/lib/libnccl.so.1'
cp: cannot create symbolic link `/usr/local/lib/libnccl.so.1': Permission denied
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1.3.4' -> `/usr/local/lib/libnccl.so.1.3.4'
cp: cannot create regular file `/usr/local/lib/libnccl.so.1.3.4': Permission denied
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl_static.a' -> `/usr/local/lib/libnccl_static.a'
cp: cannot create regular file `/usr/local/lib/libnccl_static.a': Permission denied
make: *** [install] Error 1

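The default install location, /usr/local, is not writable without root privileges, so the install directory has to be specified explicitly, for example make install PREFIX=install: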

$ make install PREFIX=install
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so' -> `install/lib/libnccl.so'
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1' -> `install/lib/libnccl.so.1'
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1.3.4' -> `install/lib/libnccl.so.1.3.4'
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl_static.a' -> `install/lib/libnccl_static.a'
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/include/nccl.h' -> `install/include/nccl.h'
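
As a rough illustration of how the result of such a prefix install could be checked, here is a minimal sketch. The helper name check_nccl_prefix is hypothetical and not part of nccl or Paddle; the file list is taken only from the install output above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a prefix directory contains the files
# that `make install PREFIX=<dir>` copied in the log above.
check_nccl_prefix() {
    prefix="$1"
    status=0
    for f in lib/libnccl.so lib/libnccl_static.a include/nccl.h; do
        if [ -e "$prefix/$f" ]; then
            echo "found:   $prefix/$f"
        else
            echo "missing: $prefix/$f"
            status=1
        fi
    done
    # Non-zero exit status if any expected file is absent.
    return "$status"
}
```

One possible use is make install PREFIX="$PWD/install" && check_nccl_prefix "$PWD/install", which would fail early if the static library were looked up under a different name.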

@luotao1 luotao1 left a comment

LGTM

@@ -12,39 +11,39 @@ if(WITH_DSO)
set(NCCL_INSTALL_DIR "")
else()
# otherwise, we build nccl and link it.
set(NCCL_INSTALL_DIR ${THIRD_PARTY_PATH}/install/nccl)
# Note: cuda 8.0 is needed to make nccl

Why is cuda 8.0 strictly required here?

@Xreki Xreki (Author) replied:

Because nccl's Makefile sets the NVCC_GENCODE variable, and -gencode=arch=compute_60,code=sm_60 and above are only supported starting with cuda 8.0.

https://github.com/NVIDIA/nccl/blob/master/Makefile#L20-L25

That said, it should also be possible to pass the NVCC_GENCODE variable in on the make command line, for example:

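# Keep the flags in one space-separated string; an unquoted multi-value set()
# would create a CMake list and expand with ';' separators.
set(NVCC_GENCODE "-gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52")
# Pass the whole flag string to make as a single NVCC_GENCODE=... argument.
set(NCCL_BUILD_COMMAND make "NVCC_GENCODE=${NVCC_GENCODE}")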
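
The same idea can be tried directly from a shell before wiring it into CMake. The sketch below assumes only that variables given on the make command line take precedence over assignments inside the Makefile, which is standard make behaviour. The helper name gencode_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build one -gencode flag per compute capability given
# as an argument, joined by single spaces.
gencode_flags() {
    flags=""
    for cc in "$@"; do
        # ${flags:+$flags } adds a separating space only after the first flag.
        flags="${flags:+$flags }-gencode=arch=compute_${cc},code=sm_${cc}"
    done
    printf '%s\n' "$flags"
}
```

A build restricted to the three architectures from the example would then look like make CUDA_HOME="$CUDA_ROOT" NVCC_GENCODE="$(gencode_flags 35 50 52)". Whether that is enough to build nccl with a cuda version older than 8.0 has not been verified here.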

@Xreki Xreki merged commit 288ffdd into PaddlePaddle:develop Oct 25, 2017
@Xreki Xreki deleted the fix_nccl_typo branch November 14, 2018 02:37