pmeier
Collaborator

@pmeier pmeier commented Jul 22, 2021

I did some digging and it seems that mkl==2021.3.0, and in particular its newly added support for CMake, is the problem. If you look at the executed clang command

/Applications/Xcode-12.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++  -isysroot /Applications/Xcode-12.0.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -dynamiclib -Wl,-headerpad_max_install_names -o libtorchvision.dylib -install_name @rpath/libtorchvision.dylib CMakeFiles/torchvision.dir/torchvision/csrc/io/image/cpu/common_jpeg.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/io/image/cpu/decode_image.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/io/image/cpu/decode_jpeg.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/io/image/cpu/decode_png.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/io/image/cpu/encode_jpeg.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/io/image/cpu/encode_png.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/io/image/cpu/read_write_file.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/io/image/cuda/decode_jpeg_cuda.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/io/image/image.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/alexnet.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/densenet.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/googlenet.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/inception.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/mnasnet.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/mobilenet.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/resnet.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/shufflenetv2.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/squeezenet.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/models/vgg.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/autograd/deform_conv2d_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/autograd/ps_roi_align_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/autograd/ps_roi_pool_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/autograd/roi_align_kernel.cpp.o 
CMakeFiles/torchvision.dir/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/cpu/deform_conv2d_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/cpu/nms_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/cpu/ps_roi_align_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/cpu/ps_roi_pool_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/cpu/roi_align_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/cpu/roi_pool_kernel.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/deform_conv2d.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/interpolate_aa.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/nms.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/ps_roi_align.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/ps_roi_pool.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/roi_align.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/ops/roi_pool.cpp.o CMakeFiles/torchvision.dir/torchvision/csrc/vision.cpp.o  -Wl,-rpath,/Users/distiller/miniconda3/lib/python3.9/site-packages/torch/lib -Wl,-rpath,/Users/distiller/miniconda3/lib /Users/distiller/miniconda3/lib/python3.9/site-packages/torch/lib/libc10.dylib /Users/distiller/miniconda3/lib/libpng.dylib /Users/distiller/miniconda3/lib/libjpeg.dylib /Users/distiller/miniconda3/lib/libpython3.9.dylib /Users/distiller/miniconda3/lib/python3.9/site-packages/torch/lib/libtorch.dylib /Users/distiller/miniconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.dylib /Users/distiller/miniconda3/lib/python3.9/site-packages/torch/lib/libc10.dylib -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread

it seems that the mkl libraries are appended with the -l flag (-lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread), whereas everything else is listed with its full path. The -l flag looks a lot like gcc, and I'm guessing this is why we are only seeing this on macOS and not on Linux or Windows.
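
To make the difference easier to see, here is a minimal Python sketch that splits a link line into libraries given by full path and libraries given with `-l`. The helper name `split_link_libraries` and the abbreviated command string are illustrative assumptions, not part of the torchvision build.

```python
import shlex


def split_link_libraries(command: str):
    """Return (full_path_libs, dash_l_libs) found in a linker command line."""
    full_path_libs, dash_l_libs = [], []
    for token in shlex.split(command):
        if token.startswith("-l") and len(token) > 2:
            # e.g. "-lmkl_core" -> "mkl_core"; resolved via the linker search path
            dash_l_libs.append(token[2:])
        elif token.startswith("/") and token.endswith((".dylib", ".so", ".a")):
            # library passed as an absolute path
            full_path_libs.append(token)
    return full_path_libs, dash_l_libs


# Abbreviated excerpt of the link line quoted above (object files omitted).
command = (
    "c++ -dynamiclib -o libtorchvision.dylib "
    "/Users/distiller/miniconda3/lib/libpng.dylib "
    "/Users/distiller/miniconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.dylib "
    "-lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread"
)
paths, names = split_link_libraries(command)
print("full paths:", paths)
print("-l names:  ", names)
```

On this excerpt only the three mkl entries end up in the `-l` list, so the linker has to find them through its library search path instead of being handed a file.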

Since there is very little we can do, this PR simply pins mkl to the previous version 2021.2.0 to avoid this.
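
As a rough illustration of a macOS-only pin, the sketch below picks the conda spec for mkl based on the platform. The function name `mkl_requirement` and the way the install command is assembled are assumptions made for this example; the actual change in this PR lives in the CI configuration, which is not shown here.

```python
import sys

# Version named in the comment above as the previous release.
PINNED_MKL = "mkl=2021.2.0"


def mkl_requirement(platform: str = sys.platform) -> str:
    """Return the conda spec for mkl: pinned on macOS, unpinned elsewhere."""
    return PINNED_MKL if platform == "darwin" else "mkl"


# Hypothetical install command for a macOS CI job.
install_cmd = ["conda", "install", "-y", mkl_requirement("darwin")]
print(" ".join(install_cmd))
```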

@pmeier pmeier requested a review from NicolasHug July 22, 2021 17:59
@pmeier pmeier marked this pull request as ready for review July 23, 2021 07:31
@pmeier pmeier force-pushed the fix-macos-cmake branch from 0f5655a to 1c2dda1 Compare July 23, 2021 08:22
@NicolasHug
Member

Thanks a lot for looking into this @pmeier

Just a few questions:

I'm a bit confused about cmake not properly generating the right linker flags for clang... isn't this literally its job to abstract those things away?

Also, where did you find the clang command that was executed? I couldn't find it in the CI logs

@pmeier
Collaborator Author

pmeier commented Jul 23, 2021

> I'm a bit confused about cmake not properly generating the right linker flags for clang... isn't this literally its job to abstract those things away?

Yeah, it should do that. My guess is that the CMake files for mkl are not properly set up, so they only work with gcc. If you want, I can investigate what is happening on Windows / Linux.

> Also, where did you find the clang command that was executed? I couldn't find it in the CI logs

It is not there. I've SSH'ed into the CI machine and looked at the CMake cache.
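
The comment does not say which file held the command. As one possible way to recover it, CMake's Makefile generators write each target's link line to a `link.txt` file under `CMakeFiles/<target>.dir/`. Assuming such a generator was used, a small Python sketch can collect those files from a build tree. The helper name `find_link_commands` and the `build` directory are hypothetical.

```python
from pathlib import Path


def find_link_commands(build_dir: str) -> dict:
    """Map each link.txt found under build_dir to the link command it contains."""
    return {
        str(path): path.read_text().strip()
        for path in sorted(Path(build_dir).rglob("link.txt"))
    }


# Example usage on a machine that still has the build tree
# ("build" is a placeholder for the real build directory):
#
#     for path, command in find_link_commands("build").items():
#         print(path)
#         print(command)
```

Building with verbose output enabled (for example `make VERBOSE=1` with Makefile generators) is another way to get the same command into the CI log itself.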

@NicolasHug
Member

> I did some digging and it seems that mkl==2021.3.0 and in particular their newly added support for CMake is the problem

It does seem related to that indeed, I tried to remove MKL's cmake files (/Users/distiller/miniconda3/lib/cmake/mkl) and the build went fine.

@malfet @seemethere would you mind taking a look? The current failure can be found here: https://app.circleci.com/pipelines/github/pytorch/vision/9553/workflows/2bb2eaaa-dad8-4b6b-a0fd-fba07e732a70/jobs/706701

Is there a way we can tell the torchvision build to not look at MKL's cmake files? Or should we report to them that their cmake is messing up something?

Member

@fmassa fmassa left a comment

Thanks!

@fmassa fmassa merged commit a839796 into pytorch:master Jul 30, 2021
@pmeier pmeier deleted the fix-macos-cmake branch July 30, 2021 12:06
facebook-github-bot pushed a commit that referenced this pull request Aug 3, 2021
Summary:
* fix MacOS cmake workflow

* try only mkl

* only pin mkl on MacOs

* fix

Reviewed By: NicolasHug

Differential Revision: D30069954

fbshipit-source-id: 2ffb364062ddedf8026d596de75ba6ec54d57f27

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
malfet pushed a commit that referenced this pull request Sep 14, 2021
malfet added a commit that referenced this pull request Sep 14, 2021

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>