Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export -march=native for Clang and prevent it from being included during cross compilation. #2416

Merged
merged 1 commit into from
Sep 10, 2018

Conversation

SergioRAgostinho
Copy link
Member

Closes: #2256

Tried it on OS X and its adding the flag and the flag is being passed to the downstream target (I applied it after #2100). I also ran #2013 and it seems fine.

Rebase Merge after #2100 i.e., just consider the last commit.

@svenevs
Copy link
Contributor

svenevs commented Sep 9, 2018

Hey I'm sorry I'm confused. Am I supposed to be testing this as-is or waiting for a rebase now that #2100 is merged? I was waiting for the rebase but now I'm not sure...it's 3:15am for me right now, got plans in the morning but i should be able to check this out tomorrow afternoon (~12-16 hrs from now).

@SergioRAgostinho
Copy link
Member Author

SergioRAgostinho commented Sep 9, 2018

Sorry for not being clear. The rebase merge comment was more for Sergey in case he wanted to merge this following #2100, since it includes commits from that PR. Now that #2100 is merged, I'll do the rebase manually.

Regardless of that and for future reference, you can always test it as is. The rebasing operations are usually just to have a clean history before the merge commits.

@svenevs
Copy link
Contributor

svenevs commented Sep 10, 2018

Looks good to me!!!

Building PCL Itself

(at very endish there's a -march=native)

[1/407] /opt/spack/opt/spack/linux-fedora27-x86_64/gcc-7.3.1/llvm-6.0.1-ls7mvjhlg4qn37rp7p5v55ybnttpjtls/bin/clang++  -DPCLAPI_EXPORTS -DQT_CONCURRENT_LIB -DQT_CORE_LIB -DQT_GUI_LIB -DQT_OPENGL_LIB -DQT_WIDGETS_LIB -Dqh_QHpointer -I/opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/flann-1.9.1-xbbwnltjq34m4oyd75g7zy7i3reams2w/include -I../recognition/include/pcl/recognition/3rdparty -I/opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/libpng-1.6.34-ncs3zqm3niyngbtnrkzsh4rgkjuxbehy/include -I/opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/qhull-2015.2-mf4i3x32ay5qm4ciulpkslycqay7sjfo/include -I/usr/include/qt5 -I/usr/include/qt5/QtCore -I/usr/lib64/qt5/./mkspecs/linux-g++ -I/usr/include/qt5/QtGui -I/usr/include/qt5/QtWidgets -I/usr/include/qt5/QtConcurrent -I/usr/include/qt5/QtOpenGL -Iinclude -I../common/include -isystem /opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/eigen-3.3.4-bfkdaij4s2kp4xnhyet2umn5ontrxjzj/include/eigen3 -isystem /opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/boost-1.68.0-4plhxwpwc7rtsia6u3jc6cs6y4cbcb4t/include -ftemplate-depth=1024 -Qunused-arguments -Wno-invalid-offsetof -march=native -msse4.2 -mfpmath=sse -fopenmp=libomp      -O3 -DNDEBUG -fPIC   -DBOOST_DISABLE_ASSERTS -DEIGEN_NO_DEBUG -MD -MT common/CMakeFiles/pcl_common.dir/src/correspondence.cpp.o -MF common/CMakeFiles/pcl_common.dir/src/correspondence.cpp.o.d -o common/CMakeFiles/pcl_common.dir/src/correspondence.cpp.o -c ../common/src/correspondence.cpp

AKA -march=native with clang@6.0.1 got in there as desired 💥

FWIW, it shouldn't matter too much, but this was with the following configuration:

$ cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/pcl/clang/6.0.1 -DWITH_CUDA=OFF -DWITH_VTK=OFF

Compilers verified:

  • gcc@7.3.1
  • clang@5.0.0
  • clang@6.0.1

Using PCL From CMake

I didn't actually create / test a project yet, but it should work out just fine.

sven:/opt/pcl> grep march gcc/7.3.1/share/pcl-1.8/PCLConfig.cmake 
gcc/7.3.1/share/pcl-1.8/PCLConfig.cmake:437:list(APPEND PCL_COMPILE_OPTIONS -march=native;-msse4.2;-mfpmath=sse)
sven:/opt/pcl> grep march clang/6.0.1/share/pcl-1.8/PCLConfig.cmake 
clang/6.0.1/share/pcl-1.8/PCLConfig.cmake:437:list(APPEND PCL_COMPILE_OPTIONS -march=native;-msse4.2;-mfpmath=sse)
sven:/opt/pcl> grep march clang/5.0.0/share/pcl-1.8/PCLConfig.cmake 
clang/5.0.0/share/pcl-1.8/PCLConfig.cmake:437:list(APPEND PCL_COMPILE_OPTIONS -march=native;-msse4.2;-mfpmath=sse)

AKA this should definitely be working as expected.

I am fixing compilation issues with intel right now, fixed the deprecated macros, working on mm_malloc right now, then the only other thing I'm gonna add is -xHost when Intel (or the Windows version if windows and intel). I expect to have this PR finished before I go to bed ;)

@taketwo
Copy link
Member

taketwo commented Sep 10, 2018

Honestly, I don't think we need to check for GCC version being greater than 4.2. It is more than a decade old! And the oldest active Ubuntu LTS (which is 12.04) has 4.8. Thus I would propose to drop versions testing and reduce the script to:

if(CMAKE_COMPILER_IS_GNUCC OR CMAKE_COMPILER_IS_GNUCXX OR CMAKE_COMPILER_IS_CLANG)
    list(APPEND SSE_FLAGS "-march=native")
    message(STATUS "Using CPU native flags for SSE optimization: ${SSE_FLAGS}")
endif()

@SergioRAgostinho
Copy link
Member Author

👍 I'll modify it tonight.

@svenevs
Copy link
Contributor

svenevs commented Sep 10, 2018

In that event, I will open an alternate PR. The Intel stuff is ummm...despicable. They have some really inconvenient / arbitrary compiler flag rules.

In addition to fixing intel compilation, I also added AVX and AVX2 checks. Posting so you know it's inbound, I'll remove the GCC 4.2 check in that PR. I'm just waiting to check on all the compilers again to make sure it's legit (intel's compiler is actually really slow...).

I think you all will be pleased with the result though, it looks like a big diff but really not IMO. I should have participated in #2100 but I was trying to not get sucked in lol.

@taketwo
Copy link
Member

taketwo commented Sep 10, 2018

In addition to fixing intel compilation, I also added AVX and AVX2 checks.

I suppose you are talking about those check_cxx_source_runs and setting HAVE_*_EXTENSIONS variables? Actually, do you have any idea why these are needed at all?

@svenevs
Copy link
Contributor

svenevs commented Sep 10, 2018

I suppose you are talking about those check_cxx_source_runs and setting HAVE_*_EXTENSIONS variables?

Yup.

Actually, do you have any idea why these are needed at all?

Only a very fuzzy picture. Here's how I understand the gcc docs, inlining some for convenience with some emphasis

-march=cpu-type

Generate instructions for the machine type cpu-type. In contrast to -mtune=cpu-type, which merely tunes the generated code for the specified cpu-type, -march=cpu-type allows GCC to generate code that may not run at all on processors other than the one indicated. Specifying -march=cpu-type implies -mtune=cpu-type.

In contrast, at the bottom the giant listing of all the possible -m's (I refuse to do AVX512 btw, it's a nightmare).

These switches enable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, ........

So while you may think that -march=native, it's not really a guarantee, so checking that the capabilities exist is still a good idea in my opinion. This blog post talks about some of the oddities.

To be clear, I'm not an expert, and definitely not 100%. I think adding -march=native -mavx2 won't hurt though, it covers when -march=native may not do what you want (???)

On the other hand, now that I take a second look, these also need to be guarded by if( NOT CMAKE_CROSSCOMPILING) don't they? There's no reason the target machine would have those.

@svenevs
Copy link
Contributor

svenevs commented Sep 10, 2018

Sorry for double response. I forgot the most important. There's no -march=native for visual studio (though they do support /arch:AVX now), and the Intel one is actually only for Intel CPUs (can't use the -xHost flag for an AMD CPU).

After retesting the Linux ones I will test visual studio and this specific module, but I won't actually be testing building PCL itself on Windows.

@taketwo
Copy link
Member

taketwo commented Sep 10, 2018

So while you may think that -march=native, it's not really a guarantee, so checking that the capabilities exist is still a good idea in my opinion. This blog post talks about some of the oddities.

Thanks for the post, interesting reading. Did you also check comments? This one provides a very reasonable explanation why the oddities that the author talks about actually make sense. That said, I don't think this post is relevant to our discussion since it's about -mtune :)

I think adding -march=native -mavx2 won't hurt though, it covers when -march=native may not do what you want (???)

GCC docs claim that using -march=native enables all instruction subsets supported by the local machine. It indeed does not hurt to add -mavx2 explicitly, but why would we do that? Are you aware of any cases when -march=native does not do what you want? (In terms of enabling instruction sets.)

On the other hand, now that I take a second look, these also need to be guarded by if( NOT CMAKE_CROSSCOMPILING) don't they? There's no reason the target machine would have those.

Certainly.

There's no -march=native for visual studio

I'm no Visual Studio user, but I think I've heard that -march=nativeish is the default behavior.

@svenevs
Copy link
Contributor

svenevs commented Sep 10, 2018

Ok I proposed #2426 in a way that the changes to the file being discussed here can be blown away (kill the last commit) in favor of something less complex.

Yeah I choose to pretend -mtune does not exist xD

I will test and see what's going on with Windows. Apparently it used to be /arch:SSE2 for a long time as the default, but then I've found other things mentioning that /arch:SSE2 is invalid on a 64bit Visual Studio build (as indicated by the linked docs in the PR, 64bit only allows /arch:AVX and /arch:AVX2).

Maybe the best approach here is to have @SergioRAgostinho finish of this PR with the elimination of ancient checks #2416 (comment)

I could then open a different PR with just the first two commits there so that Intel officially works in the next release, and more time can be spent deciding wtf is actually going on with these darn flags. I'm stepping away for this for at least 24 hours because I'm starting to get too turned around. I will test Windows tomorrow. Sorry for stream-of-consciousness-posting here.......I'm completely frazzled

@taketwo
Copy link
Member

taketwo commented Sep 10, 2018

Thanks for all the effort. I feel like merging this and then Intel parts of your PR would be the best plan for this release.

After that we can think about all the questions that have popped up and decide on the plan. Your PCL_SSE_COMPILER_ARCH_FLAG function will be a great starting point, maybe we won't need to do much more. (By the way, we may check how OpenCV deals with architecture issues, I expect them to have an advanced state-of-the-art CMake script.)

@svenevs
Copy link
Contributor

svenevs commented Sep 10, 2018

Sounds good. I'll open a new PR with those two commits after this merges 👍

I'll take a closer look at theirs, it's scary. But I really need to solve this once and for all for my own library so PCL will benefit too (I stole a lot of code from here so I'm paying down my debts lol).

@taketwo
Copy link
Member

taketwo commented Sep 10, 2018

I'll take a closer look at theirs, it's scary

Some of their stuff may be too advanced (for example, we don't have runtime dispatching). But we can certainly get inspired and borrow some functions/snippets.

I'm paying down my debts lol

Very welcome :)

@SergioRAgostinho
Copy link
Member Author

In addition to fixing intel compilation, I also added AVX and AVX2 checks. Posting so you know it's inbound, I'll remove the GCC 4.2 check in that PR. I'm just waiting to check on all the compilers again to make sure it's legit (intel's compiler is actually really slow...).

It feels like this one was superseeded by #2426 correct?

@taketwo
Copy link
Member

taketwo commented Sep 10, 2018

No:

Thanks for all the effort. I feel like merging this and then Intel parts of your PR would be the best plan for this release.

@SergioRAgostinho
Copy link
Member Author

Ok. Is there anything preventing this one from being merged?

@taketwo
Copy link
Member

taketwo commented Sep 10, 2018

Yes:

+1 I'll modify it tonight.

@SergioRAgostinho
Copy link
Member Author

That should be it I believe. Travis seems to have died.

@taketwo
Copy link
Member

taketwo commented Sep 10, 2018

Travis is alright, but GitHub does not show it for some reason.

@taketwo taketwo merged commit 88a7adc into PointCloudLibrary:master Sep 10, 2018
@svenevs
Copy link
Contributor

svenevs commented Sep 11, 2018

@SergioRAgostinho sorry for adding all the noise / making it confusing...

@taketwo I cherry-picked the two fixes needed for Intel unrelated to vectorization stuff in #2432 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants