Skip to content

Conversation

@icfaust
Copy link
Contributor

@icfaust icfaust commented Feb 14, 2024

Macro definition for X86_SIMD_SORT_UNROLL_LOOP will break x86-simd-sort when compiling with icc as they define __GNUC__. This introduces a check for an Intel compiler by looking if __INTEL_COMPILER has been set, and sets the appropriate pragma (not done for __INTEL_LLVM_COMPILER due to compiling performance issues).

There is also an issue with avx2 where icc will complain that there isn't a return type for avx2_vector for 32 and 64 bit. The compile time evaluation and static assert are placed ahead of a generalized return, which shouldn't change the logic or runtime. I definitely recommend the maintainers to evaluate those changes if they make sense (should the static assert require a previous constexpr check?)

Intel LLVM C++ compilers do not define __builtin_cpu_supports("avx512fp16") which requires certain code not to be run when using icpx on sapphirerapids ISA-capable machines.

Testing uses the intel apt repository for getting the icpx compiler. The GPG key can be added, but that will add something that must be updated in the future/ requires future maintenance. Suggestions are welcome.

Meson from apt is out of date enough that it doesn't know of the icpx compiler. The pip version is up to date and is instead used in the new additional CI job.

Tasks

  • Implement icc-specific changes
  • Implement icpx-specific changes
  • Add tests for Intel LLVM C++ compiler
  • Pass tests

@icfaust icfaust changed the title [enh] Add icc compiler support [enh] Add intel compiler support Feb 14, 2024
@icfaust
Copy link
Contributor Author

icfaust commented Feb 14, 2024

I will reach out to compiler development about __builtin_cpu_supports("avx512fp16") issues with icpx

@icfaust
Copy link
Contributor Author

icfaust commented Feb 15, 2024

So I had erroneously set the intel compiler check within an if statement for __GNUC__ which wouldn't actually be reached for icc and icpx. When made available to those compilers, the compile times for icpx skyrocketed both locally and in CI (to timeout). It will compile but takes significant time. I have removed loop unrolling for icpx, and have returned the new CI to the previous state. This will allow for CI to properly run, and will be investigated if it is a compiler issue on intel's side.

@icfaust
Copy link
Contributor Author

icfaust commented Feb 15, 2024

Local testing shows that icpx/icc do not support __builtin_cpu_supports("avx512fp16"), even if icpx supports __FLT16_MAX__/ _Float16 types. ICC just doesn't support __FLT16_MAX__ at all, so isn't a problem. I will recondition the requisite check. This is a compiler problem (not even available in most recent builds).

Copy link
Member

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for adding CI as well.

@r-devulap r-devulap merged commit 87955f6 into numpy:main Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants