Dramatically improve performance of CPU kernels on less sophisticated compilers #78

zpzim · 2022-01-18T00:33:20Z

This PR does a couple of things:

Significantly increases the chance that a compiler will be able to autovectorize the inner loops of SCAMP
Adds compiler output for popular compilers (MSVC/clang/gcc) indicating which loops from the kernel were vectorized
Only handles NaNs in the CPU kernel when needed, as processing NaNs can cause significant slowdown on less sophisticated compilers (MSVC)
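
As a rough illustration of the third point, the sketch below hoists the NaN decision out of the inner loop by making it a template parameter, so the NaN-free instantiation contains no per-element check that could block vectorization. This is not SCAMP's actual kernel: `KERNEL_FORCE_INLINE`, `accumulate_products`, `contains_nan`, and `accumulate` are hypothetical names, and treating a NaN product as zero is an assumption made only for the example. Whether a given loop was vectorized can then be checked with each compiler's report option, for example `-fopt-info-vec` on gcc, `-Rpass=loop-vectorize` on clang, or `/Qvec-report:2` on MSVC.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical force-inline macro; the names here are illustrative only.
#if defined(_MSC_VER)
#define KERNEL_FORCE_INLINE __forceinline
#else
#define KERNEL_FORCE_INLINE inline __attribute__((always_inline))
#endif

// Inner loop: out[i] += a[i] * b[i].
// kCheckNaN is a compile-time constant, so the <false> instantiation
// contains no NaN test at all and stays a simple, branch-free loop.
// __restrict tells the compiler the three arrays do not overlap.
template <bool kCheckNaN>
KERNEL_FORCE_INLINE void accumulate_products(const double* __restrict a,
                                             const double* __restrict b,
                                             double* __restrict out,
                                             std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    double v = a[i] * b[i];
    if (kCheckNaN) {
      // Assumption for this sketch: a NaN product contributes nothing.
      v = std::isnan(v) ? 0.0 : v;
    }
    out[i] += v;
  }
}

// One linear scan, done once per input rather than once per inner-loop step.
inline bool contains_nan(const std::vector<double>& x) {
  for (double v : x) {
    if (std::isnan(v)) return true;
  }
  return false;
}

// Picks the cheap instantiation unless NaN handling is actually needed.
// Assumes a and b hold at least out.size() elements and no infinities.
inline void accumulate(const std::vector<double>& a,
                       const std::vector<double>& b,
                       std::vector<double>& out) {
  const std::size_t n = out.size();
  if (contains_nan(a) || contains_nan(b)) {
    accumulate_products<true>(a.data(), b.data(), out.data(), n);
  } else {
    accumulate_products<false>(a.data(), b.data(), out.data(), n);
  }
}
```

The design choice is to pay for one up-front scan of the inputs so that the common NaN-free case runs the simplest possible loop body.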

Overall, CPU performance improves by 2-4x in cases where the compiler was not already delivering these benefits. Further optimization may be possible (e.g. explicitly enabling AVX code generation on MSVC on platforms that support it).

Platform-specific optimizations are also possible on MSVC, as it does not support -march=native, which we use everywhere else. Enabling AVX on MSVC
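
One way to see which instruction set a build actually targets is to inspect the SIMD macros the compiler predefines. The helper below is a hypothetical sketch, not part of this PR: `compiled_simd_level` is an invented name. It relies on `__AVX__`, `__AVX2__`, and `__AVX512F__`, which gcc and clang define under `-march=native` (or `-mavx2` and similar) on capable hardware, and which MSVC defines when built with `/arch:AVX`, `/arch:AVX2`, or `/arch:AVX512`.

```cpp
#include <string>

// Reports the widest x86 SIMD extension this translation unit was
// compiled for, based on compiler-predefined macros.
// Hypothetical helper for illustration only.
inline std::string compiled_simd_level() {
#if defined(__AVX512F__)
  return "AVX-512";
#elif defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#else
  return "baseline";
#endif
}
```

Printing this value at startup would make it visible when an MSVC build has fallen back to the baseline instruction set because no `/arch` option was passed.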

Fixed/Ignored some gcc compiler warnings.

zpzim and others added 6 commits January 16, 2022 12:36

Force inline CPU kernel methods

a318daf

Add warnings when compiling cpu kernels and inlining fails

7903e27

Fixed/Ignored some gcc compiler warnings.

Committing clang-format changes

b57b96e

Optimize CPU kernels to improve the probability that loop vectorization occurs. Only check for nan values in a kernel if necessary.

4884fbc

Committing clang-format changes

fa2db8a

Clang will now output whether or not loops were vectorized during compilation.

0813df3

zpzim merged commit 0f8c54a into master Jan 19, 2022

zpzim mentioned this pull request Jan 19, 2022

Optimize CPU kernels for MSVC and Apple Clang #73

Closed

zpzim deleted the cpu-kernel-performance branch February 12, 2022 22:45
