IntMatrix.h is a simple std::vector<std::vector> wrapper for representing integer matrices. It uses SSE extensions to speed up addition and subtraction.
- brew for x86 is required
- cmake installed under x86 brew is required
To build under Rosetta 2 with the SSE extension enabled, run the script below. It adds the required definitions automatically.
./scripts/build_x86.sh
To build without SSE support use:
./scripts/build_arm64.sh
- cmake is required
cmake . -Bbuild/x86 -DBUILD_X86_WITH_SSE=1
cmake --build build/x86
See this guide
__m128i _mm_lddqu_si128 (__m128i const* mem_addr)
__m128i _mm_sub_epi32 (__m128i a, __m128i b)
__m128i _mm_add_epi32 (__m128i a, __m128i b)
void _mm_storeu_si32 (void* mem_addr, __m128i a)
__m128i _mm_bsrli_si128 (__m128i a, int imm8)
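To illustrate how intrinsics like these combine, here is a minimal sketch of an element-wise row addition that processes four 32-bit integers per iteration. It is not the code from IntMatrix.h: `addRows` is a hypothetical helper, and the sketch swaps in `_mm_loadu_si128` / `_mm_storeu_si128` (SSE2) for `_mm_lddqu_si128` / `_mm_storeu_si32`, and a scalar loop for the tail, so that it compiles with baseline x86-64 flags and falls back to plain C++ elsewhere.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_add_epi32, _mm_storeu_si128
#endif

// Hypothetical helper (not part of IntMatrix.h): element-wise sum of two rows.
// If the rows differ in length, only the common prefix is summed.
std::vector<std::int32_t> addRows(const std::vector<std::int32_t>& a,
                                  const std::vector<std::int32_t>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::vector<std::int32_t> out(n);
    std::size_t i = 0;
#if defined(__SSE2__)
    // Vector path: four 32-bit lanes per iteration, unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        const __m128i va =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data() + i));
        const __m128i vb =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data() + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data() + i),
                         _mm_add_epi32(va, vb));
    }
#endif
    // Scalar tail; this is also the whole loop when SSE2 is unavailable.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

Subtraction follows the same shape with `_mm_sub_epi32` in place of `_mm_add_epi32` and `-` in the scalar tail.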
You can generate your own tests with the generateTestcases.py script. Run it without parameters to see its usage. Paste the generated tests into tests/test.cpp and rebuild.
The following script builds two versions of the IntMatrix static library: one with custom SSE support and one without. As a second step, it builds three executables:
- With -O3 and with custom SSE support
- With -O3 and without custom SSE support
- Without -O3 and without custom SSE support
As a third step, it runs all three executables with time-recording functions.
./scripts/runBenchmark.sh
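The time recording could look roughly like the sketch below. It is an assumption about the general approach, not the benchmark's actual code: `meanMicroseconds` is a hypothetical helper that runs a callable several times and reports the mean wall-clock time using `std::chrono::steady_clock`.

```cpp
#include <chrono>

// Hypothetical helper (not taken from the benchmark sources): calls fn
// `repeats` times and returns the mean wall-clock time per call in microseconds.
template <typename Fn>
double meanMicroseconds(Fn&& fn, int repeats) {
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = Clock::now();
    for (int r = 0; r < repeats; ++r) {
        fn();
    }
    const auto stop = Clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

A call such as `meanMicroseconds([&] { result = lhs + rhs; }, 100)` would time a matrix addition, assuming `lhs`, `rhs` and `result` are matrices with an overloaded `operator+`. When comparing builds, make sure the result is actually used afterwards so an optimizing compiler cannot drop the work.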
- Custom SSE doubles performance compared to code built without compiler optimizations
- The -O3 flag still yields better performance, seemingly because the compiler uses AVX