Added AVX512 support for space_l2 and space_ip. #339

slice4e · 2021-09-09T17:13:29Z

Add AVX512 support to L2 and IP algorithms.

Testing with Ann-Benchmarks shows measurable improvement on c5.4xlarge and m6i.4xlarge instances.

c5.4xlarge results:
[figure: fashion-mnist-784-euclidean]
[figure: mnist-784-euclidean]
[figure: nytimes-256-angular]
[figure: sift-128-euclidean]

m6i.4xlarge results:
[figure: fashion-mnist-784-euclidean]
[figure: mnist-784-euclidean]
[figure: nytimes-256-angular]
[figure: sift-128-euclidean]

yurymalkov

Thank you so much for the PR!
It seems like there is an #endif missing. Can you please check?
Also, I wonder if you've tested it under sustained multithreaded load? I've heard AVX512 can cause cloud instances to overheat and throttle, so they can even perform worse.

hnswlib/hnswlib.h

hnswlib/space_l2.h

yurymalkov · 2021-09-16T05:02:19Z

I've tried to run the code and noticed that __AVX512__ is not defined in gcc with -march=native (there are __AVX512BW__, __AVX512F__, __AVX512VNNI__ and many others, probably corresponding to different subsets).
I wonder, how do you test the code? Can you please add instructions on how to compile with AVX512 support?
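For reference, here is a small C++ sketch that reports which AVX512 feature macros the compiler defines for a given set of flags. The function name defined_avx512_macros is hypothetical and not part of hnswlib, and the macro list is limited to the names mentioned in this comment.

```cpp
#include <string>
#include <vector>

// Returns the names of the AVX512-related macros that the compiler defined
// for this translation unit (for example under -march=native or -mavx512f).
// Hypothetical helper for illustration; not part of hnswlib.
std::vector<std::string> defined_avx512_macros() {
    std::vector<std::string> found;
#if defined(__AVX512F__)
    found.push_back("__AVX512F__");    // Foundation subset
#endif
#if defined(__AVX512BW__)
    found.push_back("__AVX512BW__");   // Byte/word subset
#endif
#if defined(__AVX512VNNI__)
    found.push_back("__AVX512VNNI__"); // Neural-network instruction subset
#endif
#if defined(__AVX512__)
    // Not expected: as noted above, gcc does not define this name.
    found.push_back("__AVX512__");
#endif
    return found;
}
```

Printing the returned names from a build with -march=native would show which subsets that build can rely on. On a machine without AVX512 the list is expected to be empty.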

slice4e · 2021-09-16T16:10:34Z

Hi - Sorry, this was a mistake on my part. For some reason, I did not realize that we are checking for the presence of AVX/AVX2/etc. at compile time, and I assumed that the application using hnswlib is setting the defines. In hindsight, that does not make sense...

I fixed it now. We are now checking for __AVX512F__ (AVX512F is the foundational set of AVX512 instructions). All the instructions that we are using are part of AVX512F.
For my performance testing, I had simply hardcoded the #define USE_AVX512 ... in the AVX512 tests.

In addition, I fixed a bug which was causing a crash on some systems due to not aligning the memory correctly for the store instruction (_mm512_store_ps). This instruction requires memory to be aligned to 64 bytes.
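To illustrate the two fixes described in this comment, here is a minimal sketch of a squared-L2 kernel that is guarded by __AVX512F__ at compile time and uses a 64-byte-aligned temporary for _mm512_store_ps. The name l2_sqr_sketch and the scalar fallback are assumptions made for this example; this is not the code from the PR.

```cpp
#include <cstddef>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Squared L2 distance between two float vectors of length qty.
// Sketch only: l2_sqr_sketch is a hypothetical name, not hnswlib's function.
float l2_sqr_sketch(const float *a, const float *b, std::size_t qty) {
    float res = 0.0f;
    std::size_t i = 0;
#if defined(__AVX512F__)
    // _mm512_store_ps is the aligned store: its destination must be
    // 64-byte aligned, so a 32-byte alignment here could fault.
    alignas(64) float tmp[16];
    __m512 sum = _mm512_setzero_ps();
    for (; i + 16 <= qty; i += 16) {
        // Unaligned loads: the input vectors may have any alignment.
        __m512 v1 = _mm512_loadu_ps(a + i);
        __m512 v2 = _mm512_loadu_ps(b + i);
        __m512 diff = _mm512_sub_ps(v1, v2);
        sum = _mm512_add_ps(sum, _mm512_mul_ps(diff, diff));
    }
    _mm512_store_ps(tmp, sum);
    for (int k = 0; k < 16; ++k) res += tmp[k];
#endif
    // Scalar path: the whole vector without AVX512F, or the tail with it.
    for (; i < qty; ++i) {
        float d = a[i] - b[i];
        res += d * d;
    }
    return res;
}
```

Because the guard is evaluated at compile time, the same source builds on machines without AVX512 and simply takes the scalar path there.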

yurymalkov · 2021-09-29T06:12:29Z

hnswlib/space_l2.h

+
+    // Favor using AVX512 if available.
+    static float
+    L2SqrSIMD16Ext(float *pVect1, float *pVect2, size_t qty) {


Sorry for the late response. It seems the float type here obstructs the compilation.
It should be void *

Thanks for catching this. I have fixed it.

I've tried it and there are other errors with const void * to void * conversion (I guess it should be made const void * down the stack).
Can you please fix it and test it with pip install . and python -m unittest discover --start-directory python_bindings/tests --pattern "*_test*.py" (to run the tests) on a Linux system?

Thanks for the help! I fixed those errors. I was able to compile it and test it without errors following your instructions.
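The signature issue in this thread can be sketched as follows, assuming the library stores distance functions behind a function pointer that takes three const void * arguments. The alias DistFuncSketch and both function names are hypothetical and only mirror that shape.

```cpp
#include <cstddef>

// Local alias mirroring a type-erased distance callback: two vectors and
// a pointer to the dimension, all passed as const void *.
// Hypothetical name; defined here only for this example.
using DistFuncSketch = float (*)(const void *, const void *, const void *);

// Matches DistFuncSketch, so it can be stored in the function pointer.
static float l2_sqr_erased(const void *v1, const void *v2, const void *qty_ptr) {
    const float *a = static_cast<const float *>(v1);
    const float *b = static_cast<const float *>(v2);
    std::size_t qty = *static_cast<const std::size_t *>(qty_ptr);
    float res = 0.0f;
    for (std::size_t i = 0; i < qty; ++i) {
        float d = a[i] - b[i];
        res += d * d;
    }
    return res;
}

// A function declared with float * parameters has a different type,
// so an assignment like this would not compile:
//   static float bad(float *, float *, std::size_t);
//   DistFuncSketch f = bad;  // error: invalid conversion
DistFuncSketch pick_distance() { return l2_sqr_erased; }
```

Keeping const on the pointers all the way down the call stack avoids the const void * to void * conversion errors mentioned above, since the casts never have to drop the qualifier.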

yurymalkov · 2021-10-03T06:21:06Z

Thanks for the PR!!

Added AVX512 support for space_l2 and space_ip.

cb399cf

yurymalkov requested changes Sep 10, 2021

fixed missing #endif.

677700f

slice4e added 2 commits September 16, 2021 15:45

Correctly check for the presence of AVX512F (Foundation set) at compile time. Check for the __AVX512F__ define. All the AVX512 instructions that we are using are part of the Foundation set.

e7935b7

Fixed a bug where we are aligning the TmpRes[16] variable to 32 bytes, using PORTABLE_ALIGN32. However, we need to align it to 64 bytes because of the use of the aligned store instruction _mm512_store_ps.

d7bec60

yurymalkov reviewed Sep 29, 2021

slice4e added 2 commits September 29, 2021 14:02

changed float * to void * in L2SqrSIMD16Ext

05de244

fixed errors with const void* conversion

290f3e2

yurymalkov merged commit ff10e88 into nmslib:develop Oct 3, 2021

slice4e deleted the develop branch October 4, 2021 04:42

yurymalkov Sep 29, 2021

slice4e Sep 29, 2021

yurymalkov Sep 30, 2021

slice4e Oct 1, 2021
