Skip to content

Added AVX512 support for space_l2 and space_ip. #339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Oct 3, 2021

Conversation

slice4e
Copy link
Contributor

@slice4e slice4e commented Sep 9, 2021

Add AVX512 support to L2 and IP algorithms.

Testing with Ann-Benchmarks shows measurable improvement on c5.4xlarge and m6i.4xlarge instances.

c5.4xlarge results
fashion-mnist-784-euclidean
fashion-mnist-784-euclidean_10_euclidean
mnist-784-euclidean
mnist-784-euclidean_10_euclidean
nytimes-256-angular
nytimes-256-angular_10_angular
sift-128-euclidean
sift-128-euclidean_10_euclidean

m6i.4xlarge results:
fashion-mnist-784-euclidean
fashion-mnist-784-euclidean_10_euclidean
mnist-784-euclidean
mnist-784-euclidean_10_euclidean
nytimes-256-angular
nytimes-256-angular_10_angular
sift-128-euclidean
sift-128-euclidean_10_euclidean

Copy link
Member

@yurymalkov yurymalkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks you so much for the PR!
It seems like there is #endif missing. Can you please check?
Also, I wonder if you've tested it with sustained multithreaded load? I heard AVX512 can cause cloud instances to overheat and throttle, so they can even perform worse.

@yurymalkov
Copy link
Member

I've tried to run the code and noticed that __AVX512__ is not defined in gcc with -march=native (there are __AVX512BW__, __AVX512F__, __AVX512VNNI__ and many others, probably corresponding to different subsets).
I wonder, how do you test the code? Can you please add instructions on how to compile with AVX512 support?

…le time. Check for the __AVX512F__ define. All the AVX512 instructions that we are using are part of the Foundation set.
…, using PORTABLE_ALIGN32. However, we need to align it to 64 bytes becase of the use of unaligned store instruction _mm512_store_ps .
@slice4e
Copy link
Contributor Author

slice4e commented Sep 16, 2021

I've tried to run the code and noticed that __AVX512__ is not defined in gcc with -march=native (there are __AVX512BW__, __AVX512F__, __AVX512VNNI__ and many others, probably corresponding to different subsets).
I wonder, how do you test the code? Can you please add instructions on how to compile with AVX512 support?

Hi - Sorry, this was a mistake on my part. For some reason, I did not realize that we are checking for the presence of AVX/AVX2/etc at compile time, and I assumed that the application using hnswlib is setting the defines. In highsight, that does not make sense...

I fixed it now. We are now checking for AVX512F (which is the foundational set of AVX512 instructions.). All the instructions that we are using are part of AVX512F
For my performance testing, I had simply hardcoded the #define USE_AVX512 ... in the AVX512 tests.

In addition, I fixed a bug which was causing a crash on some system due to not aligning the memory correctly for the store instruction (_mm512_store_ps). This instruction requires memory to be aligned to 64 bytes.


// Favor using AVX512 if available.
static float
L2SqrSIMD16Ext(float *pVect1, float *pVect2, size_t qty) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for a long response. It seems float type here obstructs the compilation.
It should be void *

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for a long response. It seems float type here obstructs the compilation. It should be void *

Thanks for catching this. I have fixed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried and there other errors with constant void * to void * conversion (I guess it should be made const void * down the stack).
Can you please fix it and test it with pip install . and python -m unittest discover --start-directory python_bindings/tests --pattern "*_test*.py" (to run the tests) on a linux system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the help! I fixed those errors. I was able to compile it and test it without errors following your instructions.

@yurymalkov yurymalkov merged commit ff10e88 into nmslib:develop Oct 3, 2021
@yurymalkov
Copy link
Member

Thanks for the PR!!

@slice4e slice4e deleted the develop branch October 4, 2021 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants