-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of knn traversal #357
Conversation
#neighbors = 10
#neighbors = 1
So, it seems that for Serial it is quite important to check the nodes that one gets from stack for the distance, and not do extra work. For Cuda, though, it seems preferential to not do that. |
Host runs a lot faster with (node, distance) stack, while Cuda without. Checking the distance really matters for small values of k, like nearest neighbor.
Summit results (master e8bdf3a vs branch dd0bf48): #neighbors = 1
#neighbors = 10
#neighbors = 100
|
OK, I'm reasonably happy with this PR now. Cuda is much faster, even for k = 1. Serial is about the same. The only downside is a slight increase for OpenMP for low k. An interesting observation: POWER9 showed no difference, but my Intel workstation significantly improved. We should add non-POWER9 performance testing to our toolbox (Intel or AMD)performance. Maybe ALCF?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity. Did you try to play with the alignment of the underlying C-array before you went ahead and split the stack?
No. You think that 4-byte alignment could be worse than 8-byte? |
I meant alignas(16) PairNodePtrDistance stack[64]; |
I see. I thought you meant adding padding to In any case, there is a lot of tinkering that could be done in this PR. However, as any tinkering would require rerunning many performance tests, I would like to keep it to minimum, and only for strictly necessary things. |
heap.popPush(leaf_pair); | ||
if ((int)heap.size() == k) | ||
radius = heap.top().second; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duplication of code is unfortunate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was really annoyed by the duplication. The only way to clean it up would be to to move part of the logic inside heap itself, but it's another can of worms.
No description provided.