-
Notifications
You must be signed in to change notification settings - Fork 146
SNMG ANN build with OpenMP nested parallelism #1526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release/25.12
Are you sure you want to change the base?
SNMG ANN build with OpenMP nested parallelism #1526
Conversation
cpp/src/neighbors/mg/snmg.cuh
Outdated
| } | ||
|
|
||
| // Restore original thread count | ||
| cuvs::core::omp::set_num_threads(saved_omp_threads); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this restoration is a bit confusing. the pool of threads is set inside the for loop above, but its restored after the for loop? Can you verify if this logic is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This is actually not necessary. My worry was that the main thread could be one of the threads in the loop. But, after checking it appears like OpenMP is designed in such a way that I made sure that each thread sees its number of thread restored back to the global maximal number of threads within the parallel region.omp_set_num_threads calls inside of a parallel region only affect how many threads each thread can use in nested parallel region. The main thread is unaffected, and its number of threads is actually preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added the same OpenMP usage for the extend function.
The SNMG ANN build spawns threads with
#pragma omp parallel for, but CAGRA build also uses OpenMP internally. With nested parallelism disabled by default, CAGRA's internal parallel regions ran serially on a single thread, causing severe performance degradation.omp_set_nested(1)num_ranksthreads (one per GPU)threads_per_rankfor CAGRA's internal parallelism