-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a subtle bug in DBSCAN noise marking #525
Conversation
Situation: a core point is connected to only boundary points. However, those boundary are connected to other core points. So it may happen that the boundary points are assigned to other clusters, resulting in a cluster with a single point (core). This test makes sure that we don't accidentally label it as noise, as its label is likely to be itself.
src/ArborX_DBSCAN.hpp
Outdated
if (is_special_case) | ||
{ | ||
// Cannot use CCSCore as it always returns true | ||
Kokkos::parallel_for("ArborX::dbscan::mark_noise", | ||
Kokkos::RangePolicy<ExecutionSpace>(exec_space, 0, n), | ||
KOKKOS_LAMBDA(int const i) { | ||
if (cluster_sizes(labels(i)) == 1) | ||
labels(i) = -1; | ||
}); | ||
} | ||
else | ||
{ | ||
DBSCAN::DBSCANCorePoints<MemorySpace> is_core{num_neigh, core_min_size}; | ||
Kokkos::parallel_for("ArborX::dbscan::mark_noise", | ||
Kokkos::RangePolicy<ExecutionSpace>(exec_space, 0, n), | ||
KOKKOS_LAMBDA(int const i) { | ||
if (cluster_sizes(labels(i)) == 1 && !is_core(i)) | ||
labels(i) = -1; | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted that patch out of curiosity and it looks like the unit tests are still passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I did the same. I think some permutation of the points in the test may trigger the failure in Serial, but it's really hard to get to it. It did fix the failure when verifying 500K subset of HACC points, run as
$ ./ArborX_DBSCAN.exe --binary --filename hacc.arborx --print-dbscan-timers --core-min-size 5 --cluster-min-size 1 --eps 0.042 --impl fdbscan-densebox --max-num-points 500000 --verify
and it also fixed divergence in the number of clusters and number of points in clusters when running HACC problem using different algorithms (as long as --cluster_min_size=1
was also specified).
CI failure is not a failure, but a warning from a container build. |
Situation: a core point is connected to only boundary points. However,
those boundary are connected to other core points. So it may happen that
the boundary points are assigned to other clusters, resulting in a
cluster with a single point (core).
The bug was that those single core clusters were marked as noise.
The fix is ugly as heck, but I don't know how to do better.