Introduce OrderedIntersection TreeTraversal #683

masterleinad · 2022-05-20T21:21:02Z

It makes sense to replace the current example since the new one covers more ground, but I could copy some more comments if necessary.
I decided to keep both variants since I find it instructional to see the performance difference.

~~The PriorityBasedTreeTraversal still needs to be moved.~~

masterleinad · 2022-06-01T21:06:37Z

I think this is in a good enough state to discuss now.

The example compares the new traversal that processes a callback in order of the distance between ray and box with an approach that first collects all intersections then sorts them and finally processes the callback. Typical speedups I observe are between 10x and 20x (also depending on sorting predicates which is sometimes beneficial).

The new tag is called OrderedNearestPredicateTag and is up for discussion. Currently, we always expect a callback and we process all matching (distance != inf) predicate/primitive pairs. This probably only makes sense for Rays at the moment since for all other cases the distance is finite and we likely don't want to process all primitives for all queries. I could see that we want to introduce a cut-off distance replacing inf.

src/details/ArborX_DetailsTreeTraversal.hpp

src/details/ArborX_Ray.hpp

dalg24 · 2022-06-02T15:50:02Z

src/details/ArborX_Ray.hpp

+template <typename Geometry>
+KOKKOS_INLINE_FUNCTION void
+overlapDistance(Ray const &ray, Geometry const &geometry, float &length,
+                float &distance_to_origin)


I was already not sure about exposing overlapDistance(Ray, Sphere)

I only kept adding it since overlapDistance(Ray, Sphere) was already there.
IMHO, it's a useful helper function for Rays. I don't have a problem moving it to the example, though.

masterleinad · 2022-06-03T19:42:38Z

Only pushing to the stack when necessary gives

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 9.45296 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 8.95e+00 sec 94.7% 99.8% 0.0% 0.0% 1.79e+00 1 second_approach [region]
|   |-> 8.39e+00 sec 88.7% 100.0% 0.0% ------ 1 batched_sorting [for]
|   |-> 4.92e-01 sec 5.2% 96.3% 0.0% 0.0% 2.64e+01 1 ArborX::query [region]
|   |   |-> 4.92e-01 sec 5.2% 96.3% 0.0% 0.1% 2.64e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.84e-01 sec 5.1% 97.7% 0.0% 0.2% 1.86e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 3.27e-01 sec 3.5% 96.9% 0.0% 3.1% 9.18e+00 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 3.16e-01 sec 3.3% 100.0% 0.0% 0.0% 3.16e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 3.16e-01 sec 3.3% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.49e-01 sec 1.6% 99.9% 0.0% 0.0% 6.71e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.49e-01 sec 1.6% 100.0% 0.0% 0.0% 6.71e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.49e-01 sec 1.6% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |-> 6.72e-02 sec 0.7% 100.0% 0.0% ------ 1 deposit_energy [for]
|-> 4.64e-01 sec 4.9% 98.6% 0.0% 0.0% 6.47e+00 1 first_approach [region]
|   |-> 4.64e-01 sec 4.9% 98.6% 0.0% 0.1% 6.47e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 4.57e-01 sec 4.8% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|-> 1.69e-02 sec 0.2% 10.2% 0.0% 2.2% 4.14e+02 1 ArborX::BVH::BVH [region]
|   |-> 1.42e-02 sec 0.2% 9.3% 0.0% 91.2% 2.82e+02 1 ArborX::BVH::BVH::generate_hierarchy [region]
|-> 9.80e-03 sec 0.1% 48.7% 0.0% 0.2% 6.12e+02 1 problem_setup [region]

vs.

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 9.61558 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 8.98e+00 sec 93.4% 99.8% 0.0% 0.0% 1.78e+00 1 second_approach [region]
|   |-> 8.43e+00 sec 87.6% 100.0% 0.0% ------ 1 batched_sorting [for]
|   |-> 4.89e-01 sec 5.1% 96.5% 0.0% 0.0% 2.66e+01 1 ArborX::query [region]
|   |   |-> 4.89e-01 sec 5.1% 96.5% 0.0% 0.1% 2.66e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.81e-01 sec 5.0% 97.8% 0.0% 0.2% 1.87e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 3.24e-01 sec 3.4% 97.1% 0.0% 2.9% 9.25e+00 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 3.15e-01 sec 3.3% 100.0% 0.0% 0.0% 3.18e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 3.15e-01 sec 3.3% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.49e-01 sec 1.5% 100.0% 0.0% 0.0% 6.72e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.49e-01 sec 1.5% 100.0% 0.0% 0.0% 6.72e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.49e-01 sec 1.5% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |-> 6.71e-02 sec 0.7% 100.0% 0.0% ------ 1 deposit_energy [for]
|-> 6.02e-01 sec 6.3% 99.0% 0.0% 0.0% 4.98e+00 1 first_approach [region]
|   |-> 6.02e-01 sec 6.3% 99.0% 0.0% 0.1% 4.98e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 5.96e-01 sec 6.2% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|-> 1.27e-02 sec 0.1% 43.7% 0.0% 0.1% 4.72e+02 1 problem_setup [region]
|   |-> 1.08e-02 sec 0.1% 49.7% 0.0% 50.3% 4.65e+02 1 make_rays [region]

when running

./ArborX_RayTracing.exe --Nx=100 --Ny=100 --Nz=100 --rays=10

That's to say it's like 30% faster (4.64e-01 vs 6.02e-01).

masterleinad · 2022-06-06T21:45:01Z

Avoiding the batched sort, the profile looks more like

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 2.1512 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 1.66e+00 sec 77.3% 34.1% 0.0% 64.8% 9.63e+00 1 second_approach [region]
|   |-> 4.92e-01 sec 22.9% 96.3% 0.0% 0.0% 2.64e+01 1 ArborX::query [region]
|   |   |-> 4.92e-01 sec 22.9% 96.3% 0.0% 0.1% 2.64e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.84e-01 sec 22.5% 97.7% 0.0% 0.2% 1.86e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 3.26e-01 sec 15.2% 96.9% 0.0% 3.0% 9.20e+00 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 3.16e-01 sec 14.7% 100.0% 0.0% 0.0% 3.16e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 3.16e-01 sec 14.7% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.49e-01 sec 6.9% 99.9% 0.0% 0.0% 6.71e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.49e-01 sec 6.9% 99.9% 0.0% 0.1% 6.72e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.49e-01 sec 6.9% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 7.79e-03 sec 0.4% 97.6% 0.0% 2.4% 5.14e+02 1 ArborX::CrsGraphWrapper::first_pass_postprocess [region]
|   |       |   |   |-> 7.31e-03 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan]
|   |       |-> 6.31e-03 sec 0.3% 10.5% 0.0% 89.5% 3.17e+02 1 ArborX::CrsGraphWrapper::query::spatial::compute_permutation [region]
|   |-> 8.60e-02 sec 4.0% 100.0% 0.0% ------ 1 deposit_energy [for]
|   |-> 6.23e-03 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for]
|-> 4.61e-01 sec 21.4% 98.6% 0.0% 0.0% 6.51e+00 1 first_approach [region]
|   |-> 4.61e-01 sec 21.4% 98.6% 0.0% 0.1% 6.51e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 4.54e-01 sec 21.1% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|       |-> 6.41e-03 sec 0.3% 9.4% 0.0% 90.6% 3.12e+02 1 ArborX::BVH::query::nearest::compute_permutation [region]
|-> 9.73e-03 sec 0.5% 51.7% 0.0% 0.3% 6.17e+02 1 problem_setup [region]
|   |-> 8.76e-03 sec 0.4% 53.2% 0.0% 46.8% 5.71e+02 1 make_rays [region]
|   |   |-> 3.46e-03 sec 0.2% 100.0% 0.0% ------ 1 initialize_rays [for]
|-> 4.69e-03 sec 0.2% 36.3% 0.0% 7.8% 1.49e+03 1 ArborX::BVH::BVH [region]

so we are in total like 3.6x faster with the new approach (4.61e-01s vs 1.66s). Even just considering the traversals, the new approach is slightly after (4.61e-01s vs. 4.92e-01s).

Without sorting the predicates, we get

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 2.00338 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 1.61e+00 sec 80.3% 32.2% 0.0% 67.2% 8.71e+00 1 second_approach [region]
|   |-> 4.36e-01 sec 21.7% 97.5% 0.0% 0.0% 2.52e+01 1 ArborX::query [region]
|   |   |-> 4.36e-01 sec 21.7% 97.5% 0.0% 0.1% 2.52e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.35e-01 sec 21.7% 97.6% 0.0% 0.1% 2.07e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 2.94e-01 sec 14.7% 96.7% 0.0% 3.3% 1.02e+01 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 2.84e-01 sec 14.2% 100.0% 0.0% 0.0% 3.52e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 2.84e-01 sec 14.2% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.33e-01 sec 6.7% 99.9% 0.0% 0.0% 7.50e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.33e-01 sec 6.7% 99.9% 0.0% 0.1% 7.50e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.33e-01 sec 6.7% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 7.01e-03 sec 0.3% 97.1% 0.0% 2.9% 5.71e+02 1 ArborX::CrsGraphWrapper::first_pass_postprocess [region]
|   |       |   |   |-> 6.60e-03 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan]
|   |-> 8.62e-02 sec 4.3% 100.0% 0.0% ------ 1 deposit_energy [for]
|   |-> 6.21e-03 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for]
|-> 3.65e-01 sec 18.2% 100.0% 0.0% 0.0% 2.74e+00 1 first_approach [region]
|   |-> 3.65e-01 sec 18.2% 100.0% 0.0% 0.0% 2.74e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 3.65e-01 sec 18.2% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|-> 1.07e-02 sec 0.5% 45.7% 0.0% 0.3% 5.59e+02 1 problem_setup [region]
|   |-> 1.02e-02 sec 0.5% 46.0% 0.0% 54.0% 4.91e+02 1 make_rays [region]
|   |   |-> 3.47e-03 sec 0.2% 100.0% 0.0% ------ 1 initialize_rays [for]
|-> 5.89e-03 sec 0.3% 30.7% 0.0% 10.8% 1.19e+03 1 ArborX::BVH::BVH [region]
|   |-> 2.30e-03 sec 0.1% 3.1% 0.0% 96.9% 4.36e+02 1 ArborX::BVH::BVH::sort_linearized_order [region]
|   |-> 2.02e-03 sec 0.1% 66.1% 0.0% 37.2% 1.98e+03 1 ArborX::BVH::BVH::generate_hierarchy [region]

which is marginally better.

aprokop

This PR will likely have to be updated for #675.

src/details/ArborX_DetailsTreeTraversal.hpp

aprokop · 2022-06-07T17:37:07Z

src/details/ArborX_DetailsTreeTraversal.hpp

+        if (invoke_callback_and_check_early_exit(
+                _callback, predicate,
+                HappyTreeFriends::getLeafPermutationIndex(_bvh, node)))
+          return;


Guess it would still allow one to find just k nearest ones, while processing them in the distance closest order.

Yes, the question is whether we want it to rather behave like a spatial predicate or like a nearest predicate. I tend to nearest which is why I choose OrderedNearest for now.
Also, this would be one of the traversals raytracing algorithms commonly use (multi-hit).

I do agree that this has nearest flavor. We don't check predicate for traversing, only on the distance. Nearest though implies a specific order, so does not describe the situation well.

we do check predicates, every time we compare to infinity

src/details/ArborX_DetailsTreeTraversal.hpp

aprokop · 2022-06-07T17:43:11Z

src/details/ArborX_DetailsTreeTraversal.hpp

+      return distance(geometry, box);
+    };
+
+    using PairIndexDistance = Kokkos::pair<int, float>;


If we are trying to generalize this, the second argument should the type returned by the distance function. As it can be provided by a user, this will have to be determined and not hardcoded.

masterleinad · 2022-06-07T21:49:11Z

Using Nearest traversals for the benchmark using k=100 gives

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 3.09358 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 1.67e+00 sec 54.0% 33.9% 0.0% 64.5% 9.57e+00 1 second_approach [region]
|   |-> 5.01e-01 sec 16.2% 94.5% 0.0% 0.0% 2.59e+01 1 ArborX::query [region]
|   |   |-> 5.01e-01 sec 16.2% 94.5% 0.0% 1.5% 2.59e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.85e-01 sec 15.7% 97.5% 0.0% 0.1% 1.86e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 3.32e-01 sec 10.7% 96.6% 0.0% 3.4% 9.03e+00 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 3.21e-01 sec 10.4% 100.0% 0.0% 0.0% 3.12e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 3.21e-01 sec 10.4% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.49e-01 sec 4.8% 100.0% 0.0% 0.0% 6.70e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.49e-01 sec 4.8% 100.0% 0.0% 0.0% 6.70e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.49e-01 sec 4.8% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |-> 7.90e-03 sec 0.3% 7.4% 0.0% 92.6% 2.53e+02 1 ArborX::CrsGraphWrapper::query::spatial::compute_permutation [region]
|   |-> 8.62e-02 sec 2.8% 100.0% 0.0% ------ 1 deposit_energy [for]
|   |-> 6.19e-03 sec 0.2% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for]
|-> 1.38e+00 sec 44.5% 98.1% 0.0% 0.0% 5.81e+00 1 first_approach [region]
|   |-> 1.38e+00 sec 44.5% 98.1% 0.0% 1.5% 5.81e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 1.34e+00 sec 43.4% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::nearest [for]
|       |-> 6.79e-03 sec 0.2% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan]
|       |-> 6.20e-03 sec 0.2% 9.4% 0.0% 90.6% 3.22e+02 1 ArborX::BVH::query::nearest::compute_permutation [region]
|-> 1.81e-02 sec 0.6% 100.0% 0.0% ------ 1 compare [reduce]
|-> 9.69e-03 sec 0.3% 50.1% 0.0% 0.2% 6.19e+02 1 problem_setup [region]
|   |-> 9.14e-03 sec 0.3% 50.7% 0.0% 49.3% 5.47e+02 1 make_rays [region]
|   |   |-> 3.44e-03 sec 0.1% 100.0% 0.0% ------ 1 initialize_rays [for]
|-> 4.69e-03 sec 0.2% 36.4% 0.0% 7.6% 1.49e+03 1 ArborX::BVH::BVH [region]

In particular, this takes 1.38e+00s compared to the 4.61e-01s for the new traversal.

masterleinad · 2022-06-07T22:33:41Z

k	nearest	ordered nearest
1	2.77e-02	2.69e-02
4	4.51e-02	4.54e-02
16	1.60-e01	1.17e-01
64	7.61e-01	3.49e-01
100	1.38e+00	4.00e-01

masterleinad · 2022-06-07T22:42:24Z

Of course, we also don't need to allocate global memory on the device for the traversal with the approach here as opposed ti nearest traversals.

wjge · 2022-06-08T14:26:43Z

Do we want to change the name of the file 'example_raytracing.cpp' to be more specific? This example is a raytracing problem in absorbing media. It is more challenging in certain way, but might not be a general/typical problem for raytracing (e.g. between surfaces, scattering media, etc.).

src/details/ArborX_DetailsTreeTraversal.hpp

wjge · 2022-06-08T15:01:08Z

examples/raytracing/CMakeLists.txt

@@ -1,3 +1,3 @@
 add_executable(ArborX_RayTracing.exe example_raytracing.cpp)
 target_link_libraries(ArborX_RayTracing.exe ArborX::ArborX Boost::program_options)
-add_test(NAME ArborX_RayTracing_Example COMMAND ./ArborX_RayTracing.exe --spheres=1000 --rays=50000 --L=200.0)
+add_test(NAME ArborX_RayTracing_Example COMMAND ./ArborX_RayTracing.exe --Nx=10 --Ny=10 --Nz=10 --Lx=1 --Ly=100000 --Lz=100000 --rays=1000)


A more common GCD would be like 36x36x36, 48x48x48 for a 3-D problem.
Do you set this aspect ratio of the domain size for any purpose?

No, it's pretty random. Note that I use

./ArborX_RayTracing.exe --Nx=100 --Ny=100 --Nz=100 --rays=10

for all timings posted above.

100x100x100 might be too large.
Maybe you can keep the number of rays per cell, but reduce the size to 48x48x48 or 60x60x60.

For 60x60x60, the profiling looks like

TOTAL TIME: 0.253154 seconds TOP-DOWN TIME TREE: <average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type] =================== |-> 2.02e-01 sec 79.7% 37.0% 0.0% 60.3% 7.43e+01 1 second_approach [region] | |-> 6.93e-02 sec 27.4% 92.2% 0.0% 0.0% 1.73e+02 1 ArborX::query [region] | | |-> 6.93e-02 sec 27.4% 92.2% 0.0% 0.7% 1.73e+02 1 ArborX::CrsGraphWrapper::query::spatial [region] | | |-> 6.64e-02 sec 26.2% 96.0% 0.0% 1.0% 1.36e+02 1 ArborX::CrsGraphWrapper::two_pass [region] | | | |-> 4.11e-02 sec 16.2% 95.7% 0.0% 4.2% 7.30e+01 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region] | | | | |-> 3.93e-02 sec 15.5% 99.9% 0.0% 0.1% 2.54e+01 1 ArborX::BVH::query::spatial [region] | | | | | |-> 3.93e-02 sec 15.5% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for] | | | |-> 1.85e-02 sec 7.3% 99.8% 0.0% 0.1% 5.40e+01 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region] | | | | |-> 1.85e-02 sec 7.3% 99.9% 0.0% 0.1% 5.40e+01 1 ArborX::BVH::query::spatial [region] | | | | |-> 1.85e-02 sec 7.3% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for] | | | |-> 6.01e-03 sec 2.4% 97.3% 0.0% 2.7% 6.65e+02 1 ArborX::CrsGraphWrapper::first_pass_postprocess [region] | | | | |-> 5.75e-03 sec 2.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan] | | |-> 2.20e-03 sec 0.9% 7.2% 0.0% 92.8% 9.07e+02 1 ArborX::CrsGraphWrapper::query::spatial::compute_permutation [region] | | |-> 2.70e-04 sec 0.1% 10.3% 0.0% 89.7% 3.70e+03 1 ArborX::CrsGraphWrapper::query::spatial::init_and_alloc [region] | |-> 9.95e-03 sec 3.9% 100.0% 0.0% ------ 1 deposit_energy [for] | |-> 8.45e-04 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for] |-> 3.98e-02 sec 15.7% 94.0% 0.0% 0.0% 7.54e+01 1 first_approach [region] | |-> 3.98e-02 sec 15.7% 94.0% 0.0% 0.6% 7.55e+01 1 ArborX::BVH::query::nearest [region] | |-> 3.72e-02 sec 14.7% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for] | |-> 2.31e-03 sec 0.9% 6.8% 0.0% 93.2% 8.66e+02 1 ArborX::BVH::query::nearest::compute_permutation [region] |-> 5.69e-03 sec 2.2% 31.4% 0.0% 0.3% 1.06e+03 1 problem_setup [region] | |-> 5.33e-03 sec 2.1% 31.6% 0.0% 68.4% 9.39e+02 1 make_rays [region] | | |-> 7.40e-04 sec 0.3% 100.0% 0.0% ------ 1 initialize_rays [for] | | |-> 5.57e-04 sec 0.2% 100.0% 0.0% ------ 1 "Kokkos::Random_XorShift64::state"="Kokkos::Random_XorShift64::state_mirror" [copy] | | |-> 3.37e-04 sec 0.1% 100.0% 0.0% ------ 1 "Kokkos::Random_XorShift64::locks"="Kokkos::Random_XorShift64::locks_mirror" [copy] | |-> 3.44e-04 sec 0.1% 29.2% 0.0% 70.8% 2.90e+03 1 make_grid [region] |-> 2.08e-03 sec 0.8% 25.1% 0.0% 14.0% 3.36e+03 1 ArborX::BVH::BVH [region] | |-> 1.02e-03 sec 0.4% 2.4% 0.0% 97.6% 9.83e+02 1 ArborX::BVH::BVH::sort_linearized_order [region] | |-> 4.30e-04 sec 0.2% 89.3% 0.0% 17.7% 9.31e+03 1 ArborX::BVH::BVH::generate_hierarchy [region] | | |-> 2.93e-04 sec 0.1% 100.0% 0.0% ------ 1 ArborX::TreeConstruction::generate_hierarchy [for]

which is pretty similar (3.98e-02s vs. 2.02e-01s).
For 48x48x48, we have

TOTAL TIME: 0.133111 seconds TOP-DOWN TIME TREE: <average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type] =================== |-> 1.01e-01 sec 76.1% 35.3% 0.0% 60.4% 1.48e+02 1 second_approach [region] | |-> 3.60e-02 sec 27.1% 88.0% 0.0% 0.0% 3.33e+02 1 ArborX::query [region] | | |-> 3.60e-02 sec 27.1% 88.0% 0.0% 1.2% 3.33e+02 1 ArborX::CrsGraphWrapper::query::spatial [region] | | |-> 3.30e-02 sec 24.8% 95.9% 0.0% 1.2% 2.73e+02 1 ArborX::CrsGraphWrapper::two_pass [region] | | | |-> 1.70e-02 sec 12.8% 95.3% 0.0% 4.5% 1.77e+02 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region] | | | | |-> 1.62e-02 sec 12.1% 99.8% 0.0% 0.2% 6.19e+01 1 ArborX::BVH::query::spatial [region] | | | | | |-> 1.61e-02 sec 12.1% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for] | | | |-> 9.60e-03 sec 7.2% 99.6% 0.0% 0.2% 1.04e+02 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region] | | | | |-> 9.59e-03 sec 7.2% 99.7% 0.0% 0.3% 1.04e+02 1 ArborX::BVH::query::spatial [region] | | | | |-> 9.56e-03 sec 7.2% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for] | | | |-> 5.94e-03 sec 4.5% 97.8% 0.0% 2.2% 6.73e+02 1 ArborX::CrsGraphWrapper::first_pass_postprocess [region] | | | | |-> 5.72e-03 sec 4.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan] | | |-> 2.41e-03 sec 1.8% 4.5% 0.0% 95.5% 8.31e+02 1 ArborX::CrsGraphWrapper::query::spatial::compute_permutation [region] | | |-> 2.43e-04 sec 0.2% 6.2% 0.0% 93.8% 4.11e+03 1 ArborX::CrsGraphWrapper::query::spatial::init_and_alloc [region] | |-> 3.70e-03 sec 2.8% 100.0% 0.0% ------ 1 deposit_energy [for] | |-> 3.96e-04 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for] |-> 2.13e-02 sec 16.0% 91.1% 0.0% 0.0% 1.41e+02 1 first_approach [region] | |-> 2.13e-02 sec 16.0% 91.1% 0.0% 0.9% 1.41e+02 1 ArborX::BVH::query::nearest [region] | |-> 1.93e-02 sec 14.5% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for] | |-> 1.80e-03 sec 1.4% 6.1% 0.0% 93.9% 1.11e+03 1 ArborX::BVH::query::nearest::compute_permutation [region] |-> 5.38e-03 sec 4.0% 28.9% 0.0% 0.3% 1.12e+03 1 problem_setup [region] | |-> 4.81e-03 sec 3.6% 30.7% 0.0% 69.3% 1.04e+03 1 make_rays [region] | | |-> 6.76e-04 sec 0.5% 100.0% 0.0% ------ 1 "Kokkos::Random_XorShift64::state"="Kokkos::Random_XorShift64::state_mirror" [copy] | | |-> 4.09e-04 sec 0.3% 100.0% 0.0% ------ 1 initialize_rays [for] | | |-> 3.39e-04 sec 0.3% 100.0% 0.0% ------ 1 "Kokkos::Random_XorShift64::locks"="Kokkos::Random_XorShift64::locks_mirror" [copy] | |-> 5.57e-04 sec 0.4% 13.9% 0.0% 86.1% 1.79e+03 1 make_grid [region] |-> 1.77e-03 sec 1.3% 24.1% 0.0% 10.9% 3.96e+03 1 ArborX::BVH::BVH [region] | |-> 8.78e-04 sec 0.7% 2.4% 0.0% 97.6% 1.14e+03 1 ArborX::BVH::BVH::sort_linearized_order [region] | |-> 3.74e-04 sec 0.3% 82.9% 0.0% 24.9% 1.07e+04 1 ArborX::BVH::BVH::generate_hierarchy [region] | | |-> 2.21e-04 sec 0.2% 100.0% 0.0% ------ 1 ArborX::TreeConstruction::generate_hierarchy [for] | |-> 2.44e-04 sec 0.2% 13.2% 0.0% 86.8% 4.11e+03 1 ArborX::BVH::BVH::compute_linear_ordering [region]

meaning 2.13e-02s vs. 1.01e-01s.

dalg24 · 2022-06-10T19:30:45Z

test/tstQueryTreeRay.cpp

+template <typename DeviceType>
+struct AccessTraits<BoxesIntersectedByRayOrdered<DeviceType>,
+                    ArborX::PredicatesTag>


Avoid defining these away from their being used

dalg24 · 2022-06-10T19:32:08Z

test/tstQueryTreeRay.cpp

+  KOKKOS_FUNCTION void operator()(Predicate const &,
+                                  int const primitive_index) const
+  {
+    _ordered_intersections(_i++) = primitive_index;


The incrementing does not need atomicity?

There is only one predicate and every predicate is processed by a single thread so it's not really necessary to use atomics here.

this is a bug begging to happen

I mean it's still a variable local to the predicate even if there were more but sure I'll make it atomic, see 03e716f

dalg24 · 2022-06-10T19:34:46Z

test/tstQueryTreeRay.cpp

+  Kokkos::View<ArborX::Box *, DeviceType> device_boxes("boxes", 10);
+  Kokkos::deep_copy(exec_space, device_boxes,
+                    Kokkos::View<ArborX::Box *, Kokkos::HostSpace>(
+                        boxes.data(), boxes.size()));


ArborX/test/ArborXTest_StdVectorToKokkosView.hpp

Lines 23 to 24 in f60fc2c

template <typename DeviceType, typename T>

auto toView(std::vector<T> const &v, std::string const &lbl = "Test::Unnamed")

dalg24 · 2022-06-10T19:40:05Z

test/tstQueryTreeRay.cpp

+  auto const host_ordered_intersections = Kokkos::create_mirror_view_and_copy(
+      Kokkos::HostSpace{}, device_ordered_intersections);
+  for (unsigned int i = 0; i < n; ++i)
+    BOOST_TEST(host_ordered_intersections(i) == i);
+}


Take advantage of our Kokkos::View adapters and the Boost native support to compare collection of values (grep for per_element)
We try to keep boiler plate code minimal to focus on the test content.

I didn't because it's more concise this way IMHO but sure I'll change, see 03e716f

aprokop · 2022-06-14T17:14:23Z

examples/raytracing/example_raytracing.cpp

+  Kokkos::View<int *> offsets("offsets", 0);
+  bvh.query(exec_space, RaysSecond<MemorySpace>{rays},
+            AccumRaySphereOptDist<MemorySpace>{boxes}, values, offsets);
+  ArborX::Details::sortObjects(exec_space, values);


For a better comparison, one should create a vector of long long or double with each entry encoding a combination of key and rid. Then use it to produce a permutation. This would run about 2.5x faster than the sort on these structs with 4 values.

Using a struct with only the required variables for sorting, shaves off like 0.5s. For a total of 1.13s for the original approach (vs 0.42s for the new approach).

old new

sorting 1.08e+00s 4.45e-01s

deposit_energy 8.62e-02s 1.55e-01s

total 1.66se+00s 1.07e+00s

dalg24

Partial review

dalg24 · 2022-06-20T21:37:48Z

src/details/ArborX_Predicates.hpp

+{
+template <typename Geometry>
+KOKKOS_INLINE_FUNCTION OrderedIntersection<Geometry>
+ordered_nearest(Geometry const &geometry)


It needs to say "intersect"
I am tempted to define a tag Experimental::Ordered and define

intersects(Experimental::Ordered, Geometry const&)

dalg24 · 2022-06-20T21:38:46Z

src/details/ArborX_Predicates.hpp

@@ -22,6 +22,11 @@ struct NearestPredicateTag
 struct SpatialPredicateTag
 {};
 } // namespace Details
+namespace Experimental
+{
+struct OrderedIntersectionPredicateTag


Suggested change

struct OrderedIntersectionPredicateTag

struct OrderedSpatialPredicateTag

dalg24 · 2022-06-20T21:42:42Z

src/details/ArborX_Callbacks.hpp

-          std::is_same<PredicateTag, NearestPredicateTag>{},
+          std::is_same<PredicateTag, NearestPredicateTag>{} ||
+          std::is_same<PredicateTag,
+                       Experimental::OrderedIntersectionPredicateTag>{},


It does not make sense you are using invoke_callback_and_check_early_exit in the traversal.

src/details/ArborX_Predicates.hpp

dalg24

As I understand it, the changes to <ArborX_Ray.hpp> can be dropped from this PR

src/details/ArborX_AccessTraits.hpp

src/details/ArborX_Callbacks.hpp

src/details/ArborX_DetailsTreeTraversal.hpp

test/tstQueryTreeRay.cpp

examples/raytracing/example_raytracing.cpp

aprokop · 2022-06-27T18:01:20Z

test/tstQueryTreeRay.cpp

+  ArborX::Experimental::Ray ray{
+      ArborX::Point{0, 0, 0},
+      ArborX::Experimental::Vector{1. / n, 1. / n, 1. / n}};


Can we please expand this test by shooting a second ray from (n, n, n) with direction (-1/n, -1/n, -1/n)?

aprokop · 2022-06-27T18:06:07Z

src/details/ArborX_Callbacks.hpp

+      !(std::is_same<PredicateTag, SpatialPredicateTag>{} ||
+        std::is_same<PredicateTag,
+                     Experimental::OrderedSpatialPredicateTag>{}) ||


Shouldn't this be

Suggested change

!(std::is_same<PredicateTag, SpatialPredicateTag>{} ||

std::is_same<PredicateTag,

Experimental::OrderedSpatialPredicateTag>{}) ||

(std::is_same<PredicateTag, SpatialPredicateTag>{} ||

std::is_same<PredicateTag,

Experimental::OrderedSpatialPredicateTag>{}) &&

This says as much as if the predicate is spatial or ordered_spatial, the return type must be void or CallbackTreeTraversalControl.
The next one says: if the predicate is nearest, the return type must be void.

I think that I didn't change what we check (except for adding the new predicate, of course) but made both static_asserts more self-contained (in that we don't mention other predicates than the ones we want to impose a condition on).

Ah, you are right. It was a bit hard to read for me. This is an optional suggestion then.

aprokop · 2022-07-18T16:21:45Z

@dalg24 Do you have any outstanding concerns about this PR?

dalg24 · 2022-07-18T16:30:47Z

@dalg24 Do you have any outstanding concerns about this PR?

No

aprokop · 2022-07-18T17:17:56Z

@masterleinad Could you please clean up the history before merge?

masterleinad · 2022-07-18T17:36:37Z

@masterleinad Could you please clean up the history before merge?

Here you go!

masterleinad · 2022-07-18T19:07:02Z

Retest this please.

aprokop added the enhancement New feature or request label May 20, 2022

masterleinad force-pushed the raytracing branch from 7650764 to 0da2577 Compare June 1, 2022 20:58

masterleinad changed the title ~~New Raytracing example using PriorityBasedTreeTraversal~~ New Raytracing example using OrderedNearest TreeTraversal Jun 1, 2022

masterleinad marked this pull request as ready for review June 1, 2022 21:06

dalg24 reviewed Jun 2, 2022

View reviewed changes

masterleinad force-pushed the raytracing branch 3 times, most recently from c333180 to 8983d7d Compare June 3, 2022 21:28

aprokop reviewed Jun 7, 2022

View reviewed changes

dalg24 reviewed Jun 8, 2022

View reviewed changes

src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved

wjge reviewed Jun 8, 2022

View reviewed changes

dalg24 reviewed Jun 10, 2022

View reviewed changes

aprokop reviewed Jun 14, 2022

View reviewed changes

masterleinad force-pushed the raytracing branch from f47c3d5 to 248df98 Compare June 20, 2022 19:21

masterleinad mentioned this pull request Jun 20, 2022

New Raytracing example using OrderedIntersection TreeTraversal #691

Merged

masterleinad changed the title ~~New Raytracing example using OrderedNearest TreeTraversal~~ Introduce OrderedIntersection TreeTraversal Jun 20, 2022

dalg24 reviewed Jun 20, 2022

View reviewed changes

dalg24 reviewed Jun 21, 2022

View reviewed changes

src/details/ArborX_AccessTraits.hpp Outdated Show resolved Hide resolved

src/details/ArborX_Callbacks.hpp Outdated Show resolved Hide resolved

dalg24 reviewed Jun 21, 2022

View reviewed changes

src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved

dalg24 reviewed Jun 21, 2022

View reviewed changes

test/tstQueryTreeRay.cpp Show resolved Hide resolved

masterleinad force-pushed the raytracing branch from c359981 to 17cf44c Compare June 21, 2022 18:40

aprokop reviewed Jun 27, 2022

View reviewed changes

masterleinad force-pushed the raytracing branch from 0306ecb to a4e33e3 Compare June 27, 2022 20:17

aprokop approved these changes Jun 29, 2022

View reviewed changes

Introduce OrderedSpatial tree traversal

3da258e

masterleinad force-pushed the raytracing branch from 8510190 to 3da258e Compare July 18, 2022 17:36

aprokop merged commit f2585f0 into arborx:master Jul 18, 2022

aprokop deleted the raytracing branch July 18, 2022 22:55

dalg24 mentioned this pull request Jul 19, 2022

Fixup for ordered spatial queries #695

Merged

masterleinad mentioned this pull request Dec 12, 2022

Use popPush instead of pop followed by push #790

Closed

	template <typename DeviceType, typename T>
	auto toView(std::vector<T> const &v, std::string const &lbl = "Test::Unnamed")

	old	new
sorting	1.08e+00s	4.45e-01s
deposit_energy	8.62e-02s	1.55e-01s
total	1.66se+00s	1.07e+00s

	struct OrderedIntersectionPredicateTag
	struct OrderedSpatialPredicateTag

Introduce OrderedIntersection TreeTraversal #683

Introduce OrderedIntersection TreeTraversal #683

Conversation

masterleinad commented May 20, 2022 • edited Loading

masterleinad commented Jun 1, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

masterleinad commented Jun 3, 2022 • edited Loading

masterleinad commented Jun 6, 2022

aprokop left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

masterleinad commented Jun 7, 2022

masterleinad commented Jun 7, 2022

masterleinad commented Jun 7, 2022

wjge commented Jun 8, 2022

wjge Jun 8, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wjge Jun 8, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dalg24 left a comment

Choose a reason for hiding this comment

dalg24 Jun 20, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dalg24 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aprokop commented Jul 18, 2022

dalg24 commented Jul 18, 2022

aprokop commented Jul 18, 2022

masterleinad commented Jul 18, 2022

masterleinad commented Jul 18, 2022

masterleinad commented May 20, 2022 •

edited

Loading

masterleinad commented Jun 3, 2022 •

edited

Loading

wjge Jun 8, 2022 •

edited

Loading

wjge Jun 8, 2022 •

edited

Loading

dalg24 Jun 20, 2022 •

edited

Loading