Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce OrderedIntersection TreeTraversal #683

Merged
merged 1 commit into from
Jul 18, 2022

Conversation

masterleinad
Copy link
Collaborator

@masterleinad masterleinad commented May 20, 2022

It makes sense to replace the current example since the new one covers more ground, but I could copy some more comments if necessary.
I decided to keep both variants since I find it instructional to see the performance difference.

The PriorityBasedTreeTraversal still needs to be moved.

@aprokop aprokop added the enhancement New feature or request label May 20, 2022
@masterleinad masterleinad changed the title New Raytracing example using PriorityBasedTreeTraversal New Raytracing example using OrderedNearest TreeTraversal Jun 1, 2022
@masterleinad
Copy link
Collaborator Author

I think this is in a good enough state to discuss now.

The example compares the new traversal that processes a callback in order of the distance between ray and box with an approach that first collects all intersections then sorts them and finally processes the callback. Typical speedups I observe are between 10x and 20x (also depending on sorting predicates which is sometimes beneficial).

The new tag is called OrderedNearestPredicateTag and is up for discussion. Currently, we always expect a callback and we process all matching (distance != inf) predicate/primitive pairs. This probably only makes sense for Rays at the moment since for all other cases the distance is finite and we likely don't want to process all primitives for all queries. I could see that we want to introduce a cut-off distance replacing inf.

@masterleinad masterleinad marked this pull request as ready for review June 1, 2022 21:06
src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved
src/details/ArborX_Ray.hpp Outdated Show resolved Hide resolved
Comment on lines 537 to 540
template <typename Geometry>
KOKKOS_INLINE_FUNCTION void
overlapDistance(Ray const &ray, Geometry const &geometry, float &length,
float &distance_to_origin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was already not sure about exposing overlapDistance(Ray, Sphere)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only kept adding it since overlapDistance(Ray, Sphere) was already there.
IMHO, it's a useful helper function for Rays. I don't have a problem moving it to the example, though.

@masterleinad
Copy link
Collaborator Author

masterleinad commented Jun 3, 2022

Only pushing to the stack when necessary gives

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 9.45296 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 8.95e+00 sec 94.7% 99.8% 0.0% 0.0% 1.79e+00 1 second_approach [region]
|   |-> 8.39e+00 sec 88.7% 100.0% 0.0% ------ 1 batched_sorting [for]
|   |-> 4.92e-01 sec 5.2% 96.3% 0.0% 0.0% 2.64e+01 1 ArborX::query [region]
|   |   |-> 4.92e-01 sec 5.2% 96.3% 0.0% 0.1% 2.64e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.84e-01 sec 5.1% 97.7% 0.0% 0.2% 1.86e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 3.27e-01 sec 3.5% 96.9% 0.0% 3.1% 9.18e+00 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 3.16e-01 sec 3.3% 100.0% 0.0% 0.0% 3.16e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 3.16e-01 sec 3.3% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.49e-01 sec 1.6% 99.9% 0.0% 0.0% 6.71e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.49e-01 sec 1.6% 100.0% 0.0% 0.0% 6.71e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.49e-01 sec 1.6% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |-> 6.72e-02 sec 0.7% 100.0% 0.0% ------ 1 deposit_energy [for]
|-> 4.64e-01 sec 4.9% 98.6% 0.0% 0.0% 6.47e+00 1 first_approach [region]
|   |-> 4.64e-01 sec 4.9% 98.6% 0.0% 0.1% 6.47e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 4.57e-01 sec 4.8% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|-> 1.69e-02 sec 0.2% 10.2% 0.0% 2.2% 4.14e+02 1 ArborX::BVH::BVH [region]
|   |-> 1.42e-02 sec 0.2% 9.3% 0.0% 91.2% 2.82e+02 1 ArborX::BVH::BVH::generate_hierarchy [region]
|-> 9.80e-03 sec 0.1% 48.7% 0.0% 0.2% 6.12e+02 1 problem_setup [region]

vs.

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 9.61558 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 8.98e+00 sec 93.4% 99.8% 0.0% 0.0% 1.78e+00 1 second_approach [region]
|   |-> 8.43e+00 sec 87.6% 100.0% 0.0% ------ 1 batched_sorting [for]
|   |-> 4.89e-01 sec 5.1% 96.5% 0.0% 0.0% 2.66e+01 1 ArborX::query [region]
|   |   |-> 4.89e-01 sec 5.1% 96.5% 0.0% 0.1% 2.66e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.81e-01 sec 5.0% 97.8% 0.0% 0.2% 1.87e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 3.24e-01 sec 3.4% 97.1% 0.0% 2.9% 9.25e+00 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 3.15e-01 sec 3.3% 100.0% 0.0% 0.0% 3.18e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 3.15e-01 sec 3.3% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.49e-01 sec 1.5% 100.0% 0.0% 0.0% 6.72e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.49e-01 sec 1.5% 100.0% 0.0% 0.0% 6.72e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.49e-01 sec 1.5% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |-> 6.71e-02 sec 0.7% 100.0% 0.0% ------ 1 deposit_energy [for]
|-> 6.02e-01 sec 6.3% 99.0% 0.0% 0.0% 4.98e+00 1 first_approach [region]
|   |-> 6.02e-01 sec 6.3% 99.0% 0.0% 0.1% 4.98e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 5.96e-01 sec 6.2% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|-> 1.27e-02 sec 0.1% 43.7% 0.0% 0.1% 4.72e+02 1 problem_setup [region]
|   |-> 1.08e-02 sec 0.1% 49.7% 0.0% 50.3% 4.65e+02 1 make_rays [region]

when running

./ArborX_RayTracing.exe --Nx=100 --Ny=100 --Nz=100 --rays=10

That's to say it's like 30% faster (4.64e-01 vs 6.02e-01).

@masterleinad masterleinad force-pushed the raytracing branch 3 times, most recently from c333180 to 8983d7d Compare June 3, 2022 21:28
@masterleinad
Copy link
Collaborator Author

Avoiding the batched sort, the profile looks more like

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 2.1512 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 1.66e+00 sec 77.3% 34.1% 0.0% 64.8% 9.63e+00 1 second_approach [region]
|   |-> 4.92e-01 sec 22.9% 96.3% 0.0% 0.0% 2.64e+01 1 ArborX::query [region]
|   |   |-> 4.92e-01 sec 22.9% 96.3% 0.0% 0.1% 2.64e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.84e-01 sec 22.5% 97.7% 0.0% 0.2% 1.86e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 3.26e-01 sec 15.2% 96.9% 0.0% 3.0% 9.20e+00 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 3.16e-01 sec 14.7% 100.0% 0.0% 0.0% 3.16e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 3.16e-01 sec 14.7% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.49e-01 sec 6.9% 99.9% 0.0% 0.0% 6.71e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.49e-01 sec 6.9% 99.9% 0.0% 0.1% 6.72e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.49e-01 sec 6.9% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 7.79e-03 sec 0.4% 97.6% 0.0% 2.4% 5.14e+02 1 ArborX::CrsGraphWrapper::first_pass_postprocess [region]
|   |       |   |   |-> 7.31e-03 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan]
|   |       |-> 6.31e-03 sec 0.3% 10.5% 0.0% 89.5% 3.17e+02 1 ArborX::CrsGraphWrapper::query::spatial::compute_permutation [region]
|   |-> 8.60e-02 sec 4.0% 100.0% 0.0% ------ 1 deposit_energy [for]
|   |-> 6.23e-03 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for]
|-> 4.61e-01 sec 21.4% 98.6% 0.0% 0.0% 6.51e+00 1 first_approach [region]
|   |-> 4.61e-01 sec 21.4% 98.6% 0.0% 0.1% 6.51e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 4.54e-01 sec 21.1% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|       |-> 6.41e-03 sec 0.3% 9.4% 0.0% 90.6% 3.12e+02 1 ArborX::BVH::query::nearest::compute_permutation [region]
|-> 9.73e-03 sec 0.5% 51.7% 0.0% 0.3% 6.17e+02 1 problem_setup [region]
|   |-> 8.76e-03 sec 0.4% 53.2% 0.0% 46.8% 5.71e+02 1 make_rays [region]
|   |   |-> 3.46e-03 sec 0.2% 100.0% 0.0% ------ 1 initialize_rays [for]
|-> 4.69e-03 sec 0.2% 36.3% 0.0% 7.8% 1.49e+03 1 ArborX::BVH::BVH [region]

so we are in total like 3.6x faster with the new approach (4.61e-01s vs 1.66s). Even just considering the traversals, the new approach is slightly after (4.61e-01s vs. 4.92e-01s).

Without sorting the predicates, we get

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 2.00338 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 1.61e+00 sec 80.3% 32.2% 0.0% 67.2% 8.71e+00 1 second_approach [region]
|   |-> 4.36e-01 sec 21.7% 97.5% 0.0% 0.0% 2.52e+01 1 ArborX::query [region]
|   |   |-> 4.36e-01 sec 21.7% 97.5% 0.0% 0.1% 2.52e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.35e-01 sec 21.7% 97.6% 0.0% 0.1% 2.07e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 2.94e-01 sec 14.7% 96.7% 0.0% 3.3% 1.02e+01 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 2.84e-01 sec 14.2% 100.0% 0.0% 0.0% 3.52e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 2.84e-01 sec 14.2% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.33e-01 sec 6.7% 99.9% 0.0% 0.0% 7.50e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.33e-01 sec 6.7% 99.9% 0.0% 0.1% 7.50e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.33e-01 sec 6.7% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 7.01e-03 sec 0.3% 97.1% 0.0% 2.9% 5.71e+02 1 ArborX::CrsGraphWrapper::first_pass_postprocess [region]
|   |       |   |   |-> 6.60e-03 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan]
|   |-> 8.62e-02 sec 4.3% 100.0% 0.0% ------ 1 deposit_energy [for]
|   |-> 6.21e-03 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for]
|-> 3.65e-01 sec 18.2% 100.0% 0.0% 0.0% 2.74e+00 1 first_approach [region]
|   |-> 3.65e-01 sec 18.2% 100.0% 0.0% 0.0% 2.74e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 3.65e-01 sec 18.2% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|-> 1.07e-02 sec 0.5% 45.7% 0.0% 0.3% 5.59e+02 1 problem_setup [region]
|   |-> 1.02e-02 sec 0.5% 46.0% 0.0% 54.0% 4.91e+02 1 make_rays [region]
|   |   |-> 3.47e-03 sec 0.2% 100.0% 0.0% ------ 1 initialize_rays [for]
|-> 5.89e-03 sec 0.3% 30.7% 0.0% 10.8% 1.19e+03 1 ArborX::BVH::BVH [region]
|   |-> 2.30e-03 sec 0.1% 3.1% 0.0% 96.9% 4.36e+02 1 ArborX::BVH::BVH::sort_linearized_order [region]
|   |-> 2.02e-03 sec 0.1% 66.1% 0.0% 37.2% 1.98e+03 1 ArborX::BVH::BVH::generate_hierarchy [region]

which is marginally better.

Copy link
Contributor

@aprokop aprokop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR will likely have to be updated for #675.

src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved
Comment on lines +544 to +555
if (invoke_callback_and_check_early_exit(
_callback, predicate,
HappyTreeFriends::getLeafPermutationIndex(_bvh, node)))
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess it would still allow one to find just k nearest ones, while processing them in the distance closest order.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the question is whether we want it to rather behave like a spatial predicate or like a nearest predicate. I tend to nearest which is why I choose OrderedNearest for now.
Also, this would be one of the traversals raytracing algorithms commonly use (multi-hit).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that this has nearest flavor. We don't check predicate for traversing, only on the distance. Nearest though implies a specific order, so does not describe the situation well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do check predicates, every time we compare to infinity

src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsTreeTraversal.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsTreeTraversal.hpp Show resolved Hide resolved
return distance(geometry, box);
};

using PairIndexDistance = Kokkos::pair<int, float>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are trying to generalize this, the second argument should the type returned by the distance function. As it can be provided by a user, this will have to be determined and not hardcoded.

@masterleinad
Copy link
Collaborator Author

Using Nearest traversals for the benchmark using k=100 gives

BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 3.09358 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 1.67e+00 sec 54.0% 33.9% 0.0% 64.5% 9.57e+00 1 second_approach [region]
|   |-> 5.01e-01 sec 16.2% 94.5% 0.0% 0.0% 2.59e+01 1 ArborX::query [region]
|   |   |-> 5.01e-01 sec 16.2% 94.5% 0.0% 1.5% 2.59e+01 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 4.85e-01 sec 15.7% 97.5% 0.0% 0.1% 1.86e+01 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 3.32e-01 sec 10.7% 96.6% 0.0% 3.4% 9.03e+00 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 3.21e-01 sec 10.4% 100.0% 0.0% 0.0% 3.12e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 3.21e-01 sec 10.4% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.49e-01 sec 4.8% 100.0% 0.0% 0.0% 6.70e+00 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.49e-01 sec 4.8% 100.0% 0.0% 0.0% 6.70e+00 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.49e-01 sec 4.8% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |-> 7.90e-03 sec 0.3% 7.4% 0.0% 92.6% 2.53e+02 1 ArborX::CrsGraphWrapper::query::spatial::compute_permutation [region]
|   |-> 8.62e-02 sec 2.8% 100.0% 0.0% ------ 1 deposit_energy [for]
|   |-> 6.19e-03 sec 0.2% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for]
|-> 1.38e+00 sec 44.5% 98.1% 0.0% 0.0% 5.81e+00 1 first_approach [region]
|   |-> 1.38e+00 sec 44.5% 98.1% 0.0% 1.5% 5.81e+00 1 ArborX::BVH::query::nearest [region]
|       |-> 1.34e+00 sec 43.4% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::nearest [for]
|       |-> 6.79e-03 sec 0.2% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan]
|       |-> 6.20e-03 sec 0.2% 9.4% 0.0% 90.6% 3.22e+02 1 ArborX::BVH::query::nearest::compute_permutation [region]
|-> 1.81e-02 sec 0.6% 100.0% 0.0% ------ 1 compare [reduce]
|-> 9.69e-03 sec 0.3% 50.1% 0.0% 0.2% 6.19e+02 1 problem_setup [region]
|   |-> 9.14e-03 sec 0.3% 50.7% 0.0% 49.3% 5.47e+02 1 make_rays [region]
|   |   |-> 3.44e-03 sec 0.1% 100.0% 0.0% ------ 1 initialize_rays [for]
|-> 4.69e-03 sec 0.2% 36.4% 0.0% 7.6% 1.49e+03 1 ArborX::BVH::BVH [region]

In particular, this takes 1.38e+00s compared to the 4.61e-01s for the new traversal.

@masterleinad
Copy link
Collaborator Author

k nearest ordered nearest
1 2.77e-02 2.69e-02
4 4.51e-02 4.54e-02
16 1.60-e01 1.17e-01
64 7.61e-01 3.49e-01
100 1.38e+00 4.00e-01

@masterleinad
Copy link
Collaborator Author

Of course, we also don't need to allocate global memory on the device for the traversal with the approach here as opposed ti nearest traversals.

@wjge
Copy link
Contributor

wjge commented Jun 8, 2022

Do we want to change the name of the file 'example_raytracing.cpp' to be more specific? This example is a raytracing problem in absorbing media. It is more challenging in certain way, but might not be a general/typical problem for raytracing (e.g. between surfaces, scattering media, etc.).

@@ -1,3 +1,3 @@
add_executable(ArborX_RayTracing.exe example_raytracing.cpp)
target_link_libraries(ArborX_RayTracing.exe ArborX::ArborX Boost::program_options)
add_test(NAME ArborX_RayTracing_Example COMMAND ./ArborX_RayTracing.exe --spheres=1000 --rays=50000 --L=200.0)
add_test(NAME ArborX_RayTracing_Example COMMAND ./ArborX_RayTracing.exe --Nx=10 --Ny=10 --Nz=10 --Lx=1 --Ly=100000 --Lz=100000 --rays=1000)
Copy link
Contributor

@wjge wjge Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more common GCD would be like 36x36x36, 48x48x48 for a 3-D problem.
Do you set this aspect ratio of the domain size for any purpose?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's pretty random. Note that I use

./ArborX_RayTracing.exe --Nx=100 --Ny=100 --Nz=100 --rays=10

for all timings posted above.

Copy link
Contributor

@wjge wjge Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100x100x100 might be too large.
Maybe you can keep the number of rays per cell, but reduce the size to 48x48x48 or 60x60x60.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 60x60x60, the profiling looks like

TOTAL TIME: 0.253154 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 2.02e-01 sec 79.7% 37.0% 0.0% 60.3% 7.43e+01 1 second_approach [region]
|   |-> 6.93e-02 sec 27.4% 92.2% 0.0% 0.0% 1.73e+02 1 ArborX::query [region]
|   |   |-> 6.93e-02 sec 27.4% 92.2% 0.0% 0.7% 1.73e+02 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 6.64e-02 sec 26.2% 96.0% 0.0% 1.0% 1.36e+02 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 4.11e-02 sec 16.2% 95.7% 0.0% 4.2% 7.30e+01 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 3.93e-02 sec 15.5% 99.9% 0.0% 0.1% 2.54e+01 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 3.93e-02 sec 15.5% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 1.85e-02 sec 7.3% 99.8% 0.0% 0.1% 5.40e+01 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 1.85e-02 sec 7.3% 99.9% 0.0% 0.1% 5.40e+01 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 1.85e-02 sec 7.3% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 6.01e-03 sec 2.4% 97.3% 0.0% 2.7% 6.65e+02 1 ArborX::CrsGraphWrapper::first_pass_postprocess [region]
|   |       |   |   |-> 5.75e-03 sec 2.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan]
|   |       |-> 2.20e-03 sec 0.9% 7.2% 0.0% 92.8% 9.07e+02 1 ArborX::CrsGraphWrapper::query::spatial::compute_permutation [region]
|   |       |-> 2.70e-04 sec 0.1% 10.3% 0.0% 89.7% 3.70e+03 1 ArborX::CrsGraphWrapper::query::spatial::init_and_alloc [region]
|   |-> 9.95e-03 sec 3.9% 100.0% 0.0% ------ 1 deposit_energy [for]
|   |-> 8.45e-04 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for]
|-> 3.98e-02 sec 15.7% 94.0% 0.0% 0.0% 7.54e+01 1 first_approach [region]
|   |-> 3.98e-02 sec 15.7% 94.0% 0.0% 0.6% 7.55e+01 1 ArborX::BVH::query::nearest [region]
|       |-> 3.72e-02 sec 14.7% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|       |-> 2.31e-03 sec 0.9% 6.8% 0.0% 93.2% 8.66e+02 1 ArborX::BVH::query::nearest::compute_permutation [region]
|-> 5.69e-03 sec 2.2% 31.4% 0.0% 0.3% 1.06e+03 1 problem_setup [region]
|   |-> 5.33e-03 sec 2.1% 31.6% 0.0% 68.4% 9.39e+02 1 make_rays [region]
|   |   |-> 7.40e-04 sec 0.3% 100.0% 0.0% ------ 1 initialize_rays [for]
|   |   |-> 5.57e-04 sec 0.2% 100.0% 0.0% ------ 1 "Kokkos::Random_XorShift64::state"="Kokkos::Random_XorShift64::state_mirror" [copy]
|   |   |-> 3.37e-04 sec 0.1% 100.0% 0.0% ------ 1 "Kokkos::Random_XorShift64::locks"="Kokkos::Random_XorShift64::locks_mirror" [copy]
|   |-> 3.44e-04 sec 0.1% 29.2% 0.0% 70.8% 2.90e+03 1 make_grid [region]
|-> 2.08e-03 sec 0.8% 25.1% 0.0% 14.0% 3.36e+03 1 ArborX::BVH::BVH [region]
|   |-> 1.02e-03 sec 0.4% 2.4% 0.0% 97.6% 9.83e+02 1 ArborX::BVH::BVH::sort_linearized_order [region]
|   |-> 4.30e-04 sec 0.2% 89.3% 0.0% 17.7% 9.31e+03 1 ArborX::BVH::BVH::generate_hierarchy [region]
|   |   |-> 2.93e-04 sec 0.1% 100.0% 0.0% ------ 1 ArborX::TreeConstruction::generate_hierarchy [for]

which is pretty similar (3.98e-02s vs. 2.02e-01s).
For 48x48x48, we have

TOTAL TIME: 0.133111 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 1.01e-01 sec 76.1% 35.3% 0.0% 60.4% 1.48e+02 1 second_approach [region]
|   |-> 3.60e-02 sec 27.1% 88.0% 0.0% 0.0% 3.33e+02 1 ArborX::query [region]
|   |   |-> 3.60e-02 sec 27.1% 88.0% 0.0% 1.2% 3.33e+02 1 ArborX::CrsGraphWrapper::query::spatial [region]
|   |       |-> 3.30e-02 sec 24.8% 95.9% 0.0% 1.2% 2.73e+02 1 ArborX::CrsGraphWrapper::two_pass [region]
|   |       |   |-> 1.70e-02 sec 12.8% 95.3% 0.0% 4.5% 1.77e+02 1 ArborX::CrsGraphWrapper::two_pass:second_pass [region]
|   |       |   |   |-> 1.62e-02 sec 12.1% 99.8% 0.0% 0.2% 6.19e+01 1 ArborX::BVH::query::spatial [region]
|   |       |   |   |   |-> 1.61e-02 sec 12.1% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 9.60e-03 sec 7.2% 99.6% 0.0% 0.2% 1.04e+02 1 ArborX::CrsGraphWrapper::two_pass::first_pass [region]
|   |       |   |   |-> 9.59e-03 sec 7.2% 99.7% 0.0% 0.3% 1.04e+02 1 ArborX::BVH::query::spatial [region]
|   |       |   |       |-> 9.56e-03 sec 7.2% 100.0% 0.0% ------ 1 ArborX::TreeTraversal::spatial [for]
|   |       |   |-> 5.94e-03 sec 4.5% 97.8% 0.0% 2.2% 6.73e+02 1 ArborX::CrsGraphWrapper::first_pass_postprocess [region]
|   |       |   |   |-> 5.72e-03 sec 4.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::exclusive_scan [scan]
|   |       |-> 2.41e-03 sec 1.8% 4.5% 0.0% 95.5% 8.31e+02 1 ArborX::CrsGraphWrapper::query::spatial::compute_permutation [region]
|   |       |-> 2.43e-04 sec 0.2% 6.2% 0.0% 93.8% 4.11e+03 1 ArborX::CrsGraphWrapper::query::spatial::init_and_alloc [region]
|   |-> 3.70e-03 sec 2.8% 100.0% 0.0% ------ 1 deposit_energy [for]
|   |-> 3.96e-04 sec 0.3% 100.0% 0.0% ------ 1 ArborX::Algorithms::iota [for]
|-> 2.13e-02 sec 16.0% 91.1% 0.0% 0.0% 1.41e+02 1 first_approach [region]
|   |-> 2.13e-02 sec 16.0% 91.1% 0.0% 0.9% 1.41e+02 1 ArborX::BVH::query::nearest [region]
|       |-> 1.93e-02 sec 14.5% 100.0% 0.0% ------ 1 ArborX::Experimental::TreeTraversal::OrderedNearestPredicate [for]
|       |-> 1.80e-03 sec 1.4% 6.1% 0.0% 93.9% 1.11e+03 1 ArborX::BVH::query::nearest::compute_permutation [region]
|-> 5.38e-03 sec 4.0% 28.9% 0.0% 0.3% 1.12e+03 1 problem_setup [region]
|   |-> 4.81e-03 sec 3.6% 30.7% 0.0% 69.3% 1.04e+03 1 make_rays [region]
|   |   |-> 6.76e-04 sec 0.5% 100.0% 0.0% ------ 1 "Kokkos::Random_XorShift64::state"="Kokkos::Random_XorShift64::state_mirror" [copy]
|   |   |-> 4.09e-04 sec 0.3% 100.0% 0.0% ------ 1 initialize_rays [for]
|   |   |-> 3.39e-04 sec 0.3% 100.0% 0.0% ------ 1 "Kokkos::Random_XorShift64::locks"="Kokkos::Random_XorShift64::locks_mirror" [copy]
|   |-> 5.57e-04 sec 0.4% 13.9% 0.0% 86.1% 1.79e+03 1 make_grid [region]
|-> 1.77e-03 sec 1.3% 24.1% 0.0% 10.9% 3.96e+03 1 ArborX::BVH::BVH [region]
|   |-> 8.78e-04 sec 0.7% 2.4% 0.0% 97.6% 1.14e+03 1 ArborX::BVH::BVH::sort_linearized_order [region]
|   |-> 3.74e-04 sec 0.3% 82.9% 0.0% 24.9% 1.07e+04 1 ArborX::BVH::BVH::generate_hierarchy [region]
|   |   |-> 2.21e-04 sec 0.2% 100.0% 0.0% ------ 1 ArborX::TreeConstruction::generate_hierarchy [for]
|   |-> 2.44e-04 sec 0.2% 13.2% 0.0% 86.8% 4.11e+03 1 ArborX::BVH::BVH::compute_linear_ordering [region]

meaning 2.13e-02s vs. 1.01e-01s.

Comment on lines 77 to 79
template <typename DeviceType>
struct AccessTraits<BoxesIntersectedByRayOrdered<DeviceType>,
ArborX::PredicatesTag>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid defining these away from their being used

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KOKKOS_FUNCTION void operator()(Predicate const &,
int const primitive_index) const
{
_ordered_intersections(_i++) = primitive_index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incrementing does not need atomicity?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one predicate and every predicate is processed by a single thread so it's not really necessary to use atomics here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bug begging to happen

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean it's still a variable local to the predicate even if there were more but sure I'll make it atomic, see 03e716f

Comment on lines 182 to 185
Kokkos::View<ArborX::Box *, DeviceType> device_boxes("boxes", 10);
Kokkos::deep_copy(exec_space, device_boxes,
Kokkos::View<ArborX::Box *, Kokkos::HostSpace>(
boxes.data(), boxes.size()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template <typename DeviceType, typename T>
auto toView(std::vector<T> const &v, std::string const &lbl = "Test::Unnamed")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 201 to 219
auto const host_ordered_intersections = Kokkos::create_mirror_view_and_copy(
Kokkos::HostSpace{}, device_ordered_intersections);
for (unsigned int i = 0; i < n; ++i)
BOOST_TEST(host_ordered_intersections(i) == i);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take advantage of our Kokkos::View adapters and the Boost native support to compare collection of values (grep for per_element)
We try to keep boiler plate code minimal to focus on the test content.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't because it's more concise this way IMHO but sure I'll change, see 03e716f

Kokkos::View<int *> offsets("offsets", 0);
bvh.query(exec_space, RaysSecond<MemorySpace>{rays},
AccumRaySphereOptDist<MemorySpace>{boxes}, values, offsets);
ArborX::Details::sortObjects(exec_space, values);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a better comparison, one should create a vector of long long or double with each entry encoding a combination of key and rid. Then use it to produce a permutation. This would run about 2.5x faster than the sort on these structs with 4 values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a struct with only the required variables for sorting, shaves off like 0.5s. For a total of 1.13s for the original approach (vs 0.42s for the new approach).

old new
sorting 1.08e+00s 4.45e-01s
deposit_energy 8.62e-02s 1.55e-01s
total 1.66se+00s 1.07e+00s

@masterleinad masterleinad changed the title New Raytracing example using OrderedNearest TreeTraversal Introduce OrderedIntersection TreeTraversal Jun 20, 2022
Copy link
Contributor

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

{
template <typename Geometry>
KOKKOS_INLINE_FUNCTION OrderedIntersection<Geometry>
ordered_nearest(Geometry const &geometry)
Copy link
Contributor

@dalg24 dalg24 Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to say "intersect"
I am tempted to define a tag Experimental::Ordered and define

intersects(Experimental::Ordered, Geometry const&)

@@ -22,6 +22,11 @@ struct NearestPredicateTag
struct SpatialPredicateTag
{};
} // namespace Details
namespace Experimental
{
struct OrderedIntersectionPredicateTag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
struct OrderedIntersectionPredicateTag
struct OrderedSpatialPredicateTag

std::is_same<PredicateTag, NearestPredicateTag>{},
std::is_same<PredicateTag, NearestPredicateTag>{} ||
std::is_same<PredicateTag,
Experimental::OrderedIntersectionPredicateTag>{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make sense you are using invoke_callback_and_check_early_exit in the traversal.

src/details/ArborX_Predicates.hpp Show resolved Hide resolved
Copy link
Contributor

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, the changes to <ArborX_Ray.hpp> can be dropped from this PR

src/details/ArborX_AccessTraits.hpp Outdated Show resolved Hide resolved
src/details/ArborX_Callbacks.hpp Outdated Show resolved Hide resolved
examples/raytracing/example_raytracing.cpp Outdated Show resolved Hide resolved
Comment on lines 192 to 198
ArborX::Experimental::Ray ray{
ArborX::Point{0, 0, 0},
ArborX::Experimental::Vector{1. / n, 1. / n, 1. / n}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please expand this test by shooting a second ray from (n, n, n) with direction (-1/n, -1/n, -1/n)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +190 to +192
!(std::is_same<PredicateTag, SpatialPredicateTag>{} ||
std::is_same<PredicateTag,
Experimental::OrderedSpatialPredicateTag>{}) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be

Suggested change
!(std::is_same<PredicateTag, SpatialPredicateTag>{} ||
std::is_same<PredicateTag,
Experimental::OrderedSpatialPredicateTag>{}) ||
(std::is_same<PredicateTag, SpatialPredicateTag>{} ||
std::is_same<PredicateTag,
Experimental::OrderedSpatialPredicateTag>{}) &&

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says as much as if the predicate is spatial or ordered_spatial, the return type must be void or CallbackTreeTraversalControl.
The next one says: if the predicate is nearest, the return type must be void.

I think that I didn't change what we check (except for adding the new predicate, of course) but made both static_asserts more self-contained (in that we don't mention other predicates than the ones we want to impose a condition on).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you are right. It was a bit hard to read for me. This is an optional suggestion then.

@aprokop
Copy link
Contributor

aprokop commented Jul 18, 2022

@dalg24 Do you have any outstanding concerns about this PR?

@dalg24
Copy link
Contributor

dalg24 commented Jul 18, 2022

@dalg24 Do you have any outstanding concerns about this PR?

No

@aprokop
Copy link
Contributor

aprokop commented Jul 18, 2022

@masterleinad Could you please clean up the history before merge?

@masterleinad
Copy link
Collaborator Author

@masterleinad Could you please clean up the history before merge?

Here you go!

@masterleinad
Copy link
Collaborator Author

Retest this please.

@aprokop aprokop merged commit f2585f0 into arborx:master Jul 18, 2022
@aprokop aprokop deleted the raytracing branch July 18, 2022 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants