Skip to content

Conversation

@rchen20
Copy link
Member

@rchen20 rchen20 commented Jan 28, 2022

Summary

  • This PR is a bugfix.
  • It does the following:

#if defined(RAJA_TEST_EXHAUSTIVE)
, RAJA::omp_parallel_for_static_exec< >
, RAJA::omp_parallel_for_static_exec<4>
, RAJA::omp_parallel_for_nowait_static_exec< >
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these exec policies removed from the list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, they no longer exist as valid policies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, these tests are already covered in lines 58-59, and in the forall/region tests. We can safely remove these two lines without further additions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot these were for atomic policies, so added them back in. Tested with nvcc+gcc.

reduce_minloc.reset(resetVal[0]);
reduce_maxloc.reset(resetVal[0]);
reduce_minloctup.reset(resetVal[0]);
reduce_maxloctup.reset(resetVal[0]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trws Apparently, the maxloctup.reset() was getting optimized out in the Sequential version of this test when compiled with NVCC + GCC. Changing the ordering fixes this for some reason, but doing so feels clunky. If there is some better way to get around this, please let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it get optimized away with GCC only?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, with GCC only. Not getting this error with other compilers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "optimized out" do you mean the side-effect didn't take effect so it was failing in the assertion below? That's the kind of thing that really shouldn't happen unless we accidentally have UB somewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the check on line 157 fails. It returns the initialized values, or more specifically, the default initialized values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a definite cause for concern. It probably shouldn't matter for sequential anyway, but is it valid to use reset after compute but before get like this? I know we have places where get actually completes the reduction, and can't remember if all of that is wired up right to allow this order.

Copy link
Member

@MrBurmark MrBurmark Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the cuda and hip backends this should be safe, as reset also finalizes the reduction by calling get before resetting the value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems odd to me. what is a use case for calling reset before getting the reduced value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the author of the test not knowing about calling get before reset, there is probably no reason for this to happen. 😃

I'll modify the PR to put a get call into reset().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's odd, inserting get into the reset call does not help. Nor does manually calling get in the test itself. I'll still push the changes to reset, but keep the test with this modified ordering of calls.

endif()
endif (EXTERNAL_CAMP_SOURCE_DIR)

if (RAJA_ENABLE_TARGET_OPENMP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidbeckingsale are you OK with these changes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, seems reasonable - matches what we do in Umpire to ensure correct defines are passed to camp.

rhornung67
rhornung67 previously approved these changes Feb 9, 2022
@trws
Copy link
Member

trws commented Feb 10, 2022 via email

@rchen20
Copy link
Member Author

rchen20 commented Feb 10, 2022

This is one you should file an issue for so we have a record of it. Something is fishy here. Maybe something we should point ubsan toward and see what happens.

Good idea, created one here #1217.

@rchen20
Copy link
Member Author

rchen20 commented Feb 10, 2022

@davidbeckingsale @rhornung67 This should be good to go. I added get to the reset call for good measure (although we still need to change the ordering of the reset calls to have the test pass). Cleaned up warnings from the exhaustive tests, and fixed an old test. Could you take another look and perhaps reapprove?

@rchen20 rchen20 merged commit 673b233 into develop Feb 15, 2022
@rhornung67 rhornung67 deleted the bugfix/chen59/atomicexhaustive branch November 9, 2022 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants