-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
int overflow in ReduceOps.eval(Box, ...) #3453
Comments
We have the same issue in ParalleFor. It wasn't really an issue before because GPUs used to not have so much memory. But now it has become an issue. Presumably switching to long will use more registers. |
In my case there isn’t an allocation of the size of the box, I need to count how many particles need to be initialized. If the additional number of registers is a problem, one could split the box inside .eval() so that every chunk has less than max int number of cells. However, in my testing the number of registers usually only has a small impact compared to optimizing for memory access pattern. For now, I can use
|
We could add some assertions first before we decide what to do. |
I looked at a few thigs in Compiler Explorer and it seems the main reason using a 64 bit icell would be slower and use more registers is the 64 bit integer division by lenxy and lenx. A100 does not have division assembly instructions so it needs to be emulated. For 64 bit this seems to be so bad that in the assembly it will check if it can use a 32 bit division instead (also emulated). Since the divisor is the same for all threads the CPU could help by precomputing some values to replace the division on the GPU with multiplication. I found this library that does that https://github.com/NVIDIA/cutlass/blob/2a9fa23e06b1fc0b7fab7a3e29ff1b17e325da7f/include/cutlass/fast_math.h#L404-L488 |
If the number of cells in a Box exceeds the maximum integer, eval wont work correctly and in my case would not loop over all elements without giving an error. A 64 bit index type should be used instead of int.
amrex/Src/Base/AMReX_Reduce.H
Line 504 in 34c0ae3
and
amrex/Src/Base/AMReX_Reduce.H
Line 564 in 34c0ae3
The text was updated successfully, but these errors were encountered: