-
-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Work on compatibility #735
Conversation
// We assume that atomicCAS for shorts has the same compute capability restrictions. | ||
#if __CUDA_ARCH__ < 700 | ||
|
||
// Inspired by https://github.com/torch/cutorch/blob/master/lib/THC/THCAtomics.cuh#L96-L119 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this is doing atomicAdd with unsigned int
instead of unsigned short
? Should we be doing that as well? Then we could basically just copy the function from the link without having to redefine atomicCAS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I chose to do create atomicCAS instead is since the implementations for atomicMin and atomicMax use it. We could also rewrite two to work like this instead if we want, but that would mean we put compativility code inside max_to.cu and min_to.cu or move those functions here.
The runtime errors you are getting are from the JIT compiled kernels using nvrtc. They don't have the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to merge this for now, feel free to make changes and open another PR
atomicCAS
since it's quite unsafe and complicated. It's almost 3 AM so I worry I might have messed it up.__hmax_nan
and__hmin_nan
so that we can have the nice NaN propagation even on older GPUs700
.I still get runtime errors like this:
which are very strange.
To be honest I think this PR breaks more things than it fixes. 🥴
For reference, running
cargo test --features cuda,test-f16
I get:245 passed; 112 failed;