
Work on compatibility #735

Merged · 4 commits · Apr 25, 2023

Conversation

ViliamVadocz
Contributor

  • A few unrelated fixes to optimizers that I needed to compile and run the code.
  • Compatibility header with a few functions:
    • Pay special attention to atomicCAS, since it's quite unsafe and complicated. It's almost 3 AM, so I worry I might have messed it up.
    • I implemented __hmax_nan and __hmin_nan so that we can have the nice NaN propagation even on older GPUs.
    • Someone should check where the compute-capability boundaries for these functions actually lie; for now I just guessed 700.
  • I added an example Windows include path so that you have some idea of what it looks like for me.

I still get runtime errors like this:

---- tensor_ops::reshape_to::tests::test_1d_reshape_non_contiguous stdout ----
thread 'tensor_ops::reshape_to::tests::test_1d_reshape_non_contiguous' panicked at 'called `Result::unwrap()` on an `Err` value: CompileError { nvrtc: NvrtcError(NVRTC_ERROR_COMPILATION), options: ["--include-path=/usr/include", "--include-path=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.0\\include", "--gpu-architecture=sm_61"], log: "default_program(46): error: no instance of overloaded function \"atomicAdd\" matches the argument list\n            argument types are: (__half *, const __half)\n\n1 error detected in the compilation of \"default_program\".\n" }', src\tensor_ops\reshape_to\cuda_kernel.rs:79:56

which are very strange.


To be honest, I think this PR breaks more things than it fixes. 🥴
For reference, running cargo test --features cuda,test-f16 I get: 245 passed; 112 failed.

// We assume that atomicCAS for shorts has the same compute capability restrictions.
#if __CUDA_ARCH__ < 700

// Inspired by https://github.com/torch/cutorch/blob/master/lib/THC/THCAtomics.cuh#L96-L119
Owner

It seems like this is doing atomicAdd with unsigned int instead of unsigned short. Should we be doing that as well? Then we could basically just copy the function from the link without having to redefine atomicCAS.

Contributor (Author)

The reason I chose to create atomicCAS instead is that the implementations of atomicMin and atomicMax use it. We could also rewrite those two to work like this instead if we want, but that would mean putting compatibility code inside max_to.cu and min_to.cu, or moving those functions here.

@coreylowman (Owner)

The runtime errors you are getting are from the JIT-compiled kernels using nvrtc. They don't have tensor_ops/utilities as an include directory (I'm not sure that path even exists alongside compiled binaries), so you can't use anything from cuda_utils.cuh or your new compatibility header.

@coreylowman (Owner) left a comment
Going to merge this for now; feel free to make changes and open another PR.

@coreylowman coreylowman merged commit 3fe083d into coreylowman:f16 Apr 25, 2023