Clang CUDA functions not handling concepts correctly #67507
Comments
That's an interesting problem.
The key here is: if you were to make the function …
I was able to find some workarounds, but the behavior definitely doesn't seem correct. I suspect concept satisfiability is cached: if I call a host function with the same signature and then call the GPU function, it works as expected. I don't think concept satisfiability should depend on whether the functions have device or host markups. You may never invoke the associated functions in device code, but they should still be usable to check the concept requirements.
Also, nvcc compiles the example fine.
Reduced version:
As I suggested above, I believe clang is correct to error out here, because the kernel, which must satisfy the requirement, is not allowed to call a host-only function. The main issue is that it's not clear how concepts are supposed to interact with target-based function overloading. This is largely unexplored territory. Collecting corner cases like these, along with their workarounds, would be a useful starting point for assessing what the issues are and how to deal with them.
This cannot be correct. If I call …
Remove 1 and you get the compilation error. Even if your argument is that clang is correct to error (which I very much don't think it is), it should at least error consistently.
Concept satisfiability should be completely separate from whether a function is host, device, etc. This is what nvcc appears to do. And a concept can have requirements that are never invoked from device code.
As I said, it's unexplored territory. I do agree that the compiler should behave consistently. Clang not complaining in the second case looks like a bug to me, though, not a reason to conclude that the original error is wrong.
The 'requires' will be satisfied for the host function, but not for the GPU kernel. Granted, my familiarity with concepts is rather rudimentary and it's likely that I'm misunderstanding something. If I'm wrong here, please tell me why you would expect the example above, as written, to compile on the GPU.
NVCC not reporting an error may be a bug, too. I do not see how that concept can be satisfied for the GPU kernel, when …
Concepts can be high-level and bundle a lot of behavior. You might have a generic concept such as geometric_shape, for example, that builds on lower-level concept primitives and requires a type to implement hundreds of different functions. Perhaps only a few would be relevant to GPU code. Nonetheless, you may want to express generic APIs in terms of geometric_shape, since a collection of known behavior can be simpler and more maintainable than going through individual generic functions and determining their minimal requirements.
That's all good. But how do we reconcile it with the fact that, with heterogeneous computing in the picture, some of the assumptions concepts may be implicitly relying on are no longer true? As I said, it's unexplored territory. Your expectation appears to be that concepts written for host-only code should magically work for GPU code, too, and I don't think that's always doable, at least for concepts that depend on the callability of some functions. It would work for the subset of functions that are callable on both host and device; that would be mostly …
No, I would expect an error if they get invoked from device code, but no error otherwise. For example, geometric_shape might require a function compute_area. You might use that function to compute the areas of a large array of objects on the GPU. But it may also require a function draw_object that interacts with display drivers and makes no sense to invoke from device code. I would expect geometric_shape to be satisfied if the functions are implemented, and a compilation error only if you try to invoke compute_area from device code and it doesn't have …
Can you give me a specific example? Here's a compiler explorer page with both host- and device-side CUDA compilation that you can use as a starting point: https://cuda.godbolt.org/z/4rjda6r6T . Please fill in an example of the scenario you've outlined above, with the expectations you would have for the code.
Here's a sketch of what should work
Surprisingly enough, it's …
What are the expectations enforced by the concept?
What would a concept look like for the following conditions:
Yes, that's what it's expressing. For example, you could also write it as …
I'm not expecting host/device to factor into whether a concept is satisfied. So, if you call …
I'd expect not to get a substitution failure, but still to get an error equivalent to calling the function with no concept constraint.
Perhaps it might be useful to consider extending the concept syntax to express host/device-specific requirements, but I imagine the most basic version would ignore …
If we allow a host function to fulfill a requirement of a device function, we may end up with fake satisfaction of a real requirement, which defeats the purpose of concepts. A better solution might be to use different concepts for host and device templates: the concept for the host side contains only host-side requirements, and the concept for the device side contains only device-side requirements.
However, I think we may need to check concept requirements in the host context by default, even when the check is done for a device template. This keeps us compatible with C++ constexpr templates and with nvcc. The rationale is that users usually have different requirements on the host and device sides; the requirements in a standard C++ concept are for the host side by default, and it may not be the user's intention to transfer them to the device side merely because the function names match. Doing this keeps compatibility with C++ and nvcc, but loses the ability to express device-side requirements. I think that is acceptable as a baseline. Then we can consider introducing a new extension for expressing device-side requirements, and use conditional compilation (e.g. #if __has_extension) to enable those device-side requirements for clang only.
Currently, constraints are checked in Sema::FinishTemplateArgumentDeduction, where the current function in ASTContext is set to the instantiated template function. When resolving functions for the constraints, clang assumes the caller is the current function; that's why host functions fail candidacy in template kernels. Clang also caches the constraint-checking result per concept/type pair: https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaConcept.cpp#L502 . It assumes the result does not depend on the instantiation context; that's why, after a successful constraint check for a host-function instantiation, it does not repeat the check for the kernel instantiation. I think the fix should be to give constraint checking its own host/device context, defaulting to host for compatibility with standard C++. This makes constraint checking independent of the caller and makes the caching valid. Later on, we may introduce device constraints by other means, e.g. adding a __device__ attribute per function call in constraints.
Currently, constraints are checked in Sema::FinishTemplateArgumentDeduction, where the current function in ASTContext is set to the instantiated template function. When resolving functions for the constraints, clang assumes the caller is the current function. This causes incompatibility with nvcc, and with standard C++ for constexpr template functions. Clang caches the constraint-checking result per concept/type pair, assuming the result does not depend on the instantiation context. This patch lets constraint checking have its own host/device context, defaulting to host for compatibility with C++. This makes constraint checking independent of the caller and makes the caching valid. In the future, we may introduce device constraints by other means, e.g. adding a __device__ attribute per function call in constraints. Fixes: llvm#67507
@llvm/issue-subscribers-c-20
Take the example code posted below.
If I compile, I get this error
But the concept should be satisfied. In fact, if I uncomment the line
the compilation will succeed. Here is the version of clang I'm using
code:
@llvm/issue-subscribers-clang-frontend