-
Notifications
You must be signed in to change notification settings - Fork 13.9k
[libclc] Optimize generic CLC fmin/fmax #128506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Note @arsenm I didn't touch amdgcn's fmin/fmax or r600's as I wasn't sure if any of that could be updated or at least unified. Would you be able to help there? I note that the comments around the use of canonicalize mention sNAN, which isn't required by the spec. |
These should use the regular builtin fmin / fmax.
The spec is quite badly written on what's expected of snans here, and the conformance test doesn't test what is written in the spec (hoping to fix that here |
Do you mean the AMD implementations, or the CLC ones too? Note there's no vector support for
Thanks for the link. I was going by 7.2 but now I see there's also a footnote. |
I don't suppose the recent clarifications to |
ping, thanks |
43d4d7d
to
572780f
Compare
This is an alternative to llvm#128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were currently (accidentally, presumably) providing overloads with mixed elment types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.
This is an alternative to #128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. For simplicity, the CLC library doesn't provide vector/scalar versions of these builtins. The OpenCL layer wraps those up to the vector/vector versions. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were currently (accidentally, presumably) providing overloads with mixed elment types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.
This is an alternative to llvm#128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. For simplicity, the CLC library doesn't provide vector/scalar versions of these builtins. The OpenCL layer wraps those up to the vector/vector versions. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were currently (accidentally, presumably) providing overloads with mixed elment types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.
The CLC fmin/fmax builtins now use clang's __builtin_elementwise_(min|max) which helps us generate llvm.(min|max)num intrinsics directly. These intrinsics select the non-NAN input over the NAN input, which adheres to the OpenCL specification. Note that the OpenCL specification doesn't require support for sNAN, so returning qNAN over sNAN is acceptable. Note also that the intrinsics don't differentiate between -0.0 and +0.0; this does not appear to be required - going by the OpenCL CTS, at least. These intrinsics maintain the vector types, as opposed to scalarizing, which was previously happening. This commit therefore helps to optimize codegen for those targets.
572780f
to
5c367b8
Compare
It depends on whether the conformance test is fixed to match the fuzzy language of the spec or not. If the decision is fmin/fmax should match the IEEE behavior, the implementation directly maps to llvm.minnum/llvm.maxnum. If the decision is the conformance test continues doing what it has been doing, it should directly map to llvm.minimumnum/maximumnum. In either case, we should not have code using canonicalizes |
This is an alternative to llvm#128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. For simplicity, the CLC library doesn't provide vector/scalar versions of these builtins. The OpenCL layer wraps those up to the vector/vector versions. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were currently (accidentally, presumably) providing overloads with mixed elment types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.
This is an alternative to llvm#128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. For simplicity, the CLC library doesn't provide vector/scalar versions of these builtins. The OpenCL layer wraps those up to the vector/vector versions. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were currently (accidentally, presumably) providing overloads with mixed elment types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.
This is an alternative to llvm#128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. For simplicity, the CLC library doesn't provide vector/scalar versions of these builtins. The OpenCL layer wraps those up to the vector/vector versions. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were currently (accidentally, presumably) providing overloads with mixed elment types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.
This is an alternative to llvm#128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. For simplicity, the CLC library doesn't provide vector/scalar versions of these builtins. The OpenCL layer wraps those up to the vector/vector versions. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were currently (accidentally, presumably) providing overloads with mixed elment types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.
This is an alternative to llvm#128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. For simplicity, the CLC library doesn't provide vector/scalar versions of these builtins. The OpenCL layer wraps those up to the vector/vector versions. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were currently (accidentally, presumably) providing overloads with mixed elment types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.
The CLC fmin/fmax builtins use clang's _builtin_elementwise(min|max) which helps us generate llvm.(min|max)num intrinsics directly. These intrinsics select the non-NAN input over the NAN input, which adheres to the OpenCL specification. Note that the OpenCL specification doesn't require support for sNAN, so returning qNAN over sNAN is acceptable. Note also that the intrinsics don't differentiate between -0.0 and +0.0; this does not appear to be required - going by the OpenCL CTS, at least.
These intrinsics maintain the vector types, as opposed to scalarizing, which was previously happening. This commit therefore helps to optimize codegen for those targets.