
[libclc] Optimize generic CLC fmin/fmax #128506

Open: wants to merge 1 commit into main


frasercrmck
Contributor

frasercrmck commented Feb 24, 2025

The CLC fmin/fmax builtins use clang's __builtin_elementwise_(min|max), which helps us generate llvm.(min|max)num intrinsics directly. These intrinsics select the non-NAN input over the NAN input, which adheres to the OpenCL specification. Note that the OpenCL specification doesn't require support for sNAN, so returning qNAN over sNAN is acceptable. Note also that the intrinsics don't differentiate between -0.0 and +0.0; this does not appear to be required, going by the OpenCL CTS at least.

These intrinsics preserve the vector types rather than being scalarized, as was previously happening. This commit should therefore improve codegen for targets that can make use of the vector forms.

frasercrmck added the libclc label Feb 24, 2025
frasercrmck requested a review from arsenm February 24, 2025 12:44
@frasercrmck
Contributor Author

Note @arsenm: I didn't touch amdgcn's or r600's fmin/fmax, as I wasn't sure whether any of that could be updated or at least unified. Would you be able to help there?

I note that the comments around the use of canonicalize mention sNAN, which isn't required by the spec.

@arsenm
Contributor

arsenm commented Feb 24, 2025

These should use the regular builtin fmin / fmax.

> I note that the comments around the use of canonicalize mention sNAN, which isn't required by the spec.

The spec is quite badly written on what's expected of snans here, and the conformance test doesn't test what is written in the spec (hoping to fix that here).

@frasercrmck
Contributor Author

> These should use the regular builtin fmin / fmax.

Do you mean the AMD implementations, or the CLC ones too? Note there's no vector support for __builtin_fmin, which is why I chose __builtin_elementwise_min. They appear to generate the same code, so maybe I'm misunderstanding the difference between the two builtins.

> I note that the comments around the use of canonicalize mention sNAN, which isn't required by the spec.

> The spec is quite badly written on what's expected of snans here, and the conformance test doesn't test what is written in the spec (hoping to fix that here).

Thanks for the link. I was going by 7.2, but now I see there's also a footnote.

@frasercrmck
Contributor Author

I don't suppose the recent clarifications to llvm.minnum and llvm.maxnum change anything here?

@frasercrmck
Contributor Author

ping, thanks

frasercrmck force-pushed the libclc-clc-fmin-fmax branch from 43d4d7d to 572780f April 1, 2025 11:12
frasercrmck added a commit to frasercrmck/llvm-project that referenced this pull request Apr 3, 2025
This is an alternative to llvm#128506 which doesn't attempt to change the
codegen for fmin and fmax on their way to the CLC library.

The amdgcn and r600 custom definitions of fmin/fmax are now converted to
custom definitions of __clc_fmin and __clc_fmax.

The only codegen change is that non-standard vector/scalar overloads of
fmin/fmax have been removed. We were previously (presumably
accidentally) providing overloads with mixed element types such as
fmin(double2, float), fmax(half4, double), etc. The only vector/scalar
overloads in the OpenCL spec are those with scalars of the same element
type as the vector in the first argument.
frasercrmck added a commit that referenced this pull request Apr 29, 2025
This is an alternative to #128506 which doesn't attempt to change the
codegen for fmin and fmax on their way to the CLC library.

The amdgcn and r600 custom definitions of fmin/fmax are now converted to
custom definitions of __clc_fmin and __clc_fmax.

For simplicity, the CLC library doesn't provide vector/scalar versions
of these builtins. The OpenCL layer wraps those up to the vector/vector
versions.

The only codegen change is that non-standard vector/scalar overloads of
fmin/fmax have been removed. We were previously (presumably
accidentally) providing overloads with mixed element types such as
fmin(double2, float), fmax(half4, double), etc. The only vector/scalar
overloads in the OpenCL spec are those with scalars of the same element
type as the vector in the first argument.
gizmondo pushed a commit to gizmondo/llvm-project that referenced this pull request Apr 29, 2025
The CLC fmin/fmax builtins now use clang's
__builtin_elementwise_(min|max) which helps us generate
llvm.(min|max)num intrinsics directly. These intrinsics select the
non-NAN input over the NAN input, which adheres to the OpenCL
specification. Note that the OpenCL specification doesn't require
support for sNAN, so returning qNAN over sNAN is acceptable. Note also
that the intrinsics don't differentiate between -0.0 and +0.0; this does
not appear to be required - going by the OpenCL CTS, at least.

These intrinsics preserve the vector types rather than being scalarized,
as was previously happening. This commit should therefore improve
codegen for targets that can make use of the vector forms.
frasercrmck force-pushed the libclc-clc-fmin-fmax branch from 572780f to 5c367b8 April 29, 2025 10:18
frasercrmck changed the title from "[libclc] Move fmin/fmax to the CLC library" to "[libclc] Optimize generic CLC fmin/fmax" Apr 29, 2025
@arsenm
Contributor

arsenm commented Apr 29, 2025

> I don't suppose the recent clarifications to llvm.minnum and llvm.maxnum change anything here?

It depends on whether the conformance test is fixed to match the fuzzy language of the spec or not. If the decision is that fmin/fmax should match the IEEE behavior, the implementation maps directly to llvm.minnum/llvm.maxnum. If the decision is that the conformance test continues doing what it has been doing, it should map directly to llvm.minimumnum/llvm.maximumnum. In either case, we should not have code using canonicalizes.

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
Labels: libclc (libclc OpenCL library)