[SYCL] Move Bfloat16 math functions out of experimental #11506

JackAKirk · 2023-10-11T12:39:44Z

Bfloat16 math functions are generally supported via software implementation as described in the bfloat16 extension document.
These APIs are straightforward and should be future proof, as such it makes sense to mark them experimental ASAP.
This PR moves bfloat16 math functions back into the bfloat16 extension, and resultantly out of experimental namespace.
This makes it more straightforward to allow dpct to access bfloat16 optimizations available on intel and Nvidia hardware: oneapi-src/SYCLomatic#1341

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

gmlueck · 2023-10-13T13:20:21Z

sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc

+`isnan`, `ceil`, `floor`, `cos`, `sin`, `exp`, `exp2`, `exp10`, `log`, `log2`,
+`log10`, `rint`, `sqrt`, `rsqrt` and `trunc` SYCL floating point math functions.
+These functions can be used as element wise operations on matrices, supplementing
+the `bfloat16` support in the sycl_ext_oneapi_matrix extension.


This last sentence seems confusing because this extension does not add support to these math functions for joint_matrix types.

Sure, I can remove that sentence.

gmlueck · 2023-10-13T13:27:05Z

sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc

@@ -83,9 +83,21 @@ tolerate lower precision. Some implementations may still perform operations
 on this type using 32-bit math. For example, they may convert the `bfloat16`
 value to `float`, and then perform the operation on the 32-bit `float`.

+This extension also adds `bfloat16` support to the `fma`, `fmin`, `fmax`, `fabs`,
+`isnan`, `ceil`, `floor`, `cos`, `sin`, `exp`, `exp2`, `exp10`, `log`, `log2`,
+`log10`, `rint`, `sqrt`, `rsqrt` and `trunc` SYCL floating point math functions.


I think we do not need to list each math function here. Instead, let's say something like:

This extension also adds bfloat16 support to several of the SYCL math functions, which are listed below in the section "Math Functions".

Duplicating the names here raises the possibility that the list may get out of date in the future.

Sure sounds good.

gmlueck · 2023-10-13T13:36:21Z

sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc

-The bfloat16 type is supported on all devices. DPC++ currently supports this
-type natively on Intel Xe HP GPUs and Nvidia GPUs with
+The bfloat16 type and math functions are supported on all devices. DPC++
+currently supports this type natively on Intel Xe HP GPUs and Nvidia GPUs with
 Compute Capability >= SM80. On other devices, and in host code, it is emulated
 in software.



Yikes! How is it that this extension does not have a feature-test macro? There should be a section "Feature test macro" as defined in the template.

Does the implementation define a feature-test macro, and we just forgot to document it?

I'm not sure, but I can add one.

gmlueck · 2023-10-13T14:26:59Z

sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc

+
+template <size_t N>
+sycl::marray<bool, N> isnan(sycl::marray<bfloat16, N> x);
+} // namespace sycl::ext::oneapi


This synopsis does not match the form we use for the existing isnan overloads. That existing form looks like:

template<typename NonScalar> /*return-type*/ isnan(NonScalar x)

With the following specification:

Constraints: Available only if all of the following conditions are met:

NonScalar is marray, vec, or the __swizzled_vec__ type; and

The element type is float, double, or half.

Returns: If NonScalar is marray, returns true for each element of x only if x[i] has a NaN value. If NonScalar is vec or the __swizzled_vec__ type, returns -1 for each element of x if x[i] has a NaN value and
returns 0 otherwise.

I'd like this extension to use the same form, essentially broadening the constraint to allow NonScalar to be marray with element type bfloat16.

I also think we should extend the existing functions in the sycl namespace, rather than adding new functions in the sycl::ext::oneapi namespace. This is still consistent with the extension naming guidelines because the bfloat16 type is defined in the extension namespace. Thus, there is no danger of conflict with another extension and users still know they are using an extension when calling these functions with bfloat16.

A complicating factor is that DPC++ is in a state of transition from the old form of these math function to the new SYCL 2020 conformant form. Currently, DPC++ only defines the functions in the conformant form if the user opts-in by predefining a macro. I think the most reasonable path for the bfloat16 versions of these functions is to follow suit. By default, they will have the old non-conformant form, and they will only have the conformant form when the user pre-defines the macro. @steffenlarsen might have some input here, though.

I'm not sure how to handle this in the spec. One option is to add a general disclaimer NOTE saying that DPC++ currently implements the math functions as defined in this extension spec when the application predefines the macro. When the macro is not defined, the math functions accept bfloat16 parameters in a manner that is consistent with the standard types. We can then show an example of a typical function to illustrate the difference.

One other note of interest is that the published SYCL 2020 spec (revision 7) does not document the new form of these functions. The WG changed the specification after revision 7 was published. I plan to publish revision 8 within the next week, and that will show the new forms.

I also have a question about vec support. The math functions currently support both marray and vec whenever a non-scalar is supported. This extension currently only supports bfloat16 with marray. Why are we excluding support for vec? I wonder if it would actually be easier to implement if we supported both marray and vec since that's how the other types are handled.

This comment applies, obviously, to all the other functions below too.

All makes sense to me. For the vec question. I don't think it was excluded. I didn't add it initially when creating the first iteration of the bfloat16 math function extension because we did not have a reason to add vec when we had marray support, at least in the cuda backend; but I imagined that at some point other backends would likely motivate the addition of vec bfloat16 math functions.
Even if they are only added in order to have math functions for scalar, vec, and marray types that are consistent with the rest of the types, I see that this could make sense.
I can make such a change to the spec, and add a generic implementation for bfloat16 math functions for vec type in this PR?
Or perhaps I could delay this till later if you think that is better?
I suppose features can still be added to supported extensions so long as they are new additions rather than changes to existing functions/types etc?

After reflection, I think it would make sense to wait until the revision 8 is published, and then use the new math function definition patterns for the bfloat16 math functions too.

Since bfloat16 math functions are new, would it be better to just support them using the new math function definition patterns (forms)? As far as I understand it the move to the new form doesn't actually change how a user would define the function in practice, just that the new forms use a mixture of overloading and templating instead of the previous situation that just used templates?

If the above is the case it would seem overly complex to introduce multiple macros for bfloat16 math functions, especially if they form part of the extension for the bfloat16 class.

After reflection, I think it would make sense to wait until the revision 8 is published, and then use the new math function definition patterns for the bfloat16 math functions too.

I expect revision 8 to be published soon. Yes, I think it would be good for this extension specification to use the same pattern and specification style that revision 8 has.

Since bfloat16 math functions are new, would it be better to just support them using the new math function definition patterns (forms)? As far as I understand it the move to the new form doesn't actually change how a user would define the function in practice, just that the new forms use a mixture of overloading and templating instead of the previous situation that just used templates?

Most application code won't be impacted by the move from old to new form, but there are some edge cases that will be visible to applications. I think I may have been premature about the macro. Our plan is to have a macro in the headers, which selects between the old and new forms of the builtin functions, but I think it is not implemented yet.

I agree that in theory the support for bfloat16 could be added with only the new form (not selected by the macro) because we have no legacy code to support with the old form. I was just thinking that it might be hard to implement this way. I assume the current implementation of the math functions has a set of constraints for the NonScalar template parameter. Therefore, I was thinking that you would simply broaden these constraints to allow NonScalar to be marray<bfloat16>. Once we add the macro, wouldn't this mean that the bfloat16 support was also affected by the macro?

I assume the current implementation of the math functions has a set of constraints for the NonScalar template parameter.
Therefore, I was thinking that you would simply broaden these constraints to allow NonScalar to be marray<bfloat16>. > Once we add the macro, wouldn't this mean that the bfloat16 support was also affected by the macro?

At the moment the bfloat16 math functions support marray<bfloat16, N> where {N} means any positive value of size_t type.
So as I understand it this wouldn't affect bfloat16 support.

However as you pointed out before there is no sycl::vec support for bfloat16. I can only guess at whether Intel etc will want to introduce this or not. In relation to this question:

I note that SYCLomatic is currently converting Nvidia's natively supported vec2 bfloat16 type to a sycl::marray of size 2: https://github.com/oneapi-src/SYCLomatic/pull/800/files#diff-201c4634f29044b5cca5bcd4a23eddf27ea55f77447c9ecf30e4c38f166a0567R422

I know that at least intel cpus support bfloat16, but it isn't clear to me whether they would prefer sycl::vec types for bfloat16?

Although the vec question would as far as I can tell potentially just be an addition that I suppose could be added later; it is one of a few questions that seem worthwhile trying to address sooner rather than later.
Another issue that is not clear to me is where the bfloat16 functions in the intel::math namespace fit in relation to this bfloat16 extension. To be more concrete here are a few questions I have:

Will conversion functions that are translated from corresponding sycl bfloat16 library functions in syclomatic ([SYCLomatic] Support migration of 67 bf16 precision conversion APIs. oneapi-src/SYCLomatic#1351) become part of the bfloat16 extension? Note that currently there are no implementations for these functions in the cuda backend.

Will e.g. bfloat16 arithmetic functions that not match sycl::math functions, but are presumably important for the target application for which bfloat16 was created (deep learning) be added? I note that there are intel::math functions for half matching some of these cuda functions 1:1 ( such as __hfma_relu, see https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/intel/math/imf_half_trivial.hpp). There are also syclomatic mappings for these functions: oneapi-src/SYCLomatic@6b8429a.

Is the long term plan that bfloat16 type and associated functions be a core part of sycl, or available as a header only library?

Although the existing bfloat16 math functions that are detailed in this PR appear to be uncontroversial, maybe it makes sense to get clearer answers to the above before moving any bfloat16 math functions out of experimental.

I can only answer this question:

Is the long term plan that bfloat16 type and associated functions be a core part of sycl, or available as a header only library?

Yes, I think we should consider eventually adding bfloat16 as a core part of SYCL. We don't want it to be a header-only library because it needs some underlying h/w support for efficiency.

We need someone to be the "owner" of the bfloat16 feature, who can decide what functions make sense to add to this extension.

KseniyaTikhomirova

LGTM from RT source code side. Will approve after Greg's comments fix.

Removed overly verbose list of math functions supported in intro. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

github-actions · 2024-10-08T01:59:55Z

This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

JackAKirk added 3 commits October 10, 2023 09:33

Moved bfloat16 math back into main bfloat16 extension doc.

9fc449a

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Move bfloat16_math impl from experimental namespace.

db328d3

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

fix format.

304f54e

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk temporarily deployed to WindowsCILock October 11, 2023 13:56 — with GitHub Actions Inactive

JackAKirk temporarily deployed to WindowsCILock October 11, 2023 14:22 — with GitHub Actions Inactive

JackAKirk closed this Oct 12, 2023

JackAKirk reopened this Oct 12, 2023

JackAKirk temporarily deployed to WindowsCILock October 12, 2023 11:33 — with GitHub Actions Inactive

JackAKirk temporarily deployed to WindowsCILock October 12, 2023 12:00 — with GitHub Actions Inactive

JackAKirk marked this pull request as ready for review October 12, 2023 19:56

JackAKirk requested review from a team as code owners October 12, 2023 19:56

JackAKirk requested a review from KseniyaTikhomirova October 12, 2023 19:56

JackAKirk mentioned this pull request Oct 13, 2023

Use sycl::bfloat16 class and functions instead of float casts. oneapi-src/SYCLomatic#1341

Open

JackAKirk requested a review from gmlueck October 13, 2023 13:58

gmlueck reviewed Oct 13, 2023

View reviewed changes

KseniyaTikhomirova reviewed Oct 16, 2023

View reviewed changes

Added feature test macro.

0aa4eac

Removed overly verbose list of math functions supported in intro. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk marked this pull request as draft April 10, 2024 18:18

github-actions bot added the Stale label Oct 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYCL] Move Bfloat16 math functions out of experimental #11506

[SYCL] Move Bfloat16 math functions out of experimental #11506

JackAKirk commented Oct 11, 2023

gmlueck Oct 13, 2023

JackAKirk Oct 13, 2023

gmlueck Oct 13, 2023

JackAKirk Oct 13, 2023

gmlueck Oct 13, 2023

JackAKirk Oct 13, 2023

gmlueck Oct 13, 2023

JackAKirk Oct 13, 2023 •

edited

Loading

JackAKirk Oct 18, 2023 •

edited

Loading

gmlueck Oct 18, 2023

JackAKirk Oct 19, 2023

gmlueck Oct 20, 2023

KseniyaTikhomirova left a comment

github-actions bot commented Oct 8, 2024

[SYCL] Move Bfloat16 math functions out of experimental #11506

Are you sure you want to change the base?

[SYCL] Move Bfloat16 math functions out of experimental #11506

Conversation

JackAKirk commented Oct 11, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JackAKirk Oct 13, 2023 • edited Loading

Choose a reason for hiding this comment

JackAKirk Oct 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KseniyaTikhomirova left a comment

Choose a reason for hiding this comment

github-actions bot commented Oct 8, 2024

JackAKirk Oct 13, 2023 •

edited

Loading

JackAKirk Oct 18, 2023 •

edited

Loading