[SYCL] Move bfloat support from experimental to supported. #6524

Merged — 99 commits, Nov 28, 2022

Changes from 8 commits

Commits
6014cef
[SYCL] Move bfloat support from experimental to supported.
rdeodhar Aug 3, 2022
bdd88e5
Corrections to tests.
rdeodhar Aug 3, 2022
73ed541
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Aug 24, 2022
0fe1884
Moved another file out of experimental space.
rdeodhar Aug 24, 2022
feb9d5f
Responses to review comments.
rdeodhar Aug 25, 2022
129f53f
Removed unneeded sycl::half conversion and updated doc.
rdeodhar Aug 26, 2022
2115f09
Added conversion from sycl::half to bfloat16.
rdeodhar Aug 29, 2022
3c2eb80
Cleanup of documentation.
rdeodhar Aug 31, 2022
74aa175
Hooked up bfloat16 aspect within OpenCL plugin.
rdeodhar Sep 2, 2022
bd05711
Support for bfloat16 aspect, and native or fallback support.
rdeodhar Sep 8, 2022
f8e894c
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 8, 2022
2ad68f6
Formatting changes.
rdeodhar Sep 8, 2022
4b78c03
Formatting changes.
rdeodhar Sep 8, 2022
0fce16d
Update to documentation.
rdeodhar Sep 8, 2022
4bcb383
Deprecate bfloat16 aspect.
rdeodhar Sep 8, 2022
35308f8
Fixes for ESIMD.
rdeodhar Sep 9, 2022
fa045e2
Reinstated to_float and from_float, used by NVidia, updated doc.
rdeodhar Sep 9, 2022
3322d6a
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 12, 2022
b12fd94
Update to doc.
rdeodhar Sep 12, 2022
87b0f09
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 14, 2022
f217eb4
Corrections to headers.
rdeodhar Sep 14, 2022
a908b11
Formatting change.
rdeodhar Sep 14, 2022
aab4c78
bfloat16 class supports all sm_xx devices.
Sep 15, 2022
a2568ba
Merge pull request #1 from JackAKirk/bfloat16-cuda-allarch
rdeodhar Sep 15, 2022
4d7a22b
Changes to keep bfloat math functions experimental for now.
rdeodhar Sep 16, 2022
38e5ad4
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 16, 2022
b9accad
Cleanup of bfloat16_math extension.
rdeodhar Sep 16, 2022
ca7880a
Document updates and minor changes.
rdeodhar Sep 19, 2022
dc3b2b5
Fixes for long lines in doc, a different way to check for NaN.
rdeodhar Sep 19, 2022
c955d36
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 20, 2022
1aa6ad3
Broke long lines into multiple lines.
rdeodhar Sep 20, 2022
ff04ce1
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 21, 2022
802f502
Changed library order on Windows.
rdeodhar Sep 21, 2022
8d7f46a
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 22, 2022
190f2a3
Fix for AOT compilation and correction to new headers.
rdeodhar Sep 22, 2022
84c50f3
Noted AOT limitation in doc.
rdeodhar Sep 23, 2022
df058ba
Adjustment for AOT compilation.
rdeodhar Sep 24, 2022
fed4d1d
Fixes for AOT builds.
rdeodhar Sep 26, 2022
28259d0
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 26, 2022
c11115b
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 26, 2022
6b05a2a
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 27, 2022
a82d73a
Fixes for AOT multiple devices.
rdeodhar Sep 27, 2022
3fc8885
Updated documentation.
rdeodhar Sep 27, 2022
1ec6838
Added back missing Status section in documentation.
rdeodhar Sep 27, 2022
105094b
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 27, 2022
432e775
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Sep 29, 2022
c135643
Added tests, corrected aspect check.
rdeodhar Oct 1, 2022
4eca414
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 1, 2022
8876ac8
Added missing newlines.
rdeodhar Oct 3, 2022
f0f2727
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 3, 2022
17673bf
Corrections to tests and macros, added host code emulation.
rdeodhar Oct 4, 2022
1094b8c
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 4, 2022
8d40228
Small corrections.
rdeodhar Oct 4, 2022
c5a85cf
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 4, 2022
cf8f6e0
Fixes for AOT.
rdeodhar Oct 4, 2022
5e50646
Formatting change.
rdeodhar Oct 4, 2022
45d3e70
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 5, 2022
a7be718
Renamed the bfloat aspects.
rdeodhar Oct 5, 2022
cac1c18
Fixes for generic JIT compilation.
rdeodhar Oct 6, 2022
208c09a
Changes for AOT sycl-targets switch.
rdeodhar Oct 6, 2022
46f406d
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 6, 2022
6830857
Corrected aspects queries.
rdeodhar Oct 6, 2022
46e5278
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 6, 2022
10fc9a3
Change in the way fallback/native libs are selected.
rdeodhar Oct 8, 2022
6195545
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 8, 2022
437e34a
Changed type of string.
rdeodhar Oct 10, 2022
09dc4c5
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 12, 2022
386353e
Replaced bfloat16 aspect with bfloat16_math_functions aspect.
rdeodhar Oct 12, 2022
0f93586
Improved devices check in clang driver.
rdeodhar Oct 13, 2022
48f3cac
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 13, 2022
d33cb10
Enhanced test for improved bfloat16 target detection.
rdeodhar Oct 13, 2022
28992c2
Updated bfloat16 driver test for windows.
rdeodhar Oct 13, 2022
ec28c8b
Use STL for parsing devices.
rdeodhar Oct 13, 2022
b958fc7
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 24, 2022
ec70b20
Allow spir64 target to be JIT even when combined with AOT targets.
rdeodhar Oct 24, 2022
1b86012
Updated documentation.
rdeodhar Oct 24, 2022
3e1e681
Modifications for mixed JIT and AOT compilations, added tests.
rdeodhar Oct 25, 2022
8c633d3
Corrections to comments.
rdeodhar Oct 25, 2022
1a59e03
Update to documentation.
rdeodhar Oct 25, 2022
b2fd6cc
Updated doc.
rdeodhar Oct 25, 2022
fab2e54
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 26, 2022
35b8910
Adjustments to tests.
rdeodhar Oct 27, 2022
a05c872
Test cleanup.
rdeodhar Oct 27, 2022
ac5f603
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Oct 27, 2022
6d45ed1
Adjustments to more tests.
rdeodhar Oct 27, 2022
077d0fe
Change to tests to ensure AOT components are available.
rdeodhar Oct 28, 2022
2ff6a9d
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Nov 7, 2022
d7c80ee
Adjustment to test for new bfloat16 header.
rdeodhar Nov 7, 2022
20d13df
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Nov 8, 2022
cd1d0a2
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Nov 15, 2022
4bf60b9
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Nov 18, 2022
45c32f7
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Nov 21, 2022
5de1bf7
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Nov 22, 2022
6ec2bb9
Changes for indirect accesses.
rdeodhar Nov 22, 2022
49e9cd1
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Nov 22, 2022
2065060
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Nov 23, 2022
e24e57b
Fixed conflicts.
rdeodhar Nov 23, 2022
41098ab
Merge branch 'sycl' of https://github.com/intel/llvm into bfloat16
rdeodhar Nov 25, 2022
37b05f0
Correction to library list.
rdeodhar Nov 25, 2022
@@ -22,53 +22,43 @@

== Notice

IMPORTANT: This specification is a draft.
[%hardbreaks]
Copyright (C) 2022-2022 Intel Corporation. All rights reserved.

Copyright (c) 2021-2022 Intel Corporation. All rights reserved.
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.

== Dependencies
== Contact

This extension is written against the SYCL 2020 specification, Revision 4.
To report problems with this extension, please open a new issue at:

== Status
https://github.com/intel/llvm/issues

Draft

This is a preview extension specification, intended to provide early access to
a feature for review and community feedback. When the feature matures, this
specification may be released as a formal extension.
== Dependencies

Because the interfaces defined by this specification are not final and are
subject to change they are not intended to be used by shipping software
products.
This extension is written against the SYCL 2020 specification, Revision 5.

== Version
== Status

Revision: 5
This extension is implemented and fully supported by DPC++.
[NOTE]
====
This extension is currently implemented in `dpcpp` only for GPU devices that support `bfloat16` natively. Attempting to use this extension in
kernels that run on other devices may result in undefined behavior.
Be aware that the compiler is not able to issue a diagnostic to warn you if this happens.
====

== Introduction
== Overview

This extension adds functionality to convert value of single-precision
floating-point type(`float`) to `bfloat16` type and vice versa. The extension
doesn't add support for `bfloat16` type as such, instead it uses 16-bit integer
type(`uint16_t`) as a storage for `bfloat16` values.
This extension adds support for a 16-bit floating point type `bfloat16`. This type occupies 16 bits of storage space as does the `sycl::half` type. However, `bfloat16` allots 8 bits to the exponent instead of the 5 bits used by `sycl::half`, and 7 bits to the significand versus 10 bits used by `sycl::half`. Thus, `bfloat16` has the same dynamic range as a 32-bit `float` but with reduced precision. This type is useful when the memory required to store the values must be reduced, and when the calculations require high dynamic range but can tolerate lower precision. Some implementations may still perform operations on this type using 32-bit math. For example, they may convert the `bfloat16` value to `float`, and then perform the operation on the 32-bit `float`.
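For illustration only — this is not necessarily how any implementation performs
the conversion — the relationship between the two formats can be sketched in
plain C++, since `bfloat16` corresponds to the upper 16 bits of an IEEE 754
`binary32` value:

[source,c++]
----
#include <cstdint>
#include <cstring>

// Hypothetical helpers, not part of the extension: bfloat16 keeps the sign
// bit, the full 8-bit float exponent, and the top 7 mantissa bits of a
// binary32 value. NaN special-casing is omitted for brevity.
inline std::uint16_t float_to_bf16_rte(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  // Round to nearest even: add 0x7FFF plus the lowest kept mantissa bit.
  bits += 0x7FFF + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>(bits >> 16);
}

inline float bf16_to_float(std::uint16_t b) {
  std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f; // widening is exact: the dropped mantissa bits are zero
}
----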
Contributor: Some of the new text you added has very long lines. Please
respect the 80-column limit as documented in the template.

Contributor Author: OK, done.


The purpose of conversion from float to bfloat16 is to reduce the amount of memory
required to store floating-point numbers. Computations are expected to be done with
32-bit floating-point values.

This extension is an optional kernel feature as described in
https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:optional-kernel-features[section 5.7]
of the SYCL 2020 spec. Therefore, attempting to submit a kernel using this
feature to a device that does not support it should cause a synchronous
`errc::kernel_not_supported` exception to be thrown from the kernel invocation
command (e.g. from `parallel_for`).
== Specification

== Feature test macro
=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification section 6.3.3 "Feature test macros". Therefore, an implementation
@@ -84,7 +74,7 @@ the implementation supports this feature, or applications can test the macro’s
|1 |Initial extension version. Base features are supported.
|===
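As a quick sketch of the intended usage — assuming the macro is named
`SYCL_EXT_ONEAPI_BFLOAT16` following the extension name (verify the exact
spelling against the implementation's headers):

[source,c++]
----
// Guard bfloat16-specific code on the feature-test macro. The macro name
// here is assumed from the extension name, not confirmed by this document.
#ifdef SYCL_EXT_ONEAPI_BFLOAT16
using elem_t = sycl::ext::oneapi::bfloat16;
#else
using elem_t = float; // portable fallback type
#endif
----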

== Extension to `enum class aspect`
=== Extension to `enum class aspect`

[source]
----
@@ -99,49 +89,47 @@ enum class aspect {
If a SYCL device has the `ext_oneapi_bfloat16` aspect, then it natively
supports conversion of values of `float` type to `bfloat16` and back.

If the device doesn't have the aspect, objects of `bfloat16` class must not be
used in the device code.
This extension is an optional kernel feature as described in section 5.7 of the SYCL 2020 spec, with the associated aspect `ext_oneapi_bfloat16`. Applications can query whether the device has this aspect to determine if it supports kernels that use `bfloat16`. Attempting to submit a kernel using `bfloat16` to a device that does not support it causes a synchronous `errc::kernel_not_supported` exception to be thrown from the kernel invocation command (e.g. from `parallel_for`).

**NOTE**: The `ext_oneapi_bfloat16` aspect is not yet supported. The
`bfloat16` class is currently supported only on Xe HP GPU and Nvidia GPUs with Compute Capability >= SM80.
[NOTE]
====
. DPC++ does not currently implement the `errc::kernel_not_supported` exception in this case. Attempting to submit a kernel using `bfloat16` to a device that does not have the `ext_oneapi_bfloat16` aspect results in undefined behavior.
. The `bfloat16` class is currently supported only on Xe HP GPUs and Nvidia GPUs with Compute Capability >= SM80.
====
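A minimal sketch of the guarded-submission pattern implied above (keeping in
mind the note that DPC++ does not yet throw `errc::kernel_not_supported` in
this situation, so the aspect query is the reliable guard):

[source,c++]
----
// Sketch: only submit a bfloat16 kernel when the device reports the aspect.
sycl::queue q;
if (q.get_device().has(sycl::aspect::ext_oneapi_bfloat16)) {
  q.single_task([=] {
    // ... kernel code that uses bfloat16 ...
  });
} else {
  // Fall back to a float-based kernel on devices without the aspect.
}
----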

== New `bfloat16` class

The `bfloat16` class below provides the conversion functionality. Conversion
from `float` to `bfloat16` is done with round to nearest even(RTE) rounding
mode.
=== New `bfloat16` class

The `bfloat16` type represents a 16-bit floating point value. Conversions from `float` to `bfloat16` are done with round to nearest even (RTE) rounding mode.

[source]
----
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {

class bfloat16 {
using storage_t = uint16_t;
storage_t value;

public:
bfloat16() = default;
bfloat16(const bfloat16 &) = default;
~bfloat16() = default;

// Explicit conversion functions
static storage_t from_float(const float &a);
static float to_float(const storage_t &a);

// Convert from float to bfloat16
bfloat16(const float &a);
bfloat16 &operator=(const float &a);

// Convert from bfloat16 to float
// Convert bfloat16 to float
operator float() const;

// Convert from sycl::half to bfloat16
bfloat16(const sycl::half &a);
bfloat16 &operator=(const sycl::half &a);

// Get bfloat16 as uint16.
operator storage_t() const;
// Convert bfloat16 to sycl::half
operator sycl::half() const;
Contributor: I like this conversion to sycl::half. However, we should also add the opposite conversion from sycl::half to bfloat16:

bfloat16(const sycl::half &a);
bfloat16 &operator=(const sycl::half &a);

Do we also need conversion to / from double?

Contributor Author: This PR is intended to move the current bfloat16 support out of experimental space. Any changes to the level of bfloat16 support can be done in future PRs.

Contributor Author: On Intel platforms the bfloat16 to/from float conversion is done using the __spirv_ConvertBF16ToFINTEL operator. I suspect a double version of that does not exist. Float to double conversion can be made in the usual C++ way more efficiently in hardware. A direct version of bfloat16 to double conversion in software would involve more bit twiddling than the float conversion, where only trailing 0 bits of the fraction need to be inserted.

Contributor Author: The sycl::half class includes conversions to/from float. Those kick in when bfloat16 is used with sycl::half, so conversions between bfloat16 and sycl::half are not needed.

Contributor: Are you saying that we should remove this conversion from bfloat16 to sycl::half?

Contributor Author: Yes, it's not needed.

Contributor Author: This item was revisited and it turns out that sycl::half <-> bfloat16 conversions are needed. They have been added.

@MrSidims (Contributor, Sep 20, 2022): Sorry for joining the discussion late. Maybe it's a nitpick, but should we state that the half <-> bfloat16 conversion follows the IEEE 754 float <-> half conversion? In other words, what happens if a bfloat16 value overflows the half range? Also, are the last 3 fraction bits filled stochastically, or are they guaranteed to be zero (or is that an implementation detail)?


// Convert to bool type
// Convert bfloat16 to bool type
explicit operator bool();

friend bfloat16 operator-(bfloat16 &bf) { /* ... */ }
@@ -170,7 +158,6 @@ public:
friend bool operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... */ }
};

} // namespace experimental
} // namespace oneapi
} // namespace ext
} // namespace sycl
@@ -180,12 +167,6 @@ Table 1. Member functions of `bfloat16` class.
|===
| Member Function | Description

| `static storage_t from_float(const float &a);`
| Explicitly convert from `float` to `bfloat16`.

| `static float to_float(const storage_t &a);`
| Interpret `a` as `bfloat16` and explicitly convert it to `float`.

| `bfloat16(const float& a);`
| Construct `bfloat16` from `float`. Converts `float` to `bfloat16`.

@@ -195,11 +176,17 @@ Table 1. Member functions of `bfloat16` class.
| `operator float() const;`
| Return `bfloat16` value converted to `float`.

| `operator storage_t() const;`
| Return `uint16_t` value, whose bits represent `bfloat16` value.
| `bfloat16(const sycl::half& a);`
| Construct `bfloat16` from `sycl::half`. Converts `sycl::half` to `bfloat16`.

| `bfloat16 &operator=(const sycl::half &a);`
| Replace the value with `a` converted to `bfloat16`.

| `operator sycl::half() const;`
| Return `bfloat16` value converted to `sycl::half`.

| `explicit operator bool() { /* ... */ }`
| Convert `bfloat16` to `bool` type. Return `false` if the value equals to
| Convert `bfloat16` to `bool` type. Return `false` if the `value` equals
zero, return `true` otherwise.

| `friend bfloat16 operator-(bfloat16 &bf) { /* ... */ }`
@@ -253,85 +240,87 @@ Table 1. Member functions of `bfloat16` class.
| Perform comparison operation OP between `lhs` `bfloat16` and `rhs` `bfloat16`
values and return the result as a boolean value.

OP is `==, !=, <, >, <=, >=`
OP is `+==, !=, <, >, <=, >=+`

| `template <typename T>
friend bool operatorOP(const bfloat16 &lhs, const T &rhs) { /* ... */ }`
| Perform comparison operation OP between `lhs` `bfloat16` and `rhs` of
template type `T` and return the result as a boolean value. Type `T` must be
convertible to `float`.

OP is `==, !=, <, >, <=, >=`
OP is `+==, !=, <, >, <=, >=+`

| `template <typename T>
friend bool operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... */ }`
| Perform comparison operation OP between `lhs` of template type `T` and `rhs`
`bfloat16` value and return the result as a boolean value. Type `T` must be
convertible to `float`.

OP is `==, !=, <, >, <=, >=`
OP is `+==, !=, <, >, <=, >=+`
|===
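A brief sketch of how the conversion members listed in Table 1 compose — the
`sycl::half` conversions added in this revision interoperate with the existing
`float` conversions:

[source,c++]
----
// Sketch: exercising the Table 1 conversion members.
sycl::half h{0.5f};
sycl::ext::oneapi::bfloat16 b{h};    // construct from sycl::half
b = sycl::half{1.25f};               // assign from sycl::half
sycl::half h2 = b;                   // implicit operator sycl::half()
float f = b;                         // implicit operator float()
bool nonzero = static_cast<bool>(b); // explicit operator bool()
----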

== Example

[source]
----
#include <sycl/sycl.hpp>
#include <sycl/ext/oneapi/experimental/bfloat16.hpp>

using sycl::ext::oneapi::experimental::bfloat16;

bfloat16 operator+(const bfloat16 &lhs, const bfloat16 &rhs) {
return static_cast<float>(lhs) + static_cast<float>(rhs);
}
using namespace sycl;
using sycl::ext::oneapi::bfloat16;

float foo(float a, float b) {
// Convert from float to bfloat16.
bfloat16 A {a};
bfloat16 B {b};
bfloat16 A{a};
bfloat16 B{b};

// Convert A and B from bfloat16 to float, do addition on floating-pointer
// Convert A and B from bfloat16 to float, do addition on floating-point
// numbers, then convert the result to bfloat16 and store it in C.
bfloat16 C = A + B;

// Return the result converted from bfloat16 to float.
return C;
}

int main (int argc, char *argv[]) {
int main(int argc, char *argv[]) {
float data[3] = {7.0, 8.1, 0.0};
sycl::device dev;
sycl::queue deviceQueue{dev};
sycl::buffer<float, 1> buf {data, sycl::range<1> {3}};

if (dev.has(sycl::aspect::ext_oneapi_bfloat16)) {
deviceQueue.submit ([&] (sycl::handler& cgh) {
auto numbers = buf.get_access<sycl::access::mode::read_write> (cgh);
cgh.single_task<class simple_kernel> ([=] () {
numbers[2] = foo(numbers[0], numbers[1]);
});
device dev{gpu_selector()};
queue deviceQueue{dev};
buffer<float, 1> buf{data, 3};

if (dev.has(aspect::ext_oneapi_bfloat16)) {
deviceQueue.submit([&](handler &cgh) {
accessor numbers{buf, cgh, read_write};
cgh.single_task([=]() { numbers[2] = foo(numbers[0], numbers[1]); });
});
} else {
std::cout << "No bfloat16 support\n";
return 1;
}
host_accessor hostOutAcc{buf, read_only};
std::cout << "Result = " << hostOutAcc[2] << std::endl;
return 0;
}
----
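Note that `foo` returns `C` through the implicit `operator float()`, and the
result is read back on the host through `hostOutAcc`. With a DPC++ toolchain
this example would typically be built with something like
`clang++ -fsycl example.cpp`, though the exact driver name and flags vary by
distribution.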

== New bfloat16 math functions

Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, supplementing the `bfloat16` support in the sycl_ext_oneapi_matrix extension.
Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element-wise operations on matrices, supplementing the `bfloat16` support in the `sycl_ext_oneapi_matrix` extension.

The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions.

The following functions are only available when `T` is `bfloat16` or `sycl::marray<bfloat16, {N}>`, where `{N}` means any positive value of `size_t` type.
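Before the individual declarations, a hedged usage sketch — assuming the scalar
and `marray` overloads declared in the sections below compile as shown:

```c++
// Sketch: element-wise bfloat16 math, using the overloads declared below.
// The enclosing kernel/queue setup is omitted; N == 3 is arbitrary.
using sycl::ext::oneapi::bfloat16;
namespace eo = sycl::ext::oneapi;

bfloat16 a{1.0f}, b{2.0f}, c{-3.0f};
bfloat16 r = eo::fma(a, b, c);         // 1*2 + (-3) == 0
bfloat16 m = eo::fmax(eo::fabs(c), b); // max(|-3|, 2) == 3

sycl::marray<bfloat16, 3> va{a, a, a}, vb{b, b, b}, vc{c, c, c};
sycl::marray<bfloat16, 3> vr = eo::fma(va, vb, vc); // element-wise fma
```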


=== fma

```c++
namespace sycl::ext::oneapi::experimental {
namespace sycl::ext::oneapi {

template <typename T>
T fma(T a, T b, T c);
} // namespace sycl::ext::oneapi::experimental
bfloat16 fma(bfloat16 a, bfloat16 b, bfloat16 c);

template<size_t N>
marray<bfloat16, N> fma(marray<bfloat16, N> a, marray<bfloat16, N> b, marray<bfloat16, N> c);

} // namespace sycl::ext::oneapi
```

==== Description
@@ -342,10 +331,14 @@ Rounding of intermediate products shall not occur. The mantissa LSB rounds to th
=== fmax

```c++
namespace sycl::ext::oneapi::experimental {
template <typename T>
T fmax(T x, T y);
} // namespace sycl::ext::oneapi::experimental
namespace sycl::ext::oneapi {

bfloat16 fmax(bfloat16 x, bfloat16 y);

template<size_t N>
marray<bfloat16, N> fmax(marray<bfloat16, N> x, marray<bfloat16, N> y);

} // namespace sycl::ext::oneapi
```

==== Description
@@ -360,28 +353,34 @@ NaNs, `fmax()` returns a NaN.
=== fmin

```c++
namespace sycl::ext::oneapi::experimental {
template <typename T>
T fmin(T x, T y);
} // namespace sycl::ext::oneapi::experimental
namespace sycl::ext::oneapi {

bfloat16 fmin(bfloat16 a, bfloat16 b);

template<size_t N>
marray<bfloat16, N> fmin(marray<bfloat16, N> a, marray<bfloat16, N> b);

} // namespace sycl::ext::oneapi
```

==== Description

Returns `y` if
`y < x`, otherwise it
returns `x`. If one argument is a
NaN, `fmax()` returns the other
NaN, `fmin()` returns the other
argument. If both arguments are
NaNs, `fmax()` returns a NaN.
NaNs, `fmin()` returns a NaN.

=== fabs

```c++
namespace sycl::ext::oneapi::experimental {
namespace sycl::ext::oneapi {

template <typename T>
T fabs(T x);
} // namespace sycl::ext::oneapi::experimental

} // namespace sycl::ext::oneapi
```

==== Description
@@ -408,4 +407,5 @@ Compute absolute value of a `bfloat16`.
|3|2021-08-18|Alexey Sotkin |Remove `uint16_t` constructor
|4|2022-03-07|Aidan Belton and Jack Kirk |Switch from Intel vendor specific to oneapi
|5|2022-04-05|Jack Kirk | Added section for bfloat16 math builtins
|6|2022-08-24|Rajiv Deodhar |Move bfloat16 from experimental to supported
|========================================