[pull] main from llvm:main #56

Merged on Dec 10, 2024.
61 commits:
ef4f858
[BasicAA] Add test for incorrect handling of small index sizes (NFC)
nikic Dec 10, 2024
bc0976e
[LAA] Strip non-inbounds offset in getPointerDiff() (NFC) (#118665)
nikic Dec 10, 2024
f408171
[LV][NFC] Add test cases for FindLastIV reduction idiom. (#118519)
Mel-Chen Dec 10, 2024
cb4433b
[libcxx][test] Silence nodiscard warnings for `std::expected` (#119174)
StephanTLavavej Dec 10, 2024
740861d
[clang] Fix a crash issue that caused by handling of fields with init…
yronglin Dec 10, 2024
05b907f
[VectorCombine] foldShuffleOfShuffles - allow fold with only single s…
RKSimon Dec 10, 2024
f6289f1
[LoongArch] Enable `AllNBitUsers` checking for {DIV,MOD}.W{U} with di…
heiher Dec 10, 2024
cc1a2ea
[AArch64] Implement FP8 SVE intrinsics for widening conversions (#118…
momchil-velikov Dec 10, 2024
e6ba345
[X86][AVX10.2] Add comments for the avx10_2copyintrin.h file (#119238)
mikolaj-pirog Dec 10, 2024
20aed3f
[gn] port 2c0b8b10dd1a
nico Dec 10, 2024
f8a1f42
[test][flang][driver] Fix test that assumes libomp default (#119368)
pawosm-arm Dec 10, 2024
28a0ad0
[flang][hlfir] fix issue 118922 (#119219)
jeanPerier Dec 10, 2024
502c08e
[clang][ExprConst] Move vector diagnostics to checkBitCastConstexprEl…
tbaederr Dec 10, 2024
0ee5924
[clang] wasm cpu name is supposed to be lime1, not lime (#119262)
programmerjake Dec 10, 2024
e665e78
[SelectionDAG] Use the nuw flag when expanding loads. (#119288)
sunfishcode Dec 10, 2024
df4c5d5
workflows: Rewrite build-ci-container to work on larger runners (#117…
tstellar Dec 10, 2024
bd231da
[libc][workflow] address permission concern and add more comments (#1…
SchrodingerZhu Dec 10, 2024
8a494dd
Nominating Sven van Haastregt as OpenCL maintainer in Clang (#119383)
AnastasiaStulova Dec 10, 2024
dadd845
Removed Anastasia Stulova from Office Hours Calendar. (#119384)
AnastasiaStulova Dec 10, 2024
c166a9c
[libc++] Add #if 0 block to all the top-level headers (#119234)
philnik777 Dec 10, 2024
ecbf64d
[libc++] Try handling spurious cancellation in the mainline CI restarter
ldionne Dec 10, 2024
9865296
[StructurizeCFG] Use `poison` instead of `undef` as placeholder [NFC]…
pedroclobo Dec 10, 2024
20b071c
[CGData] Change placeholder from `undef` to `poison` when initializin…
pedroclobo Dec 10, 2024
d7c12ea
[LoopRotate] Use `poison` instead of `undef` as placeholder in debug …
pedroclobo Dec 10, 2024
bd8eb78
[libc++] Temporarily disable FreeBSD runners
ldionne Dec 10, 2024
01512d2
[libc++] Document guidelines for symbols baked into the ABI (#118526)
ldionne Dec 10, 2024
e3284d8
[GISel] Use SmallVector::append instead of copying one element at a t…
topperc Dec 10, 2024
eacdbc2
[libc++][test] Fix invalid const conversion in limited_allocator (#11…
winner245 Dec 10, 2024
97ff961
[AArch64] Improve code generation of bool vector reduce operations (#…
Il-Capitano Dec 10, 2024
da421f5
[SLP] NFC. Make InstructionsState more constant. (#118609)
HanKuanChen Dec 10, 2024
7ea1fe7
Revert "[libc++] Try handling spurious cancellation in the mainline C…
ldionne Dec 10, 2024
3654f1b
[LLVM][IR] Add support for vector ConstantInt/FP to ConstandFolding:F…
paulwalker-arm Dec 10, 2024
f28e522
[Clang] Change two placeholders from `undef` to `poison` [NFC] (#119141)
pedroclobo Dec 10, 2024
f31099c
[PowerPC][AIX] Emit PowerPC version for XCOFF (#113214)
amy-kwan Dec 10, 2024
4d06623
recalculate the live interval of the defined register of xvmaddmdp i…
diggerlin Dec 10, 2024
ed91843
[WebAssembly] Handle symbols in `.init_array` sections (#119127)
georgestagg Dec 10, 2024
4f93327
[CostModel][X86] Improve cost estimation of insert_subvector shuffle …
RKSimon Dec 10, 2024
444e53f
[SelectOpt] Fix incorrect IR for SUB when comparison dependent operan…
igogo-x86 Dec 10, 2024
5a0d73b
[compiler-rt][AArch64] NFCI: Simplify __arm_get_current_vg. (#119210)
sdesmalen-arm Dec 10, 2024
708a478
[RISCV] Add stack clash protection (#117612)
rzinsly Dec 10, 2024
74486dc
[Offload] Add CMake cache to be used in AMDGPU bot (#119369)
jplehr Dec 10, 2024
3a573dc
[RISCV][VLOPT] Add support for integer multiply-add instructions (#11…
michaelmaitland Dec 10, 2024
0fb0617
[clang][bytecode] Check vector element types for eligibility (#119385)
tbaederr Dec 10, 2024
431ea2d
[libc] move bcmp, bzero, bcopy, index, rindex, strcasecmp, strncasecm…
nickdesaulniers Dec 10, 2024
1d7d005
[libc] move src/network to src/arpa/inet (#119273)
nickdesaulniers Dec 10, 2024
8a25398
[libc] move pthread macros to dedicated header (#119286)
nickdesaulniers Dec 10, 2024
8ca4aa5
[RISCV][VLOPT] Use vadd as user instruction in vl-opt-instrs test in …
michaelmaitland Dec 10, 2024
9735873
[mlir][mlir-vulkan-runner] Move part of device pass pipeline to mlir-…
andfau-amd Dec 10, 2024
c7634c1
[flang] Disabled hlfir.sum inlining by default. (#119287)
vzakhari Dec 10, 2024
c5ab70c
[WebAssembly] Add `-i128:128` to the `datalayout` string. (#119204)
sunfishcode Dec 10, 2024
df3397b
[ELF] Improve canBeOmittedFromSymbolTable tests
MaskRay Dec 10, 2024
5041d06
[MC] Fix DWARF file table for files with empty DWARF (#119020) (#119229)
noxwell Dec 10, 2024
c5a21c1
[PhaseOrdering][X86] Add test coverage based off #111431
RKSimon Dec 10, 2024
d6590c1
[MLIR] Add allow Insert/extract slice option to pack/unpack op (#117340)
jerryyin Dec 10, 2024
1a650fd
[lldb] Load embedded type summary section (#7859) (#8040)
kastiglione Jan 24, 2024
9a9c1d4
[lldb] Implement a formatter bytecode interpreter in C++
adrian-prantl Oct 29, 2024
e2bb474
[lldb] Add comment
adrian-prantl Dec 10, 2024
15f87bc
[NFC][AMDGPU] Auto generate check lines for `llvm/test/CodeGen/AMDGPU…
shiltian Dec 10, 2024
13539c2
[RISCV][GISEl] Simplify GISelPredicateCode for binop_with_non_imm12. NFC
topperc Dec 10, 2024
a42aa8f
[SLP]Fix adjusting of the mask for the fully matched nodes.
alexey-bataev Dec 10, 2024
0469bb9
[flang][cuda] Fix lowering when step is a variable (#119421)
clementval Dec 10, 2024
[AArch64] Implement FP8 SVE intrinsics for widening conversions (#118123)

This patch adds the following intrinsics:
* 8-bit floating-point convert to half-precision and BFloat16.

  // Variants are also available for: _bf16
  svfloat16_t svcvt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
  svfloat16_t svcvt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);

* 8-bit floating-point convert to half-precision and BFloat16 (top).

  // Variants are also available for: _bf16
  svfloat16_t svcvtlt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
  svfloat16_t svcvtlt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
momchil-velikov authored Dec 10, 2024
commit cc1a2ea61e3f8e790125b10d9ec4e7d179156ddf
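The commit message separates the plain conversions from the "(top)" `lt` forms. As a rough illustration only, the hypothetical C model below (`e5m2_to_f16_bits` and `model_fp8_cvt_e5m2` are illustrative names, not part of the patch) assumes, from our reading of the Arm architecture documentation, that `svcvt1`/`svcvt2` widen the even-numbered (bottom) bytes of `zn` and `svcvtlt1`/`svcvtlt2` widen the odd-numbered (top) bytes, each into the overlapping 16-bit lane. It handles only E5M2 inputs, whose sign/exponent layout and bias (15) match IEEE binary16, so widening is an exact 8-bit left shift. The FPMR-selected source format, scaling, and NaN handling that `fpm` controls are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen one E5M2 byte (1 sign, 5 exponent, 2 mantissa
   bits, bias 15) to IEEE binary16 bits. The E5M2 byte is exactly the top byte
   of the binary16 encoding, so the shift is exact. FPMR scaling is ignored. */
static uint16_t e5m2_to_f16_bits(uint8_t b) {
    return (uint16_t)(b << 8);
}

/* Hypothetical model of one vector:
   top == 0 stands for svcvt{1,2}_f16   (assumed: even/bottom bytes),
   top == 1 stands for svcvtlt{1,2}_f16 (assumed: odd/top bytes).
   Result lane i is produced from source byte 2*i + top. */
static void model_fp8_cvt_e5m2(const uint8_t *zn, size_t n_bytes, int top,
                               uint16_t *out) {
    assert(n_bytes % 2 == 0 && (top == 0 || top == 1));
    for (size_t i = 0; i < n_bytes / 2; ++i)
        out[i] = e5m2_to_f16_bits(zn[2 * i + (size_t)top]);
}
```

For the E5M2 bytes `{0x3C, 0x40, 0xBC, 0x7C}` (1.0, 2.0, -1.0, +inf), the bottom model yields binary16 `{0x3C00, 0xBC00}` and the top model yields `{0x4000, 0x7C00}`.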
20 changes: 16 additions & 4 deletions clang/include/clang/Basic/arm_sve.td
@@ -2430,12 +2430,12 @@ let SVETargetGuard = InvalidMode, SMETargetGuard = "sme2,fp8" in {
def FSCALE_X4 : Inst<"svscale[_{d}_x4]", "444.x", "fhd", MergeNone, "aarch64_sme_fp8_scale_x4", [IsStreaming],[]>;

// Convert from FP8 to half-precision/BFloat16 multi-vector
-def SVF1CVT : Inst<"svcvt1_{d}[_mf8]_x2_fpm", "2~>", "bh", MergeNone, "aarch64_sve_fp8_cvt1_x2", [IsStreaming, SetsFPMR], []>;
-def SVF2CVT : Inst<"svcvt2_{d}[_mf8]_x2_fpm", "2~>", "bh", MergeNone, "aarch64_sve_fp8_cvt2_x2", [IsStreaming, SetsFPMR], []>;
+def SVF1CVT_X2 : Inst<"svcvt1_{d}[_mf8]_x2_fpm", "2~>", "bh", MergeNone, "aarch64_sve_fp8_cvt1_x2", [IsStreaming, SetsFPMR], []>;
+def SVF2CVT_X2 : Inst<"svcvt2_{d}[_mf8]_x2_fpm", "2~>", "bh", MergeNone, "aarch64_sve_fp8_cvt2_x2", [IsStreaming, SetsFPMR], []>;

// Convert from FP8 to deinterleaved half-precision/BFloat16 multi-vector
-def SVF1CVTL : Inst<"svcvtl1_{d}[_mf8]_x2_fpm", "2~>", "bh", MergeNone, "aarch64_sve_fp8_cvtl1_x2", [IsStreaming, SetsFPMR], []>;
-def SVF2CVTL : Inst<"svcvtl2_{d}[_mf8]_x2_fpm", "2~>", "bh", MergeNone, "aarch64_sve_fp8_cvtl2_x2", [IsStreaming, SetsFPMR], []>;
+def SVF1CVTL_X2 : Inst<"svcvtl1_{d}[_mf8]_x2_fpm", "2~>", "bh", MergeNone, "aarch64_sve_fp8_cvtl1_x2", [IsStreaming, SetsFPMR], []>;
+def SVF2CVTL_X2 : Inst<"svcvtl2_{d}[_mf8]_x2_fpm", "2~>", "bh", MergeNone, "aarch64_sve_fp8_cvtl2_x2", [IsStreaming, SetsFPMR], []>;
}

let SVETargetGuard = "sve2p1", SMETargetGuard = "sme2" in {
@@ -2451,3 +2451,15 @@ let SVETargetGuard = "sve2,faminmax", SMETargetGuard = "sme2,faminmax" in {
defm SVAMIN : SInstZPZZ<"svamin", "hfd", "aarch64_sve_famin", "aarch64_sve_famin_u">;
defm SVAMAX : SInstZPZZ<"svamax", "hfd", "aarch64_sve_famax", "aarch64_sve_famax_u">;
}
+
+let SVETargetGuard = "sve2,fp8", SMETargetGuard = "sme2,fp8" in {
+// SVE FP8 widening conversions
+
+// 8-bit floating-point convert to BFloat16/Float16
+def SVF1CVT : SInst<"svcvt1_{d}[_mf8]_fpm", "d~>", "bh", MergeNone, "aarch64_sve_fp8_cvt1", [VerifyRuntimeMode, SetsFPMR]>;
+def SVF2CVT : SInst<"svcvt2_{d}[_mf8]_fpm", "d~>", "bh", MergeNone, "aarch64_sve_fp8_cvt2", [VerifyRuntimeMode, SetsFPMR]>;
+
+// 8-bit floating-point convert to BFloat16/Float16 (top)
+def SVF1CVTLT : SInst<"svcvtlt1_{d}[_mf8]_fpm", "d~>", "bh", MergeNone, "aarch64_sve_fp8_cvtlt1", [VerifyRuntimeMode, SetsFPMR]>;
+def SVF2CVTLT : SInst<"svcvtlt2_{d}[_mf8]_fpm", "d~>", "bh", MergeNone, "aarch64_sve_fp8_cvtlt2", [VerifyRuntimeMode, SetsFPMR]>;
+}
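In the new `SInst` names, `{d}` is the element-type suffix and `[_mf8]` marks a part that the overloaded spelling omits; the test's `SVE_ACLE_FUNC` macro exercises both spellings. A hypothetical C helper (`acle_fp8_cvt_name` is illustrative, not a clang API) that reproduces the two spellings for the type codes `b` (bf16) and `h` (f16) used in these definitions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: expand "<base>_{d}[_mf8]_fpm" for type code 'b' (bf16) or
   'h' (f16). A nonzero `overloaded` drops the optional "[_mf8]" part.
   Returns the snprintf result (length needed, excluding the NUL). */
static int acle_fp8_cvt_name(char *buf, size_t size, const char *base,
                             char type_code, int overloaded) {
    const char *d = type_code == 'b' ? "bf16"
                  : type_code == 'h' ? "f16"
                  : NULL;
    assert(base != NULL && strlen(base) > 0 && d != NULL);
    return snprintf(buf, size, "%s_%s%s_fpm", base, d,
                    overloaded ? "" : "_mf8");
}
```

For example, `("svcvtlt1", 'b', 0)` gives `svcvtlt1_bf16_mf8_fpm`, and `("svcvt2", 'h', 1)` gives `svcvt2_f16_fpm`.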
173 changes: 173 additions & 0 deletions clang/test/CodeGen/AArch64/fp8-intrinsics/acle_sve2_fp8_cvt.c
@@ -0,0 +1,173 @@
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sve2 -target-feature +fp8 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
// RUN: %clang_cc1 -x c++ -triple aarch64-none-linux-gnu -target-feature +sme -target-feature +sme2 -target-feature +fp8 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CHECK-CXX

// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sme -target-feature +sme2 -target-feature +fp8 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -x c++ -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sve2 -target-feature +fp8 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CHECK-CXX

// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sve2 -target-feature +fp8 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -target-feature +sme2 -target-feature +fp8 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s

// REQUIRES: aarch64-registered-target

#ifdef __ARM_FEATURE_SME
#include <arm_sme.h>
#else
#include <arm_sve.h>
#endif

#ifdef SVE_OVERLOADED_FORMS
#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3) A1##A3
#else
#define SVE_ACLE_FUNC(A1,A2,A3) A1##A2##A3
#endif

#ifdef __ARM_FEATURE_SME
#define STREAMING __arm_streaming
#else
#define STREAMING
#endif

// CHECK-LABEL: define dso_local <vscale x 8 x bfloat> @test_svcvt1_bf16_mf8(
// CHECK-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0:[0-9]+]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvt1.nxv8bf16(<vscale x 16 x i8> [[ZN]])
// CHECK-NEXT: ret <vscale x 8 x bfloat> [[TMP0]]
//
// CHECK-CXX-LABEL: define dso_local <vscale x 8 x bfloat> @_Z20test_svcvt1_bf16_mf8u13__SVMfloat8_tm(
// CHECK-CXX-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0:[0-9]+]] {
// CHECK-CXX-NEXT: [[ENTRY:.*:]]
// CHECK-CXX-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-CXX-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvt1.nxv8bf16(<vscale x 16 x i8> [[ZN]])
// CHECK-CXX-NEXT: ret <vscale x 8 x bfloat> [[TMP0]]
//
svbfloat16_t test_svcvt1_bf16_mf8(svmfloat8_t zn, fpm_t fpm) STREAMING {
return SVE_ACLE_FUNC(svcvt1_bf16,_mf8,_fpm)(zn, fpm);
}

// CHECK-LABEL: define dso_local <vscale x 8 x bfloat> @test_svcvt2_bf16_mf8(
// CHECK-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvt2.nxv8bf16(<vscale x 16 x i8> [[ZN]])
// CHECK-NEXT: ret <vscale x 8 x bfloat> [[TMP0]]
//
// CHECK-CXX-LABEL: define dso_local <vscale x 8 x bfloat> @_Z20test_svcvt2_bf16_mf8u13__SVMfloat8_tm(
// CHECK-CXX-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-CXX-NEXT: [[ENTRY:.*:]]
// CHECK-CXX-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-CXX-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvt2.nxv8bf16(<vscale x 16 x i8> [[ZN]])
// CHECK-CXX-NEXT: ret <vscale x 8 x bfloat> [[TMP0]]
//
svbfloat16_t test_svcvt2_bf16_mf8(svmfloat8_t zn, fpm_t fpm) STREAMING {
return SVE_ACLE_FUNC(svcvt2_bf16,_mf8,_fpm)(zn, fpm);
}

// CHECK-LABEL: define dso_local <vscale x 8 x bfloat> @test_svcvtlt1_bf16_mf8(
// CHECK-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvtlt1.nxv8bf16(<vscale x 16 x i8> [[ZN]])
// CHECK-NEXT: ret <vscale x 8 x bfloat> [[TMP0]]
//
// CHECK-CXX-LABEL: define dso_local <vscale x 8 x bfloat> @_Z22test_svcvtlt1_bf16_mf8u13__SVMfloat8_tm(
// CHECK-CXX-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-CXX-NEXT: [[ENTRY:.*:]]
// CHECK-CXX-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-CXX-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvtlt1.nxv8bf16(<vscale x 16 x i8> [[ZN]])
// CHECK-CXX-NEXT: ret <vscale x 8 x bfloat> [[TMP0]]
//
svbfloat16_t test_svcvtlt1_bf16_mf8(svmfloat8_t zn, fpm_t fpm) STREAMING {
return SVE_ACLE_FUNC(svcvtlt1_bf16,_mf8,_fpm)(zn, fpm);
}

// CHECK-LABEL: define dso_local <vscale x 8 x bfloat> @test_svcvtlt2_bf16_mf8(
// CHECK-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvtlt2.nxv8bf16(<vscale x 16 x i8> [[ZN]])
// CHECK-NEXT: ret <vscale x 8 x bfloat> [[TMP0]]
//
// CHECK-CXX-LABEL: define dso_local <vscale x 8 x bfloat> @_Z22test_svcvtlt2_bf16_mf8u13__SVMfloat8_tm(
// CHECK-CXX-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-CXX-NEXT: [[ENTRY:.*:]]
// CHECK-CXX-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-CXX-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvtlt2.nxv8bf16(<vscale x 16 x i8> [[ZN]])
// CHECK-CXX-NEXT: ret <vscale x 8 x bfloat> [[TMP0]]
//
svbfloat16_t test_svcvtlt2_bf16_mf8(svmfloat8_t zn, fpm_t fpm) STREAMING {
return SVE_ACLE_FUNC(svcvtlt2_bf16,_mf8,_fpm)(zn, fpm);
}

// CHECK-LABEL: define dso_local <vscale x 8 x half> @test_svcvt1_f16_mf8(
// CHECK-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvt1.nxv8f16(<vscale x 16 x i8> [[ZN]])
// CHECK-NEXT: ret <vscale x 8 x half> [[TMP0]]
//
// CHECK-CXX-LABEL: define dso_local <vscale x 8 x half> @_Z19test_svcvt1_f16_mf8u13__SVMfloat8_tm(
// CHECK-CXX-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-CXX-NEXT: [[ENTRY:.*:]]
// CHECK-CXX-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-CXX-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvt1.nxv8f16(<vscale x 16 x i8> [[ZN]])
// CHECK-CXX-NEXT: ret <vscale x 8 x half> [[TMP0]]
//
svfloat16_t test_svcvt1_f16_mf8(svmfloat8_t zn, fpm_t fpm) STREAMING {
return SVE_ACLE_FUNC(svcvt1_f16,_mf8,_fpm)(zn, fpm);
}

// CHECK-LABEL: define dso_local <vscale x 8 x half> @test_svcvt2_f16_mf8(
// CHECK-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvt2.nxv8f16(<vscale x 16 x i8> [[ZN]])
// CHECK-NEXT: ret <vscale x 8 x half> [[TMP0]]
//
// CHECK-CXX-LABEL: define dso_local <vscale x 8 x half> @_Z19test_svcvt2_f16_mf8u13__SVMfloat8_tm(
// CHECK-CXX-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-CXX-NEXT: [[ENTRY:.*:]]
// CHECK-CXX-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-CXX-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvt2.nxv8f16(<vscale x 16 x i8> [[ZN]])
// CHECK-CXX-NEXT: ret <vscale x 8 x half> [[TMP0]]
//
svfloat16_t test_svcvt2_f16_mf8(svmfloat8_t zn, fpm_t fpm) STREAMING {
return SVE_ACLE_FUNC(svcvt2_f16,_mf8,_fpm)(zn, fpm);
}

// CHECK-LABEL: define dso_local <vscale x 8 x half> @test_svcvtlt1_f16_mf8(
// CHECK-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt1.nxv8f16(<vscale x 16 x i8> [[ZN]])
// CHECK-NEXT: ret <vscale x 8 x half> [[TMP0]]
//
// CHECK-CXX-LABEL: define dso_local <vscale x 8 x half> @_Z21test_svcvtlt1_f16_mf8u13__SVMfloat8_tm(
// CHECK-CXX-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-CXX-NEXT: [[ENTRY:.*:]]
// CHECK-CXX-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-CXX-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt1.nxv8f16(<vscale x 16 x i8> [[ZN]])
// CHECK-CXX-NEXT: ret <vscale x 8 x half> [[TMP0]]
//
svfloat16_t test_svcvtlt1_f16_mf8(svmfloat8_t zn, fpm_t fpm) STREAMING {
return SVE_ACLE_FUNC(svcvtlt1_f16,_mf8,_fpm)(zn, fpm);
}

// CHECK-LABEL: define dso_local <vscale x 8 x half> @test_svcvtlt2_f16_mf8(
// CHECK-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt2.nxv8f16(<vscale x 16 x i8> [[ZN]])
// CHECK-NEXT: ret <vscale x 8 x half> [[TMP0]]
//
// CHECK-CXX-LABEL: define dso_local <vscale x 8 x half> @_Z21test_svcvtlt2_f16_mf8u13__SVMfloat8_tm(
// CHECK-CXX-SAME: <vscale x 16 x i8> [[ZN:%.*]], i64 noundef [[FPM:%.*]]) #[[ATTR0]] {
// CHECK-CXX-NEXT: [[ENTRY:.*:]]
// CHECK-CXX-NEXT: tail call void @llvm.aarch64.set.fpmr(i64 [[FPM]])
// CHECK-CXX-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt2.nxv8f16(<vscale x 16 x i8> [[ZN]])
// CHECK-CXX-NEXT: ret <vscale x 8 x half> [[TMP0]]
//
svfloat16_t test_svcvtlt2_f16_mf8(svmfloat8_t zn, fpm_t fpm) STREAMING {
return SVE_ACLE_FUNC(svcvtlt2_f16,_mf8,_fpm)(zn, fpm);
}
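The `CHECK-CXX-LABEL` lines match Itanium C++ mangled names: `_Z`, the identifier's length, the identifier, then one code per parameter. The parameter codes below are copied from those checks: `u13__SVMfloat8_t` for `svmfloat8_t` and `m` (`unsigned long`) for `fpm_t`, which assumes `fpm_t` is a typedef of `uint64_t` on this LP64 target. A hypothetical helper (`mangle_fp8_test_fn` is illustrative) that handles only this one parameter list:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: mangle "<name>(svmfloat8_t, fpm_t)" the way the CHECK-CXX
   lines expect. Only this single parameter list is supported. */
static int mangle_fp8_test_fn(char *buf, size_t size, const char *name) {
    assert(name != NULL);
    return snprintf(buf, size, "_Z%zu%su13__SVMfloat8_tm", strlen(name), name);
}
```

For `test_svcvt1_bf16_mf8` (20 characters) this yields `_Z20test_svcvt1_bf16_mf8u13__SVMfloat8_tm`, the name in the first `CHECK-CXX-LABEL`.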
24 changes: 24 additions & 0 deletions clang/test/Sema/aarch64-sve2-intrinsics/acle_sve2_fp8.c
@@ -0,0 +1,24 @@
// REQUIRES: aarch64-registered-target

// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -verify -emit-llvm %s

#include <arm_sve.h>

void test_features(svmfloat8_t zn, fpm_t fpm) {
svcvt1_bf16_mf8_fpm(zn, fpm);
// expected-error@-1 {{'svcvt1_bf16_mf8_fpm' needs target feature (sve,sve2,fp8)|(sme,sme2,fp8)}}
svcvt2_bf16_mf8_fpm(zn, fpm);
// expected-error@-1 {{'svcvt2_bf16_mf8_fpm' needs target feature (sve,sve2,fp8)|(sme,sme2,fp8)}}
svcvtlt1_bf16_mf8_fpm(zn, fpm);
// expected-error@-1 {{'svcvtlt1_bf16_mf8_fpm' needs target feature (sve,sve2,fp8)|(sme,sme2,fp8)}}
svcvtlt2_bf16_mf8_fpm(zn, fpm);
// expected-error@-1 {{'svcvtlt2_bf16_mf8_fpm' needs target feature (sve,sve2,fp8)|(sme,sme2,fp8)}}
svcvt1_f16_mf8_fpm(zn, fpm);
// expected-error@-1 {{'svcvt1_f16_mf8_fpm' needs target feature (sve,sve2,fp8)|(sme,sme2,fp8)}}
svcvt2_f16_mf8_fpm(zn, fpm);
// expected-error@-1 {{'svcvt2_f16_mf8_fpm' needs target feature (sve,sve2,fp8)|(sme,sme2,fp8)}}
svcvtlt1_f16_mf8_fpm(zn, fpm);
// expected-error@-1 {{'svcvtlt1_f16_mf8_fpm' needs target feature (sve,sve2,fp8)|(sme,sme2,fp8)}}
svcvtlt2_f16_mf8_fpm(zn, fpm);
// expected-error@-1 {{'svcvtlt2_f16_mf8_fpm' needs target feature (sve,sve2,fp8)|(sme,sme2,fp8)}}
}
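With only `+sve` enabled, every call fails the guards set in `arm_sve.td` (`SVETargetGuard = "sve2,fp8"`, `SMETargetGuard = "sme2,fp8"`), and the expected diagnostic lists the two acceptable feature sets. A hypothetical sketch of the rule as the message states it, using made-up feature bits: at least one whole group must be enabled. As we understand it, clang's actual check also accounts for implied features (for example, `+sve2` implying `+sve`) and for streaming mode, which this sketch ignores.

```c
#include <assert.h>

/* Hypothetical feature bits, defined for this sketch only. */
enum {
    FEAT_SVE  = 1 << 0,
    FEAT_SVE2 = 1 << 1,
    FEAT_SME  = 1 << 2,
    FEAT_SME2 = 1 << 3,
    FEAT_FP8  = 1 << 4,
    FEAT_ALL  = (1 << 5) - 1
};

/* Literal reading of "(sve,sve2,fp8)|(sme,sme2,fp8)": every feature of at
   least one group must be present. Implied features are not expanded. */
static int fp8_cvt_features_ok(unsigned feats) {
    const unsigned sve_group = FEAT_SVE | FEAT_SVE2 | FEAT_FP8;
    const unsigned sme_group = FEAT_SME | FEAT_SME2 | FEAT_FP8;
    assert((feats & ~(unsigned)FEAT_ALL) == 0u); /* only the bits above */
    return (feats & sve_group) == sve_group ||
           (feats & sme_group) == sme_group;
}
```

Under this model, the RUN line's feature set (`FEAT_SVE` alone) is rejected, matching the `expected-error` lines.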
13 changes: 12 additions & 1 deletion llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -3860,6 +3860,17 @@ def int_aarch64_neon_famin : AdvSIMD_2VectorArg_Intrinsic;
//
let TargetPrefix = "aarch64" in {

+// SVE Widening Conversions
+class SVE2_FP8_Cvt
+: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+[llvm_nxv16i8_ty],
+[IntrReadMem, IntrInaccessibleMemOnly]>;
+
+def int_aarch64_sve_fp8_cvt1 : SVE2_FP8_Cvt;
+def int_aarch64_sve_fp8_cvt2 : SVE2_FP8_Cvt;
+def int_aarch64_sve_fp8_cvtlt1 : SVE2_FP8_Cvt;
+def int_aarch64_sve_fp8_cvtlt2 : SVE2_FP8_Cvt;
+
class SME2_FP8_CVT_X2_Single_Intrinsic
: DefaultAttrsIntrinsic<[llvm_anyvector_ty, LLVMMatchType<0>],
[llvm_nxv16i8_ty],
@@ -3886,4 +3897,4 @@ let TargetPrefix = "aarch64" in {
// FP8 outer product
def int_aarch64_sme_fp8_fmopa_za16 : SME_FP8_OuterProduct_Intrinsic;
def int_aarch64_sme_fp8_fmopa_za32 : SME_FP8_OuterProduct_Intrinsic;
-}
+}
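`SVE2_FP8_Cvt` declares its result as `llvm_anyvector_ty`, so each intrinsic is overloaded on the result type and the concrete type is appended to the name, as in `@llvm.aarch64.sve.fp8.cvt1.nxv8bf16` in the tests. A hypothetical helper (`fp8_cvt_intrinsic_name` is illustrative, not an LLVM API) that forms those names for the two result types used in this patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: build "llvm.aarch64.sve.fp8.<op>.<vt>", where op is one of
   cvt1/cvt2/cvtlt1/cvtlt2 and vt is the overload suffix of the result type
   (<vscale x 8 x bfloat> -> nxv8bf16, <vscale x 8 x half> -> nxv8f16). */
static int fp8_cvt_intrinsic_name(char *buf, size_t size, const char *op,
                                  int bf16_result) {
    assert(op != NULL && strncmp(op, "cvt", 3) == 0);
    return snprintf(buf, size, "llvm.aarch64.sve.fp8.%s.%s", op,
                    bf16_result ? "nxv8bf16" : "nxv8f16");
}
```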
16 changes: 8 additions & 8 deletions llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -4369,14 +4369,14 @@ let Predicates = [HasNonStreamingSVE2p2orSME2p2] in {
//===----------------------------------------------------------------------===//
let Predicates = [HasSVE2orSME2, HasFP8] in {
// FP8 upconvert
-defm F1CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b00, "f1cvt">;
-defm F2CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b01, "f2cvt">;
-defm BF1CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b10, "bf1cvt">;
-defm BF2CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b11, "bf2cvt">;
-defm F1CVTLT_ZZ : sve2_fp8_cvt_single<0b1, 0b00, "f1cvtlt">;
-defm F2CVTLT_ZZ : sve2_fp8_cvt_single<0b1, 0b01, "f2cvtlt">;
-defm BF1CVTLT_ZZ : sve2_fp8_cvt_single<0b1, 0b10, "bf1cvtlt">;
-defm BF2CVTLT_ZZ : sve2_fp8_cvt_single<0b1, 0b11, "bf2cvtlt">;
+defm F1CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b00, "f1cvt", nxv8f16, int_aarch64_sve_fp8_cvt1>;
+defm F2CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b01, "f2cvt", nxv8f16, int_aarch64_sve_fp8_cvt2>;
+defm BF1CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b10, "bf1cvt", nxv8bf16, int_aarch64_sve_fp8_cvt1>;
+defm BF2CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b11, "bf2cvt", nxv8bf16, int_aarch64_sve_fp8_cvt2>;
+defm F1CVTLT_ZZ : sve2_fp8_cvt_single<0b1, 0b00, "f1cvtlt", nxv8f16, int_aarch64_sve_fp8_cvtlt1>;
+defm F2CVTLT_ZZ : sve2_fp8_cvt_single<0b1, 0b01, "f2cvtlt", nxv8f16, int_aarch64_sve_fp8_cvtlt2>;
+defm BF1CVTLT_ZZ : sve2_fp8_cvt_single<0b1, 0b10, "bf1cvtlt", nxv8bf16, int_aarch64_sve_fp8_cvtlt1>;
+defm BF2CVTLT_ZZ : sve2_fp8_cvt_single<0b1, 0b11, "bf2cvtlt", nxv8bf16, int_aarch64_sve_fp8_cvtlt2>;

// FP8 downconvert
defm FCVTN_Z2Z_HtoB : sve2_fp8_down_cvt_single<0b00, "fcvtn", ZZ_h_mul_r>;
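The eight new `defm` lines bind each instruction to an (intrinsic, result type) pair. Because `int_aarch64_sve_fp8_cvt1` and its siblings are shared by the half and BFloat16 forms, the result type is what decides between the `f*` and `bf*` instructions. A hypothetical C table that mirrors those eight patterns (a model of the mapping only, not of how LLVM's instruction selector is implemented):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per sve2_fp8_cvt_single instantiation in the diff above. */
struct fp8_cvt_pat {
    const char *intrinsic;
    const char *result_vt;
    const char *mnemonic;
};

static const struct fp8_cvt_pat kFp8CvtPats[] = {
    {"int_aarch64_sve_fp8_cvt1",   "nxv8f16",  "f1cvt"},
    {"int_aarch64_sve_fp8_cvt2",   "nxv8f16",  "f2cvt"},
    {"int_aarch64_sve_fp8_cvt1",   "nxv8bf16", "bf1cvt"},
    {"int_aarch64_sve_fp8_cvt2",   "nxv8bf16", "bf2cvt"},
    {"int_aarch64_sve_fp8_cvtlt1", "nxv8f16",  "f1cvtlt"},
    {"int_aarch64_sve_fp8_cvtlt2", "nxv8f16",  "f2cvtlt"},
    {"int_aarch64_sve_fp8_cvtlt1", "nxv8bf16", "bf1cvtlt"},
    {"int_aarch64_sve_fp8_cvtlt2", "nxv8bf16", "bf2cvtlt"},
};

/* Hypothetical lookup mirroring the patterns; returns NULL if none match. */
static const char *select_fp8_cvt(const char *intrinsic, const char *result_vt) {
    assert(intrinsic != NULL && result_vt != NULL);
    for (size_t i = 0; i < sizeof kFp8CvtPats / sizeof kFp8CvtPats[0]; ++i)
        if (strcmp(kFp8CvtPats[i].intrinsic, intrinsic) == 0 &&
            strcmp(kFp8CvtPats[i].result_vt, result_vt) == 0)
            return kFp8CvtPats[i].mnemonic;
    return NULL;
}
```

The lookups agree with the llc checks in `fp8-sve-cvt-cvtlt.ll`: for example, `cvt1` with an `nxv8bf16` result selects `bf1cvt`, and `cvtlt2` with an `nxv8f16` result selects `f2cvtlt`.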
7 changes: 6 additions & 1 deletion llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -10769,10 +10769,15 @@ class sve2_fp8_cvt_single<bit L, bits<2> opc, string mnemonic,
let Inst{9-5} = Zn;
let Inst{4-0} = Zd;
let Uses = [FPMR, FPCR];
+
+let mayLoad = 1;
+let mayStore = 0;
}

-multiclass sve2_fp8_cvt_single<bit L, bits<2> opc, string mnemonic> {
+multiclass sve2_fp8_cvt_single<bit L, bits<2> opc, string mnemonic, ValueType vtd, SDPatternOperator op> {
def _BtoH : sve2_fp8_cvt_single<L, opc, mnemonic, ZPR16, ZPR8>;
+
+def : SVE_1_Op_Pat<vtd, op, nxv16i8, !cast<Instruction>(NAME # _BtoH)>;
}

// FP8 downconvert
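Inside the multiclass, `def _BtoH` is concatenated with the `defm` name, so `defm F1CVT_ZZ` produces a record named `F1CVT_ZZ_BtoH`, which is what `!cast<Instruction>(NAME # _BtoH)` in the new pattern refers to. The class also places `Zn` in bits 9-5 and `Zd` in bits 4-0. A hypothetical sketch of both (`btoh_record_name` and `set_zn_zd` are illustrative); the remaining opcode bits are not shown in this diff, so they are passed in as `base`:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: the record name TableGen forms for the multiclass's def. */
static int btoh_record_name(char *buf, size_t size, const char *defm_name) {
    assert(defm_name != NULL && strlen(defm_name) > 0);
    return snprintf(buf, size, "%s_BtoH", defm_name);
}

/* Hypothetical: insert register numbers into Inst{9-5} (Zn) and Inst{4-0}
   (Zd). `base` stands in for the fixed opcode bits, which this diff omits;
   its low 10 bits are cleared before the fields are inserted. */
static uint32_t set_zn_zd(uint32_t base, unsigned zn, unsigned zd) {
    assert(zn < 32 && zd < 32);
    return (base & ~UINT32_C(0x3FF)) | ((uint32_t)zn << 5) | (uint32_t)zd;
}
```

For instance, `set_zn_zd(0, 2, 1)` returns `0x41` (`Zn` = 2 in bits 9-5, `Zd` = 1 in bits 4-0).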
78 changes: 78 additions & 0 deletions llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
@@ -0,0 +1,78 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mattr=+sve2,+fp8 < %s | FileCheck %s
; RUN: llc -mattr=+sme2,+fp8 --force-streaming < %s | FileCheck %s

target triple = "aarch64-linux"

define <vscale x 8 x bfloat> @cvt1_bf16(<vscale x 16 x i8> %s) {
; CHECK-LABEL: cvt1_bf16:
; CHECK: // %bb.0:
; CHECK-NEXT: bf1cvt z0.h, z0.b
; CHECK-NEXT: ret
%r = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvt1.nxv8bf16(<vscale x 16 x i8> %s)
ret <vscale x 8 x bfloat> %r
}

define <vscale x 8 x bfloat> @cvt2_bf16(<vscale x 16 x i8> %s) {
; CHECK-LABEL: cvt2_bf16:
; CHECK: // %bb.0:
; CHECK-NEXT: bf2cvt z0.h, z0.b
; CHECK-NEXT: ret
%r = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvt2.nxv8bf16(<vscale x 16 x i8> %s)
ret <vscale x 8 x bfloat> %r
}

define <vscale x 8 x bfloat> @cvtlt1_bf16(<vscale x 16 x i8> %s) {
; CHECK-LABEL: cvtlt1_bf16:
; CHECK: // %bb.0:
; CHECK-NEXT: bf1cvtlt z0.h, z0.b
; CHECK-NEXT: ret
%r = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvtlt1.nxv8bf16(<vscale x 16 x i8> %s)
ret <vscale x 8 x bfloat> %r
}

define <vscale x 8 x bfloat> @cvtlt2_bf16(<vscale x 16 x i8> %s) {
; CHECK-LABEL: cvtlt2_bf16:
; CHECK: // %bb.0:
; CHECK-NEXT: bf2cvtlt z0.h, z0.b
; CHECK-NEXT: ret
%r = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fp8.cvtlt2.nxv8bf16(<vscale x 16 x i8> %s)
ret <vscale x 8 x bfloat> %r
}

define <vscale x 8 x half> @cvt1_f16(<vscale x 16 x i8> %s) {
; CHECK-LABEL: cvt1_f16:
; CHECK: // %bb.0:
; CHECK-NEXT: f1cvt z0.h, z0.b
; CHECK-NEXT: ret
%r = call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvt1.nxv8f16(<vscale x 16 x i8> %s)
ret <vscale x 8 x half> %r
}

define <vscale x 8 x half> @cvt2_f16(<vscale x 16 x i8> %s) {
; CHECK-LABEL: cvt2_f16:
; CHECK: // %bb.0:
; CHECK-NEXT: f2cvt z0.h, z0.b
; CHECK-NEXT: ret
%r = call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvt2.nxv8f16(<vscale x 16 x i8> %s)
ret <vscale x 8 x half> %r
}


define <vscale x 8 x half> @cvtlt1_f16(<vscale x 16 x i8> %s) {
; CHECK-LABEL: cvtlt1_f16:
; CHECK: // %bb.0:
; CHECK-NEXT: f1cvtlt z0.h, z0.b
; CHECK-NEXT: ret
%r = call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt1.nxv8f16(<vscale x 16 x i8> %s)
ret <vscale x 8 x half> %r
}

define <vscale x 8 x half> @cvtlt2_f16(<vscale x 16 x i8> %s) {
; CHECK-LABEL: cvtlt2_f16:
; CHECK: // %bb.0:
; CHECK-NEXT: f2cvtlt z0.h, z0.b
; CHECK-NEXT: ret
%r = call <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt2.nxv8f16(<vscale x 16 x i8> %s)
ret <vscale x 8 x half> %r
}