Skip to content

[SYCL] Allow _Bitint of size greater than 128 bits when -fintelfpga is used #6295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 23, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions clang/include/clang/Basic/LangOptions.def
Original file line number Diff line number Diff line change
Expand Up @@ -270,6 +270,7 @@ LANGOPT(GPUExcludeWrongSideOverloads, 1, 0, "always exclude wrong side overloads

LANGOPT(SYCLIsDevice , 1, 0, "Generate code for SYCL device")
LANGOPT(SYCLIsHost , 1, 0, "SYCL host compilation")
LANGOPT(IntelFPGA , 1, 0, "Perform ahead-of-time compilation for FPGA")
LANGOPT(SYCLAllowFuncPtr , 1, 0, "Allow function pointers in SYCL device code")
LANGOPT(SYCLStdLayoutKernelParams, 1, 0, "Enable standard layout requirement for SYCL kernel parameters")
LANGOPT(SYCLUnnamedLambda , 1, 0, "Allow unnamed lambda SYCL kernels")
Expand Down
3 changes: 2 additions & 1 deletion clang/include/clang/Driver/Options.td
Original file line number Diff line number Diff line change
Expand Up @@ -2738,7 +2738,8 @@ defm strict_vtable_pointers : BoolFOption<"strict-vtable-pointers",
NegFlag<SetFalse>>;
def fstrict_overflow : Flag<["-"], "fstrict-overflow">, Group<f_Group>;
def fintelfpga : Flag<["-"], "fintelfpga">, Group<f_Group>,
Flags<[CC1Option, CoreOption]>, HelpText<"Perform ahead of time compilation for FPGA">;
Flags<[CC1Option, CoreOption]>, MarshallingInfoFlag<LangOpts<"IntelFPGA">>,
HelpText<"Perform ahead-of-time compilation for FPGA">;
def fsycl_device_only : Flag<["-"], "fsycl-device-only">, Flags<[CoreOption]>,
HelpText<"Compile SYCL kernels for device">;
def fsycl_targets_EQ : CommaJoined<["-"], "fsycl-targets=">, Flags<[NoXarchOption, CC1Option, CoreOption]>,
Expand Down
6 changes: 6 additions & 0 deletions clang/lib/Basic/Targets.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -616,6 +616,8 @@ TargetInfo *AllocateTarget(const llvm::Triple &Triple,

case llvm::Triple::spir64: {
llvm::Triple HT(Opts.HostTriple);
bool IsFPGASubArch = Triple.getSubArch() == llvm::Triple::SPIRSubArch_fpga;

switch (HT.getOS()) {
case llvm::Triple::Win32:
switch (HT.getEnvironment()) {
Expand All @@ -626,8 +628,12 @@ TargetInfo *AllocateTarget(const llvm::Triple &Triple,
return new MicrosoftX86_64_SPIR64TargetInfo(Triple, Opts);
}
case llvm::Triple::Linux:
if (IsFPGASubArch)
return new LinuxTargetInfo<SPIR64FPGATargetInfo>(Triple, Opts);
return new LinuxTargetInfo<SPIR64TargetInfo>(Triple, Opts);
default:
if (IsFPGASubArch)
return new SPIR64FPGATargetInfo(Triple, Opts);
return new SPIR64TargetInfo(Triple, Opts);
}
}
Expand Down
8 changes: 8 additions & 0 deletions clang/lib/Basic/Targets/SPIR.h
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,14 @@ class LLVM_LIBRARY_VISIBILITY SPIR64TargetInfo : public SPIRTargetInfo {
MacroBuilder &Builder) const override;
};

// spir64_fpga target
class LLVM_LIBRARY_VISIBILITY SPIR64FPGATargetInfo : public SPIR64TargetInfo {
public:
SPIR64FPGATargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
: SPIR64TargetInfo(Triple, Opts) {}
virtual size_t getMaxBitIntWidth() const override { return 2048; }
};

// x86-32 SPIR Windows target
class LLVM_LIBRARY_VISIBILITY WindowsX86_32SPIRTargetInfo
: public WindowsTargetInfo<SPIR32TargetInfo> {
Expand Down
16 changes: 12 additions & 4 deletions clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4937,14 +4937,22 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
// Disable parallel for range-rounding for anything involving FPGA
auto SYCLTCRange = C.getOffloadToolChains<Action::OFK_SYCL>();
bool HasFPGA = false;
for (auto TI = SYCLTCRange.first, TE = SYCLTCRange.second; TI != TE; ++TI)
if (TI->second->getTriple().getSubArch() ==
llvm::Triple::SPIRSubArch_fpga) {
for (auto TI = SYCLTCRange.first, TE = SYCLTCRange.second; TI != TE; ++TI) {
llvm::Triple SYCLTriple = TI->second->getTriple();
if (SYCLTriple.getSubArch() == llvm::Triple::SPIRSubArch_fpga) {
HasFPGA = true;
if (!IsSYCLOffloadDevice) {
CmdArgs.push_back("-aux-triple");
CmdArgs.push_back(Args.MakeArgString(SYCLTriple.getTriple()));
}
break;
}
if (HasFPGA)
}
if (HasFPGA) {
CmdArgs.push_back("-fsycl-disable-range-rounding");
// Pass -fintelfpga to both the host and device SYCL compilations if set.
CmdArgs.push_back("-fintelfpga");
}

// Add any options that are needed specific to SYCL offload while
// performing the host side compilation.
Expand Down
2 changes: 1 addition & 1 deletion clang/lib/Frontend/CompilerInstance.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@ bool CompilerInstance::createTarget() {
// other side of CUDA/OpenMP/SYCL compilation.
if (!getAuxTarget() &&
(getLangOpts().CUDA || getLangOpts().OpenMPIsDevice ||
getLangOpts().SYCLIsDevice) &&
getLangOpts().isSYCL()) &&
!getFrontendOpts().AuxTriple.empty()) {
auto TO = std::make_shared<TargetOptions>();
TO->Triple = llvm::Triple::normalize(getFrontendOpts().AuxTriple);
Expand Down
11 changes: 8 additions & 3 deletions clang/lib/Sema/SemaType.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2305,10 +2305,15 @@ QualType Sema::BuildBitIntType(bool IsUnsigned, Expr *BitWidth,
return QualType();
}

const TargetInfo &TI = getASTContext().getTargetInfo();
if (NumBits > TI.getMaxBitIntWidth()) {
// If the number of bits exceed the maximum bit width supported on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I guess this would cause issues if the code is executed on the other platform, right? So we DO want to enforce the 'host' limit, but only if the 'host' bit-int gets emitted/used/etc? And vise-versa for the device? Or are we just hoping the failure is not quite silent later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or are we just hoping the failure is not quite silent later?

Right, I think that is what I was hoping. In this case we are in a configuration where one platform allows it but the other does not. Rather than preventing compilation here, I was relying on later phases catching it "nicely".

(In the specific case of intelfpga, which is where this can happen currently, there wouldn't be an issue later.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is incredibly unfortunate... We don't really have a good way of diagnosing with 'deferred diagnostics' on host unfortunately, so I suspect we're going to have to live with this.

// both host and device, issue an error diagnostic.
const TargetInfo *AuxTargetInfo = getASTContext().getAuxTargetInfo();
size_t MaxBitIntWidth = std::max(
(AuxTargetInfo == nullptr) ? 0 : AuxTargetInfo->getMaxBitIntWidth(),
getASTContext().getTargetInfo().getMaxBitIntWidth());
if (NumBits > MaxBitIntWidth) {
Diag(Loc, diag::err_bit_int_max_size)
<< IsUnsigned << static_cast<uint64_t>(TI.getMaxBitIntWidth());
<< IsUnsigned << static_cast<uint64_t>(MaxBitIntWidth);
return QualType();
}

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
// RUN: %clang_cc1 -no-opaque-pointers -fsycl-is-host -fintelfpga -triple x86_64 -aux-triple spir64_fpga -emit-llvm %s -o - | FileCheck %s

// This test checks that we generate appropriate code for division
// operations of _BitInts of size greater than 128 bits, since it
// is allowed when -fintelfpga is enabled. The test uses a value of
// 2048 for the bitsize as that is the maximum that is currently
// supported.

// CHECK: define{{.*}} void @_Z3fooDB2048_S_(i2048* {{.*}} sret(i2048) align 8 %agg.result, i2048* {{.*}} byval(i2048) align 8 %[[ARG1:[0-9]+]], i2048* {{.*}} byval(i2048) align 8 %[[ARG2:[0-9]+]])
signed _BitInt(2048) foo(signed _BitInt(2048) a, signed _BitInt(2048) b) {
// CHECK: %[[VAR_A:a]].addr = alloca i2048, align 8
// CHECK: %[[VAR_B:b]].addr = alloca i2048, align 8
// CHECK: %[[VAR_A]] = load i2048, i2048* %[[ARG1]], align 8
// CHECK: %[[VAR_B]] = load i2048, i2048* %[[ARG2]], align 8
// CHECK: store i2048 %[[VAR_A]], i2048* %[[VAR_A]].addr, align 8
// CHECK: store i2048 %[[VAR_B]], i2048* %[[VAR_B]].addr, align 8
// CHECK: %[[TEMP1:[0-9]+]] = load i2048, i2048* %[[VAR_A]].addr, align 8
// CHECK: %[[TEMP2:[0-9]+]] = load i2048, i2048* %[[VAR_B]].addr, align 8
// CHECK: %div = sdiv i2048 %[[TEMP1]], %[[TEMP2]]
// CHECK: store i2048 %div, i2048* %agg.result, align 8
// CHECK: %[[RES:[0-9+]]] = load i2048, i2048* %agg.result, align 8
// CHECK: store i2048 %[[RES]], i2048* %agg.result, align 8
// CHECK: ret void
return a / b;
}
27 changes: 27 additions & 0 deletions clang/test/CodeGenSYCL/no-opaque-ptrs-sycl-intelfpga-bitint.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
// RUN: %clang_cc1 -no-opaque-pointers -fsycl-is-device -fintelfpga -triple spir64_fpga -aux-triple x86_64-unknown-linux-gnu -IInputs -emit-llvm %s -o - | FileCheck %s

// This test checks that we generate appropriate code for division
// operations of _BitInts of size greater than 128 bits, since it
// is allowed when -fintelfpga is enabled. The test uses a value
// of 2048 for the bitsize, the max that is currently supported.

#include "Inputs/sycl.hpp"

// CHECK: define{{.*}} void @_Z3fooDB2048_S_(i2048 addrspace(4)* {{.*}} sret(i2048) align 8 %agg.result, i2048* {{.*}} byval(i2048) align 8 %[[ARG1:[0-9]+]], i2048* {{.*}} byval(i2048) align 8 %[[ARG2:[0-9]+]])
signed _BitInt(2048) foo(signed _BitInt(2048) a, signed _BitInt(2048) b) {
// CHECK: %[[VAR_A:a]] = load i2048, i2048* %[[ARG1]], align 8
// CHECK: %[[VAR_B:b]] = load i2048, i2048* %[[ARG2]], align 8
// CHECK: %[[RES:div]] = sdiv i2048 %[[VAR_A]], %[[VAR_B]]
// CHECK: store i2048 %[[RES]], i2048 addrspace(4)* %agg.result, align 8
// CHECK: ret void
return a / b;
}

int main() {
sycl::handler h;
auto lambda = []() {
_BitInt(2048) a, b = 3, c = 4;
a = foo(b, c);
};
h.single_task(lambda);
}
24 changes: 24 additions & 0 deletions clang/test/CodeGenSYCL/sycl-host-intelfpga-bitint.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
// RUN: %clang_cc1 -opaque-pointers -fsycl-is-host -fintelfpga -triple x86_64 -aux-triple spir64_fpga -emit-llvm %s -o - | FileCheck %s

// This test checks that we generate appropriate code for division
// operations of _BitInts of size greater than 128 bits, since it
// is allowed when -fintelfpga is enabled. The test uses a value of
// 2048, the maximum bitsize that is currently supported.

// CHECK: define{{.*}} void @_Z3fooDB2048_S_(ptr {{.*}} sret(i2048) align 8 %agg.result, ptr {{.*}} byval(i2048) align 8 %[[ARG1:[0-9]+]], ptr {{.*}} byval(i2048) align 8 %[[ARG2:[0-9]+]])
signed _BitInt(2048) foo(signed _BitInt(2048) a, signed _BitInt(2048) b) {
// CHECK: %[[VAR_A:a]].addr = alloca i2048, align 8
// CHECK: %[[VAR_B:b]].addr = alloca i2048, align 8
// CHECK: %[[VAR_A]] = load i2048, ptr %[[ARG1]], align 8
// CHECK: %[[VAR_B]] = load i2048, ptr %[[ARG2]], align 8
// CHECK: store i2048 %[[VAR_A]], ptr %[[VAR_A]].addr, align 8
// CHECK: store i2048 %[[VAR_B]], ptr %[[VAR_B]].addr, align 8
// CHECK: %[[TEMP1:[0-9]+]] = load i2048, ptr %[[VAR_A]].addr, align 8
// CHECK: %[[TEMP2:[0-9]+]] = load i2048, ptr %[[VAR_B]].addr, align 8
// CHECK: %div = sdiv i2048 %[[TEMP1]], %[[TEMP2]]
// CHECK: store i2048 %div, ptr %agg.result, align 8
// CHECK: %[[RES:[0-9+]]] = load i2048, ptr %agg.result, align 8
// CHECK: store i2048 %[[RES]], ptr %agg.result, align 8
// CHECK: ret void
return a / b;
}
27 changes: 27 additions & 0 deletions clang/test/CodeGenSYCL/sycl-intelfpga-bitint.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
// RUN: %clang_cc1 -opaque-pointers -fsycl-is-device -fintelfpga -triple spir64_fpga -aux-triple x86_64-unknown-linux-gnu -IInputs -emit-llvm %s -o - | FileCheck %s

// This test checks that we generate appropriate code for division
// operations of _BitInts of size greater than 128 bits, since it
// is allowed when -fintelfpga is enabled. The test uses a value of
// 2048 for bitint size, the maximum that is currently supported.

#include "Inputs/sycl.hpp"

// CHECK: define{{.*}} void @_Z3fooDB2048_S_(ptr addrspace(4) {{.*}} sret(i2048) align 8 %agg.result, ptr {{.*}} byval(i2048) align 8 %[[ARG1:[0-9]+]], ptr {{.*}} byval(i2048) align 8 %[[ARG2:[0-9]+]])
signed _BitInt(2048) foo(signed _BitInt(2048) a, signed _BitInt(2048) b) {
// CHECK: %[[VAR_A:a]] = load i2048, ptr %[[ARG1]], align 8
// CHECK: %[[VAR_B:b]] = load i2048, ptr %[[ARG2]], align 8
// CHECK: %[[RES:div]] = sdiv i2048 %[[VAR_A]], %[[VAR_B]]
// CHECK: store i2048 %[[RES]], ptr addrspace(4) %agg.result, align 8
// CHECK: ret void
return a / b;
}

int main() {
sycl::handler h;
auto lambda = []() {
_BitInt(2048) a, b = 3, c = 4;
a = foo(b, c);
};
h.single_task(lambda);
}
6 changes: 6 additions & 0 deletions clang/test/Driver/sycl-offload-intelfpga.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,12 @@
// RUN: | FileCheck -check-prefix=CHK-TOOLS-INTELFPGA-G0 %s
// CHK-TOOLS-INTELFPGA-G0-NOT: clang{{.*}} "-debug-info-kind=constructor"

/// -fintelfpga passes it to host and device cc1 compilations
// RUN: %clangxx -### -fsycl -fintelfpga %s 2>&1 \
// RUN: | FileCheck -check-prefix=CHK-HOST-DEVICE %s
// CHK-HOST-DEVICE: clang{{.*}} "-cc1"{{.*}} "-fsycl-is-device"{{.*}} "-fintelfpga"
// CHK-HOST-DEVICE: clang{{.*}} "-cc1"{{.*}} "-fintelfpga"{{.*}} "-fsycl-is-host"

/// FPGA target implies -fsycl-disable-range-rounding
// RUN: %clangxx -### -target x86_64-unknown-linux-gnu -fsycl -fintelfpga %s 2>&1 \
// RUN: | FileCheck -check-prefix=CHK-RANGE-ROUNDING %s
Expand Down
6 changes: 6 additions & 0 deletions clang/test/Driver/sycl-offload.c
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,12 @@

/// ###########################################################################

/// Check that -aux-triple is passed with -fsycl -fintelfpga
// RUN: %clang -### -fsycl -fintelfpga %s 2>&1 \
// RUN: | FileCheck -DARCH=x86_64 -DARCH2=spir64_fpga -check-prefix=CHK-SYCL-FPGA-AUX-TRIPLE %s
// CHK-SYCL-FPGA-AUX-TRIPLE: clang{{.*}} "-cc1" "-triple" "[[ARCH]]-unknown-linux-gnu"{{.*}} "-aux-triple" "[[ARCH2]]-unknown-unknown"{{.*}} "-fsycl-is-host"
/// ###########################################################################

/// Validate SYCL option values
// RUN: %clang -### -fsycl-device-code-split=bad_value -fsycl %s 2>&1 \
// RUN: | FileCheck -check-prefix=CHK-SYCL-BAD-OPT-VALUE -Doption=-fsycl-device-code-split %s
Expand Down
23 changes: 23 additions & 0 deletions clang/test/SemaSYCL/sycl-intelfpga.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
// RUN: %clang_cc1 -fsycl-is-device -fintelfpga -verify=device-intelfpga -fsyntax-only %s -triple spir64_fpga -aux-triple x86_64-unknown-linux-gnu
// RUN: %clang_cc1 -fsycl-is-host -fintelfpga -verify=host-intelfpga -fsyntax-only %s -triple x86_64 -aux-triple spir64_fpga
// RUN: %clang_cc1 -fsycl-is-device -verify=device -fsyntax-only %s
// RUN: %clang_cc1 -fsycl-is-host -verify=host -fsyntax-only %s

// Tests that we do not issue errors for _Bitints of size greater than 128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test where we do not diagnose since that is what this comment implies? I don't have strong preference since CodeGen tests test this anyway. So I leave it up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I added the "passing" cases in CodeGenSYCL. I don't mind adding a case that would compile cleanly here. Is a value of 215 (greater than x86_64 limit of 128 but lower than fpga limit of 2048) okay with you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

// when -fintelfpga is enabled. The backend is expected to be able to handle
// this. When -fintelfpga is not passed, we continue to diagnose.

// device-intelfpga-error@+4 3{{signed _BitInt of bit sizes greater than 2048 not supported}}
// host-intelfpga-error@+3 3{{signed _BitInt of bit sizes greater than 2048 not supported}}
// device-error@+2 3{{signed _BitInt of bit sizes greater than 128 not supported}}
// host-error@+1 3{{signed _BitInt of bit sizes greater than 128 not supported}}
signed _BitInt(2049) foo(signed _BitInt(2049) a, signed _BitInt(2049) b) {
return a / b;
}
// device-error@+4 3{{signed _BitInt of bit sizes greater than 128 not supported}}
// host-error@+3 3{{signed _BitInt of bit sizes greater than 128 not supported}}
// device-intelfpga-no-diagnostic@+2
// host-intelfpga-no-diagnostic@+1
signed _BitInt(215) foo(signed _BitInt(215) a, signed _BitInt(215) b) {
return a + b;
}