Skip to content

[SYCL-MLIR] Merge from intel/llvm sycl branch #8774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 45 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
27e5e55
[clang][SYCL] Fix LIT test for Windows (#8710)
Fznamznon Mar 21, 2023
7c7efee
[SYCL][Reduction] Make reducer uncopyable and immovable (#8654)
steffenlarsen Mar 21, 2023
c5c7ac2
[SYCL] Add marray support to common + some math functions (#8631)
dm-vodopyanov Mar 21, 2023
12a4566
[SYCL] Fix sycl::vec constructor ambiguity (#8608)
steffenlarsen Mar 21, 2023
86c08b3
[SYCL] Fix get_specialization_constant segmentation fault (#8542)
steffenlarsen Mar 21, 2023
0b853c2
[SYCL][NFC] Fix typo in handler 2D copy comment (#8553)
steffenlarsen Mar 21, 2023
3be2e42
[SYCL][Docs] Clarify that weak_object is only available on host (#8713)
steffenlarsen Mar 21, 2023
61e5101
[SYCL] Filter out unneeded device images with lower state than reques…
sergey-semenov Mar 21, 2023
d31623a
[SYCL][Doc] Fix link in sycl_ext_oneapi_root_group (#8715)
Pennycook Mar 22, 2023
93a629a
[SYCL] Implement two-run aspect propagation (#8681)
steffenlarsen Mar 22, 2023
ea26922
[SYCL] Revert any and all return types to int for vectors (#8591)
steffenlarsen Mar 22, 2023
488c7c9
[SYCL] Fix weak_object move and copy assignment (#8716)
steffenlarsen Mar 22, 2023
10e07a3
[SYCL] Implement is_group trait (#8711)
steffenlarsen Mar 22, 2023
13e5a32
[SYCL][NFC] Use common mock kernel info base in unit tests (#8736)
sergey-semenov Mar 22, 2023
637cb85
[SYCL] Fix dumping of images for multiple targets (#8640)
jchlanda Mar 22, 2023
1396da2
[SYCL][ESIMD]Fix regression in atomic_update (#8673)
fineg74 Mar 22, 2023
3fdbfeb
[SYCL] Fix return type of the accessor::get_pointer and local_accesso…
mmoadeli Mar 22, 2023
8c9691a
[SYCL][DOC] Update code location line number (#8515)
reble Mar 22, 2023
2816c14
[SYCL] Implement atomic_memory_scope_capabilities device query for Op…
maarquitos14 Mar 22, 2023
cfcba11
[SYCL][ESIMD] Don't add RegisterAllocMode metadata for ESIMD kernels …
sarnex Mar 22, 2023
a97afef
[SYCL] Update a few SPIRV values in compileTimeProperties (#8699)
Mar 22, 2023
3cc55de
[SYCL] Fixes AtomicMemoryScopeCapabilities unittest issue in Windows.…
maarquitos14 Mar 23, 2023
7663dc2
[SYCL] Reenable fp64 aspect runtime check (#8682)
steffenlarsen Mar 23, 2023
b18e6ea
[SYCL] atomic_memory_order_capabilities query for device and context …
Mar 23, 2023
d871541
[SYCL] Fix min and max builtins on Windows (#8752)
steffenlarsen Mar 23, 2023
130466e
[CI] Drop DPCPP_ENABLE_OPAQUE_POINTERS cmake flag (#8286)
bader Mar 23, 2023
519a8e2
[Driver][SYCL] Allow undefined symbols when doing host deps generatio…
mdtoguchi Mar 24, 2023
e09ff58
[SYCL][HIP] Adds support for PCI device id and UUID (#8233)
abagusetty Mar 24, 2023
b07cc12
[SYCL] Unroll local accessor index calculation (#8755)
sergey-semenov Mar 24, 2023
4167545
[SYCL][CMake] Build sycl-headers before bfloat16 objects (#8745)
tcreech-intel Mar 24, 2023
25d0475
[SYCL][ESIMD] BFN function implementation (#8708)
turinevgeny Mar 24, 2023
a0d0942
[SYCL][UR] Link UR PI against UR Loader (#8637)
bmyates Mar 24, 2023
4a4702e
[SYCL][CUDA] Fix the compiler error reported in https://github.com/in…
jinz2014 Mar 24, 2023
18899cc
[SYCL] Diagnose local accessor use in single_task or parallel_for(ran…
mmoadeli Mar 24, 2023
fc039fd
Add complex group algorithms (#7120)
ldrumm Mar 24, 2023
1bfd30d
[SYCL][InvokeSIMD] Add error for return type and subgroup size mismat…
sarnex Mar 24, 2023
01511a3
Revert "[SYCL] Diagnose local accessor use in single_task or parallel…
bader Mar 24, 2023
00ae1e7
[SYCL] Remove OwnZeMemHandle from USMAllocator (#7853)
igchor Mar 24, 2023
2002dc0
[SYCL] Fix test configurations when L0 and OpenCL are disabled (#8247)
steffenlarsen Mar 24, 2023
f61a136
[SYCL][OpenCL] Ban AMD OpenCL platform (#5825) (#6878)
al42and Mar 24, 2023
d01f85c
[SYCL][InvokeSIMD] Add test for invoke_simd return type error (#8772)
sarnex Mar 25, 2023
7f737b4
Merge remote-tracking branch 'upstream/sycl' into sycl-mlir
whitneywhtsang Mar 25, 2023
eacd870
Revert "[CI] Drop DPCPP_ENABLE_OPAQUE_POINTERS cmake flag (#8286)"
whitneywhtsang Mar 25, 2023
f372b39
Revert "[SYCL][UR] Link UR PI against UR Loader (#8637)"
whitneywhtsang Mar 25, 2023
dbcd0be
[SYCL-MLIR] Remove uses of 'detail::dim_loop<>'
whitneywhtsang Mar 25, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion clang/lib/CodeGen/BackendUtil.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -911,7 +911,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
PB.registerPipelineStartEPCallback(
[&](ModulePassManager &MPM, OptimizationLevel Level) {
MPM.addPass(ESIMDVerifierPass(LangOpts.SYCLESIMDForceStatelessMem));
MPM.addPass(SYCLPropagateAspectsUsagePass());
MPM.addPass(
SYCLPropagateAspectsUsagePass(/*ExcludeAspects=*/{"fp64"}));
});

// Add the InferAddressSpaces pass for all the SPIR[V] targets
Expand Down Expand Up @@ -1026,6 +1027,10 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
if (LangOpts.EnableDAEInSpirKernels)
MPM.addPass(DeadArgumentEliminationSYCLPass());

// Rerun aspect propagation without warning diagnostics.
MPM.addPass(SYCLPropagateAspectsUsagePass(/*ExcludeAspects=*/{},
/*ValidateAspects=*/false));

// Add SPIRITTAnnotations pass to the pass manager if
// -fsycl-instrument-device-code option was passed. This option can be
// used only with spir triple.
Expand Down
7 changes: 7 additions & 0 deletions clang/lib/Driver/ToolChains/Gnu.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -591,6 +591,13 @@ void tools::gnutools::Linker::ConstructJob(Compilation &C, const JobAction &JA,
ToolChain.addFastMathRuntimeIfAvailable(Args, CmdArgs);
}

// Performing link for dependency file information, undefined symbols are OK.
// True link time errors for symbols will be captured at host link.
if (JA.getType() == types::TY_Host_Dependencies_Image) {
CmdArgs.push_back("-z");
CmdArgs.push_back("undefs");
}

Args.AddAllArgs(CmdArgs, options::OPT_L);
Args.AddAllArgs(CmdArgs, options::OPT_u);

Expand Down
2 changes: 1 addition & 1 deletion clang/test/CodeGenSYCL/fpga-attr-do-while-loops.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
// RUN: %clang++ -fsycl-device-only -fintelfpga -S %s -o - | FileCheck %s
// RUN: %clang -fsycl-device-only -fintelfpga -S %s -o - | FileCheck %s

#include "Inputs/sycl.hpp"

Expand Down
8 changes: 8 additions & 0 deletions clang/test/Driver/sycl-offload-intelfpga-link.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,14 @@
// RUN: | FileCheck %s --check-prefix=CHK-FPGA-LINK-WARN-AOCR
// CHK-FPGA-LINK-WARN-AOCR: warning: FPGA archive '{{.*}}-aocr.a' does not contain matching emulation/hardware expectancy

/// Check deps behaviors with input fat archive and creating aocx archive
// RUN: %clangxx -fsycl -fintelfpga -fsycl-link=image \
// RUN: -target x86_64-unknown-linux-gnu %S/Inputs/SYCL/liblin64.a \
// RUN: %s -### 2>&1 \
// RUN: | FileCheck %s --check-prefix=CHK-FPGA-LINK-UNDEFS
// CHK-FPGA-LINK-UNDEFS: ld{{.*}} "-z" "undefs"
// CHK-FPGA-LINK-UNDEFS: clang-offload-deps{{.*}}

/// -fintelfpga -fsycl-link from source
// RUN: touch %t.cpp
// RUN: %clangxx -target x86_64-unknown-linux-gnu -fsycl -fno-sycl-instrument-device-code -fno-sycl-device-lib=all -fintelfpga -fsycl-link=early %t.cpp -ccc-print-phases 2>&1 \
Expand Down
2 changes: 1 addition & 1 deletion clang/test/Driver/sycl-offload-static-lib-2.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,7 @@
// RUN: %clangxx -target x86_64-unknown-linux-gnu -fsycl -fsycl-targets=nvptx64-nvidia-cuda %t_lib.a -o output_name -lOpenCL -### %s 2>&1 \
// RUN: | FileCheck %s -check-prefix=STATIC_LIB_SRC2 -DBUNDLE_TRIPLE=sycl-nvptx64-nvidia-cuda-sm_50 -DDEPS_TRIPLE=sycl-nvptx64-nvidia-cuda-sm_50
// STATIC_LIB_SRC2: clang{{.*}} "-emit-obj" {{.*}} "-o" "[[HOSTOBJ:.+\.o]]"
// STATIC_LIB_SRC2: ld{{(.exe)?}}" {{.*}} "-o" "[[HOSTEXE:.+\.out]]"
// STATIC_LIB_SRC2: ld{{(.exe)?}}" {{.*}} "-o" "[[HOSTEXE:.+\.out]]" {{.*}}"-z" "undefs"
// STATIC_LIB_SRC2: clang-offload-deps{{.*}} "-targets=[[DEPS_TRIPLE]]" "-outputs=[[OUTDEPS:.+\.bc]]" "[[HOSTEXE]]"
// STATIC_LIB_SRC2_DEF: clang-offload-bundler{{.*}} "-type=aoo" "-targets=[[BUNDLE_TRIPLE]]" {{.*}} "-output=[[OUTLIB:.+\.txt]]"
// STATIC_LIB_SRC2_NVPTX: clang-offload-bundler{{.*}} "-type=a" "-targets=[[BUNDLE_TRIPLE]]" {{.*}} "-output=[[OUTLIB:.+\.a]]"
Expand Down
12 changes: 12 additions & 0 deletions libclc/generic/include/spirv/spirv_types.h
Original file line number Diff line number Diff line change
Expand Up @@ -46,4 +46,16 @@ enum GroupOperation {
ExclusiveScan = 2,
};

typedef struct {
float real, imag;
} complex_float;

typedef struct {
double real, imag;
} complex_double;

typedef struct {
half real, imag;
} complex_half;

#endif // CLC_SPIRV_TYPES
253 changes: 210 additions & 43 deletions libclc/ptx-nvidiacl/libspirv/group/collectives.cl

Large diffs are not rendered by default.

58 changes: 47 additions & 11 deletions libclc/ptx-nvidiacl/libspirv/group/collectives_helpers.ll
Original file line number Diff line number Diff line change
@@ -1,61 +1,97 @@
; 64 storage locations is sufficient for all current-generation NVIDIA GPUs
; 64 bits per warp is sufficient for all fundamental data types
; 32 storage locations is sufficient for all current-generation NVIDIA GPUs
; 128 bits per warp is sufficient for all fundamental data types and complex
; Reducing storage for small data types or increasing it for user-defined types
; will likely require an additional pass to track group algorithm usage
@__clc__group_scratch = internal addrspace(3) global [64 x i64] undef, align 1
@__clc__group_scratch = internal addrspace(3) global [128 x i64] undef, align 1

define i8 addrspace(3)* @__clc__get_group_scratch_bool() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [64 x i64], [64 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to i8 addrspace(3)*
ret i8 addrspace(3)* %cast
}

define i8 addrspace(3)* @__clc__get_group_scratch_char() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [64 x i64], [64 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to i8 addrspace(3)*
ret i8 addrspace(3)* %cast
}

define i16 addrspace(3)* @__clc__get_group_scratch_short() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [64 x i64], [64 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to i16 addrspace(3)*
ret i16 addrspace(3)* %cast
}

define i32 addrspace(3)* @__clc__get_group_scratch_int() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [64 x i64], [64 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to i32 addrspace(3)*
ret i32 addrspace(3)* %cast
}

define i64 addrspace(3)* @__clc__get_group_scratch_long() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [64 x i64], [64 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to i64 addrspace(3)*
ret i64 addrspace(3)* %cast
}

define half addrspace(3)* @__clc__get_group_scratch_half() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [64 x i64], [64 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to half addrspace(3)*
ret half addrspace(3)* %cast
}

define float addrspace(3)* @__clc__get_group_scratch_float() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [64 x i64], [64 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to float addrspace(3)*
ret float addrspace(3)* %cast
}

define double addrspace(3)* @__clc__get_group_scratch_double() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [64 x i64], [64 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to double addrspace(3)*
ret double addrspace(3)* %cast
}

%complex_half = type {
half,
half
}

%complex_float = type {
float,
float
}

%complex_double = type {
double,
double
}

define %complex_half addrspace(3)* @__clc__get_group_scratch_complex_half() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to %complex_half addrspace(3)*
ret %complex_half addrspace(3)* %cast
}

define %complex_float addrspace(3)* @__clc__get_group_scratch_complex_float() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to %complex_float addrspace(3)*
ret %complex_float addrspace(3)* %cast
}

define %complex_double addrspace(3)* @__clc__get_group_scratch_complex_double() nounwind alwaysinline {
entry:
%ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
%cast = bitcast i64 addrspace(3)* %ptr to %complex_double addrspace(3)*
ret %complex_double addrspace(3)* %cast
}
2 changes: 1 addition & 1 deletion libdevice/cmake/modules/SYCLLibdevice.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -111,7 +111,7 @@ set(complex_obj_deps device_complex.h device.h sycl-compiler)
set(cmath_obj_deps device_math.h device.h sycl-compiler)
set(imf_obj_deps device_imf.hpp imf_half.hpp imf_bf16.hpp device.h sycl-compiler)
set(itt_obj_deps device_itt.h spirv_vars.h device.h sycl-compiler)
set(bfloat16_obj_deps sycl-compiler)
set(bfloat16_obj_deps sycl-headers sycl-compiler)

add_devicelib_obj(libsycl-itt-stubs SRC itt_stubs.cpp DEP ${itt_obj_deps})
add_devicelib_obj(libsycl-itt-compiler-wrappers SRC itt_compiler_wrappers.cpp DEP ${itt_obj_deps})
Expand Down
8 changes: 7 additions & 1 deletion llvm/include/llvm/SYCLLowerIR/SYCLPropagateAspectsUsage.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,13 +20,19 @@ namespace llvm {
class SYCLPropagateAspectsUsagePass
: public PassInfoMixin<SYCLPropagateAspectsUsagePass> {
public:
SYCLPropagateAspectsUsagePass(StringRef OptionsString = {}) {
SYCLPropagateAspectsUsagePass(std::set<StringRef> ExcludeAspects = {},
bool ValidateAspects = true,
StringRef OptionsString = {})
: ExcludedAspects{std::move(ExcludeAspects)},
ValidateAspectUsage{ValidateAspects} {
OptionsString.split(this->TargetFixedAspects, ',', /*MaxSplit=*/-1,
/*KeepEmpty=*/false);
};
PreservedAnalyses run(Module &M, ModuleAnalysisManager &);

private:
std::set<StringRef> ExcludedAspects;
const bool ValidateAspectUsage;
SmallVector<StringRef, 8> TargetFixedAspects;
};

Expand Down
6 changes: 3 additions & 3 deletions llvm/lib/SYCLLowerIR/CompileTimeProperties.def
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ SYCL_COMPILE_TIME_PROPERTY("sycl-dwidth", 6178, DecorValueTy::uint32)
SYCL_COMPILE_TIME_PROPERTY("sycl-latency", 6179, DecorValueTy::uint32)
SYCL_COMPILE_TIME_PROPERTY("sycl-read-write-mode", 6180, DecorValueTy::uint32)
SYCL_COMPILE_TIME_PROPERTY("sycl-maxburst", 6181, DecorValueTy::uint32)
SYCL_COMPILE_TIME_PROPERTY("sycl-alignment", 6182, DecorValueTy::uint32)
SYCL_COMPILE_TIME_PROPERTY("sycl-wait-request", 6183, DecorValueTy::uint32)
SYCL_COMPILE_TIME_PROPERTY("sycl-stable", 6184, DecorValueTy::boolean)
SYCL_COMPILE_TIME_PROPERTY("sycl-alignment", 44, DecorValueTy::uint32)
SYCL_COMPILE_TIME_PROPERTY("sycl-wait-request", 6182, DecorValueTy::uint32)
SYCL_COMPILE_TIME_PROPERTY("sycl-stable", 6183, DecorValueTy::boolean)
SYCL_COMPILE_TIME_PROPERTY("sycl-strict", 19, DecorValueTy::boolean)
7 changes: 6 additions & 1 deletion llvm/lib/SYCLLowerIR/CompileTimePropertiesPass.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -337,6 +337,7 @@ PreservedAnalyses CompileTimePropertiesPass::run(Module &M,
if (F.getCallingConv() != CallingConv::SPIR_KERNEL)
continue;

// Compile time properties on kernel arguments
{
SmallVector<Metadata *, 8> MDOps;
MDOps.reserve(F.arg_size());
Expand All @@ -345,8 +346,12 @@ PreservedAnalyses CompileTimePropertiesPass::run(Module &M,
SmallVector<Metadata *, 8> MDArgOps;
for (auto &Attribute : F.getAttributes().getParamAttrs(I)) {
if (MDNode *SPIRVMetadata =
attributeToDecorateMetadata(Ctx, Attribute))
attributeToDecorateMetadata(Ctx, Attribute)) {
// sycl-alignment is not collected to SPIRV.ParamDecoration
if (Attribute.getKindAsString() == "sycl-alignment")
continue;
MDArgOps.push_back(SPIRVMetadata);
}
}
if (!MDArgOps.empty())
FoundKernelProperties = true;
Expand Down
3 changes: 2 additions & 1 deletion llvm/lib/SYCLLowerIR/ESIMD/LowerESIMD.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -671,7 +671,8 @@ class ESIMDIntrinDescTable {
{"test.src.tmpl.arg", {t(0), t1(1), t8(2), t16(3), t32(4), c8(17)}}},
{"slm_init", {"slm.init", {a(0)}}},
{"bf_cvt", {"bf.cvt", {a(0)}}},
{"tf32_cvt", {"tf32.cvt", {a(0)}}}};
{"tf32_cvt", {"tf32.cvt", {a(0)}}},
{"bfn", {"bfn", {a(0), a(1), a(2), t(0)}}}};
}

const IntrinTable &getTable() { return Table; }
Expand Down
3 changes: 2 additions & 1 deletion llvm/lib/SYCLLowerIR/LowerKernelProps.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,8 @@ void processSetKernelPropertiesCall(CallInst &CI) {
GraphNode->addFnAttr(::sycl::kernel_props::ATTR_LARGE_GRF);
// Add RegisterAllocMode metadata with arg 2 to the kernel to tell
// IGC to compile this kernel in large GRF mode. 2 means large.
if (GraphNode->getCallingConv() == CallingConv::SPIR_KERNEL) {
if (GraphNode->getCallingConv() == CallingConv::SPIR_KERNEL &&
!GraphNode->hasMetadata("sycl_explicit_simd")) {
auto &Ctx = GraphNode->getContext();
Metadata *AttrMDArgs[] = {ConstantAsMetadata::get(
Constant::getIntegerValue(Type::getInt32Ty(Ctx), APInt(32, 2)))};
Expand Down
61 changes: 54 additions & 7 deletions llvm/lib/SYCLLowerIR/SYCLPropagateAspectsUsage.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,11 @@ static cl::opt<std::string> ClSyclFixedTargets(
"is expected to be runnable on"),
cl::Hidden, cl::init(""));

static cl::opt<std::string> ClSyclExcludeAspects(
"sycl-propagate-aspects-usage-exclude-aspects",
cl::desc("Specify aspects to exclude when propagating aspect usage"),
cl::Hidden, cl::init(""));

namespace {

using AspectsSetTy = SmallSet<int, 4>;
Expand Down Expand Up @@ -293,15 +298,37 @@ getAspectUsageChain(const Function *F, const FunctionToAspectsMapTy &AspectsMap,
return CallChain;
}

void createUsedAspectsMetadataForFunctions(FunctionToAspectsMapTy &Map) {
void createUsedAspectsMetadataForFunctions(
FunctionToAspectsMapTy &Map, const AspectsSetTy &ExcludeAspectVals) {
for (auto &[F, Aspects] : Map) {
if (Aspects.empty())
continue;

LLVMContext &C = F->getContext();

// Create a set of unique aspects. First we add the ones from the found
// aspects that have not been excluded.
AspectsSetTy UniqueAspects;
for (const int &A : Aspects)
if (!ExcludeAspectVals.contains(A))
UniqueAspects.insert(A);

// If there are no new aspects, we can just keep the old metadata.
if (UniqueAspects.empty())
continue;

// If there is new metadata, merge it with the old aspects. We preserve
// the excluded ones.
if (const MDNode *ExistingAspects = F->getMetadata("sycl_used_aspects")) {
for (const MDOperand &MDOp : ExistingAspects->operands()) {
const Constant *C = cast<ConstantAsMetadata>(MDOp)->getValue();
UniqueAspects.insert(cast<ConstantInt>(C)->getSExtValue());
}
}

// Create new metadata.
SmallVector<Metadata *, 16> AspectsMetadata;
for (const auto &A : Aspects)
for (const int &A : UniqueAspects)
AspectsMetadata.push_back(ConstantAsMetadata::get(
ConstantInt::getSigned(Type::getInt32Ty(C), A)));

Expand Down Expand Up @@ -506,7 +533,8 @@ void setSyclFixedTargetsMD(const std::vector<Function *> &EntryPoints,
FunctionToAspectsMapTy
buildFunctionsToAspectsMap(Module &M, TypeToAspectsMapTy &TypesWithAspects,
const AspectValueToNameMapTy &AspectValues,
const std::vector<Function *> &EntryPoints) {
const std::vector<Function *> &EntryPoints,
bool ValidateAspects) {
FunctionToAspectsMapTy FunctionToUsedAspects;
FunctionToAspectsMapTy FunctionToDeclaredAspects;
CallGraphTy CG;
Expand All @@ -522,8 +550,9 @@ buildFunctionsToAspectsMap(Module &M, TypeToAspectsMapTy &TypesWithAspects,
for (Function *F : EntryPoints)
propagateAspectsThroughCG(F, CG, FunctionToUsedAspects, Visited);

validateUsedAspectsForFunctions(FunctionToUsedAspects, AspectValues,
EntryPoints, CG);
if (ValidateAspects)
validateUsedAspectsForFunctions(FunctionToUsedAspects, AspectValues,
EntryPoints, CG);

// The set of aspects from FunctionToDeclaredAspects should be merged to the
// set of FunctionToUsedAspects after validateUsedAspectsForFunctions call to
Expand Down Expand Up @@ -558,6 +587,14 @@ SYCLPropagateAspectsUsagePass::run(Module &M, ModuleAnalysisManager &MAM) {
StringRef(ClSyclFixedTargets)
.split(TargetFixedAspects, ',', /*MaxSplit=*/-1, /*KeepEmpty=*/false);

if (ClSyclExcludeAspects.getNumOccurrences() > 0) {
SmallVector<StringRef, 4> ExcludedAspectsVec;
StringRef(ClSyclExcludeAspects)
.split(ExcludedAspectsVec, ',', /*MaxSplit=*/-1, /*KeepEmpty=*/false);
ExcludedAspects.insert(ExcludedAspectsVec.begin(),
ExcludedAspectsVec.end());
}

std::vector<Function *> EntryPoints;
for (Function &F : M.functions())
if (isEntryPoint(F))
Expand All @@ -566,9 +603,19 @@ SYCLPropagateAspectsUsagePass::run(Module &M, ModuleAnalysisManager &MAM) {
propagateAspectsToOtherTypesInModule(M, TypesWithAspects, AspectValues);

FunctionToAspectsMapTy FunctionToUsedAspects = buildFunctionsToAspectsMap(
M, TypesWithAspects, AspectValues, EntryPoints);
M, TypesWithAspects, AspectValues, EntryPoints, ValidateAspectUsage);

// Create a set of excluded aspect values.
AspectsSetTy ExcludedAspectVals;
for (const StringRef &AspectName : ExcludedAspects) {
const auto AspectValIter = AspectValues.find(AspectName);
assert(AspectValIter != AspectValues.end() &&
"Excluded aspect does not have a corresponding value.");
ExcludedAspectVals.insert(AspectValIter->second);
}

createUsedAspectsMetadataForFunctions(FunctionToUsedAspects);
createUsedAspectsMetadataForFunctions(FunctionToUsedAspects,
ExcludedAspectVals);

setSyclFixedTargetsMD(EntryPoints, TargetFixedAspects, AspectValues);

Expand Down
Loading