Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] [NATIVECPU] Integrate OneAPI Construction Kit vectorizer #12659

Merged
merged 59 commits into from
Feb 28, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
59 commits
Select commit Hold shift + click to select a range
a623d24
Support barriers on Native CPU
PietroGhg Oct 20, 2023
aed012f
formatting
PietroGhg Oct 20, 2023
798e2d1
formatting
PietroGhg Oct 20, 2023
4233cf8
Remove def for fixUpKernelNameAfterBarrier
PietroGhg Oct 20, 2023
5d76bd6
Licence header
PietroGhg Oct 23, 2023
509447c
Update lit test
PietroGhg Oct 23, 2023
9586b39
formatting
PietroGhg Oct 23, 2023
c297e24
formatting
PietroGhg Oct 23, 2023
de9506d
Merge branch 'sycl' into pietro/barriers
PietroGhg Oct 25, 2023
5a62505
formatting
PietroGhg Oct 25, 2023
0b140ef
[wip] vecz integration
PietroGhg Oct 31, 2023
c5697e7
Better defaults
PietroGhg Oct 31, 2023
af3abd1
Merge branch 'sycl' into pietro/barriers
PietroGhg Nov 1, 2023
986e37a
Consistent naming for cmake var
PietroGhg Nov 1, 2023
5188f8c
Merge branch 'sycl' into pietro/barriers
PietroGhg Nov 2, 2023
85bcbc6
Merge branch 'pietro/barriers' into pietro/vecz
PietroGhg Nov 6, 2023
f07433f
Enable vectorization by default
PietroGhg Nov 7, 2023
9f4a6b9
formatting
PietroGhg Nov 7, 2023
277f6a9
Merge branch 'sycl' into pietro/vecz
PietroGhg Nov 8, 2023
f2b634e
Remove debug print
PietroGhg Nov 8, 2023
b744997
Merge branch 'sycl' into pietro/barriers
PietroGhg Nov 9, 2023
6529479
Test updated OCK branch
PietroGhg Nov 9, 2023
9023de6
Restore real ock tag
PietroGhg Nov 9, 2023
06634f3
Merge branch 'pietro/barriers' of github.com:PietroGhg/llvm into piet…
PietroGhg Nov 9, 2023
8e3b25d
Merge branch 'sycl' into pietro/barriers
PietroGhg Nov 15, 2023
989da9c
Merge branch 'pietro/barriers' into pietro/vecz
PietroGhg Nov 15, 2023
8fc2392
Move utily functions to UtilsSYCLNativeCPU.h
PietroGhg Nov 16, 2023
11de7b3
Consistent naming in docs
PietroGhg Nov 16, 2023
376556d
change fixCallingConv name
PietroGhg Nov 17, 2023
93690b6
Check after dyn_cast
PietroGhg Nov 17, 2023
ab3e154
remove libclc-relatd cmake from native cpu cmake
PietroGhg Nov 17, 2023
ef1e920
Put back O2 in lit test
PietroGhg Nov 17, 2023
e8d7e3b
Update docs
PietroGhg Nov 17, 2023
e638b88
check after dyn cast
PietroGhg Nov 17, 2023
49ed3e7
Use llvm::cast where appropriate
PietroGhg Nov 20, 2023
8f6eab3
Merge branch 'sycl' into pietro/barriers
PietroGhg Nov 20, 2023
27f5177
Merge branch 'sycl' into pietro/barriers
PietroGhg Nov 28, 2023
27936cd
typo
PietroGhg Nov 28, 2023
36189e2
Merge branch 'pietro/barriers' into pietro/vecz
PietroGhg Nov 28, 2023
91c9b77
use llvm option for disabling vecz
PietroGhg Dec 1, 2023
a67fa2b
Merge branch 'sycl' into pietro/vecz
PietroGhg Dec 6, 2023
26bff15
Merge branch 'sycl' into pietro/vecz
PietroGhg Jan 31, 2024
b1f3fcd
Update OCK tag
PietroGhg Jan 31, 2024
db4612e
Formatting
PietroGhg Jan 31, 2024
69c83b6
Change vecz width option name and location
PietroGhg Feb 2, 2024
4abfb0e
Update docs
PietroGhg Feb 2, 2024
9ae8a21
Update docs
PietroGhg Feb 2, 2024
05ae105
New line in docs
PietroGhg Feb 2, 2024
432681e
Link to section
PietroGhg Feb 2, 2024
fc6e1d3
Link to section
PietroGhg Feb 2, 2024
50ea60c
Merge branch 'sycl' into pietro/vecz
PietroGhg Feb 5, 2024
94d849c
Update lit tests
PietroGhg Feb 5, 2024
263d58f
Use llvm::OptimiaztionLevel
PietroGhg Feb 6, 2024
cf584d2
Updated vector add test
PietroGhg Feb 6, 2024
17b541f
Merge branch 'sycl' into pietro/vecz
PietroGhg Feb 7, 2024
9ad5e36
Update docs
PietroGhg Feb 7, 2024
7349939
Formatting
PietroGhg Feb 8, 2024
5957645
Mark vectorization test are require native_cpu_be
PietroGhg Feb 8, 2024
655afa6
Remove unnecessary include
PietroGhg Feb 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Merge branch 'sycl' into pietro/vecz
  • Loading branch information
PietroGhg committed Jan 31, 2024
commit 26bff15dca442e373c3d56fd3cdee56f51d5fb36
17 changes: 8 additions & 9 deletions clang/lib/CodeGen/BackendUtil.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -70,8 +70,6 @@
#include "llvm/TargetParser/Triple.h"
#include "llvm/Transforms/HipStdPar/HipStdPar.h"
#include "llvm/Transforms/IPO/DeadArgumentElimination.h"
#include "llvm/Transforms/HipStdPar/HipStdPar.h"
#include "llvm/Transforms/IPO/DeadArgumentElimination.h"
#include "llvm/Transforms/IPO/EmbedBitcodePass.h"
#include "llvm/Transforms/IPO/LowerTypeTests.h"
#include "llvm/Transforms/IPO/ThinLTOBitcodeWriter.h"
Expand Down Expand Up @@ -117,21 +115,22 @@ static cl::opt<bool> ClSanitizeOnOptimizerEarlyEP(
"sanitizer-early-opt-ep", cl::Optional,
cl::desc("Insert sanitizers on OptimizerEarlyEP."), cl::init(false));

extern cl::opt<InstrProfCorrelator::ProfCorrelatorKind> ProfileCorrelate;

// Re-link builtin bitcodes after optimization
cl::opt<bool> ClRelinkBuiltinBitcodePostop(
"relink-builtin-bitcode-postop", cl::Optional,
cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false));

static cl::opt<bool> SYCLNativeCPUBackend(
"sycl-native-cpu-backend", cl::init(false),
cl::desc("Run the backend passes for SYCL Native CPU"));
} // namespace llvm

static cl::opt<bool> SYCLNativeCPUNoVecz(
"sycl-native-cpu-no-vecz", cl::init(false),
cl::desc("Disable vectorizer for SYCL Native CPU"));

// Re-link builtin bitcodes after optimization
cl::opt<bool> ClRelinkBuiltinBitcodePostop(
"relink-builtin-bitcode-postop", cl::Optional,
cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false));

}

namespace {

// Default filename used for profile generation.
Expand Down
2 changes: 0 additions & 2 deletions clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5608,8 +5608,6 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
if (IsSYCLOffloadDevice && IsSYCLNativeCPU) {
CmdArgs.push_back("-mllvm");
CmdArgs.push_back("-sycl-native-cpu-backend");
CmdArgs.push_back("-mllvm");
CmdArgs.push_back("-inline-threshold=10000");
}

// Also ignore explicit -force_cpusubtype_ALL option.
Expand Down
4 changes: 2 additions & 2 deletions llvm/include/llvm/SYCLLowerIR/UtilsSYCLNativeCPU.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,8 @@
//
//===----------------------------------------------------------------------===//
#pragma once
#include "llvm/IR/PassManager.h"
#include "llvm/ADT/Twine.h"
#include "llvm/IR/PassManager.h"

namespace llvm {
namespace sycl {
Expand Down Expand Up @@ -40,7 +40,7 @@ constexpr char SYCLNATIVECPUSUFFIX[] = ".SYCLNCPU";
constexpr char SYCLNATIVECPUKERNEL[] = ".NativeCPUKernel";
constexpr char SYCLNATIVECPUPREFIX[] = "__dpcpp_nativecpu";
inline llvm::Twine addSYCLNativeCPUSuffix(StringRef S) {
if (S.startswith(SYCLNATIVECPUPREFIX) || S.endswith(SYCLNATIVECPUKERNEL))
if (S.starts_with(SYCLNATIVECPUPREFIX) || S.ends_with(SYCLNATIVECPUKERNEL))
return S;
return llvm::Twine(S, SYCLNATIVECPUSUFFIX);
}
Expand Down
8 changes: 4 additions & 4 deletions llvm/lib/SYCLLowerIR/ConvertToMuxBuiltinsSYCLNativeCPU.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ static constexpr char MuxBarrier[] = "__mux_work_group_barrier";

Function *getReplaceFunc(Module &M, StringRef Name) {
LLVMContext &Ctx = M.getContext();
static auto *MuxFTy =
auto *MuxFTy =
FunctionType::get(Type::getInt64Ty(Ctx), {Type::getInt32Ty(Ctx)}, false);
auto F = M.getOrInsertFunction(Name, MuxFTy);
return cast<Function>(F.getCallee());
Expand All @@ -93,11 +93,11 @@ Function *getMuxBarrierFunc(Module &M) {
// void __mux_work_group_barrier(i32 %id, i32 %scope, i32 %semantics)
LLVMContext &Ctx = M.getContext();
auto *Int32Ty = Type::getInt32Ty(Ctx);
static auto *MuxFTy = FunctionType::get(Type::getVoidTy(Ctx),
{Int32Ty, Int32Ty, Int32Ty}, false);
auto *MuxFTy = FunctionType::get(Type::getVoidTy(Ctx),
{Int32Ty, Int32Ty, Int32Ty}, false);
auto FCallee = M.getOrInsertFunction(MuxBarrier, MuxFTy);
auto *F = dyn_cast<Function>(FCallee.getCallee());
if(!F) {
if (!F) {
report_fatal_error("Error while inserting mux builtins");
}
return F;
Expand Down
7 changes: 6 additions & 1 deletion llvm/lib/SYCLLowerIR/PipelineSYCLNativeCPU.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,18 +24,22 @@
#include "compiler/utils/work_item_loops_pass.h"
#include "vecz/pass.h"
#include "vecz/vecz_target_info.h"
#include "compiler/utils/prepare_barriers_pass.h"
#include "compiler/utils/sub_group_analysis.h"
#include "compiler/utils/work_item_loops_pass.h"
#include "llvm/Transforms/IPO/AlwaysInliner.h"
#endif

using namespace llvm;
using namespace sycl::utils;

cl::opt<bool> ForceNoTail("native-cpu-force-no-tail", cl::init(false),
cl::desc("Never emit the peeling loop for vectorized kernels,"
"even when the local size is not known to be a multiple of the vector width"));

cl::opt<bool> IsDebug("native-cpu-debug", cl::init(false),
cl::desc("Emit extra alloca instructions to preserve the value of live"
"vriables between barriers"));
"variables between barriers"));
cl::opt<unsigned> NativeCPUVeczWidth("ncpu-vecz-width", cl::init(8), cl::desc("Vector width for SYCL Native CPU vectorizer, defaults to 8"));
void llvm::sycl::utils::addSYCLNativeCPUBackendPasses(llvm::ModulePassManager &MPM,
ModuleAnalysisManager &MAM, unsigned OptLevel, bool DisableVecz) {
Expand Down Expand Up @@ -65,6 +69,7 @@ void llvm::sycl::utils::addSYCLNativeCPUBackendPasses(llvm::ModulePassManager &M
Opts.ForceNoTail = ForceNoTail;
MAM.registerPass([&] { return compiler::utils::BuiltinInfoAnalysis(); });
MAM.registerPass([&] { return compiler::utils::SubgroupAnalysis(); });
MPM.addPass(compiler::utils::PrepareBarriersPass());
MPM.addPass(compiler::utils::WorkItemLoopsPass(Opts));
MPM.addPass(AlwaysInlinerPass());
#endif
Expand Down
29 changes: 9 additions & 20 deletions llvm/lib/SYCLLowerIR/PrepareSYCLNativeCPU.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,6 @@
//===----------------------------------------------------------------------===//

#include "llvm/SYCLLowerIR/PrepareSYCLNativeCPU.h"
#include "llvm/BinaryFormat/MsgPack.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/DebugInfoMetadata.h"
Expand All @@ -30,22 +29,13 @@
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/Value.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
#include "llvm/SYCLLowerIR/UtilsSYCLNativeCPU.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/CodeGen.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/ValueMapper.h"
#include <functional>
#include <numeric>
#include <set>
#include <utility>
#include <vector>

Expand All @@ -61,9 +51,7 @@ using namespace sycl::utils;

namespace {

void fixCallingConv(Function *F) {
F->setCallingConv(llvm::CallingConv::C);
}
void fixCallingConv(Function *F) { F->setCallingConv(llvm::CallingConv::C); }

void emitSubkernelForKernel(Function *F, Type *NativeCPUArgDescType,
Type *StatePtrType, llvm::Constant *StateArgTLS) {
Expand Down Expand Up @@ -225,8 +213,7 @@ static Function *addSetLocalIdFunc(Module &M, StringRef Name, Type *StateType) {
Type *DimTy = I32Ty;
Type *ValTy = I64Ty;
Type *PtrTy = PointerType::get(Ctx, NativeCPUGlobalAS);
static FunctionType *FTy =
FunctionType::get(RetTy, {DimTy, ValTy, PtrTy}, false);
FunctionType *FTy = FunctionType::get(RetTy, {DimTy, ValTy, PtrTy}, false);
auto FCallee = M.getOrInsertFunction(Name, FTy);
auto *F = cast<Function>(FCallee.getCallee());
IRBuilder<> Builder(Ctx);
Expand Down Expand Up @@ -308,6 +295,7 @@ static Function *addReplaceFunc(Module &M, StringRef Name, Type *StateType) {
Builder.CreateRetVoid();
Res = F;
}
Res->setLinkage(GlobalValue::LinkageTypes::InternalLinkage);
return Res;
}

Expand Down Expand Up @@ -459,10 +447,11 @@ PreservedAnalyses PrepareSYCLNativeCPUPass::run(Module &M,
// CallInstructions in it have debug info, otherwise we end up with
// invalid IR after inlining.
if (I->getFunction()->hasMetadata("dbg")) {
I->setDebugLoc(DILocation::get(M.getContext(), 0, 0,
I->getFunction()->getSubprogram()));
if (I->getMetadata("dbg"))
NewI->setDebugLoc(I->getDebugLoc());
if (!I->getMetadata("dbg")) {
I->setDebugLoc(DILocation::get(M.getContext(), 0, 0,
I->getFunction()->getSubprogram()));
}
NewI->setDebugLoc(I->getDebugLoc());
}
ToRemove.push_back(std::make_pair(I, NewI));
}
Expand Down Expand Up @@ -495,7 +484,7 @@ PreservedAnalyses PrepareSYCLNativeCPUPass::run(Module &M,
// function will not be executed (since it has been inlined) and so we can
// just define __mux_work_group_barrier as a no-op to avoid linker errors.
// Todo: currently we can't remove the function here even if it has no uses,
// because we may still emit a declaration for in the offload-wrapper.
// because we may still emit a declaration for it in the offload-wrapper.
auto BarrierF =
M.getFunction(compiler::utils::MuxBuiltins::work_group_barrier);
if (BarrierF && BarrierF->isDeclaration()) {
Expand Down
2 changes: 1 addition & 1 deletion sycl/doc/design/SYCLNativeCPU.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ SYCL Native CPU uses the [oneAPI Construction Kit](https://github.com/codeplayso
```
python3 buildbot/configure.py \
--native_cpu \
--cmake-opt=-DNATIVE_CPU_USE_OCK=Off
--cmake-opt=-DNATIVECPU_USE_OCK=Off
```

The SYCL Native CPU device needs to be selected at runtime by setting the environment variable `ONEAPI_DEVICE_SELECTOR=native_cpu:cpu`.
Expand Down
2 changes: 1 addition & 1 deletion sycl/plugins/native_cpu/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ if(NATIVECPU_USE_OCK)
${oneapi-ck_SOURCE_DIR}/modules/cargo/include
${oneapi-ck_SOURCE_DIR}/modules/compiler/vecz/include
${oneapi-ck_SOURCE_DIR}/modules/compiler/utils/include)
target_link_libraries(LLVMSYCLLowerIR PRIVATE compiler-utils vecz)
target_link_libraries(LLVMSYCLLowerIR PRIVATE compiler-pipeline vecz)
target_compile_definitions(pi_native_cpu PRIVATE NATIVECPU_USE_OCK)

endif()
Expand Down
16 changes: 8 additions & 8 deletions sycl/plugins/unified_runtime/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -56,14 +56,14 @@ endif()
if(SYCL_PI_UR_USE_FETCH_CONTENT)
include(FetchContent)

set(UNIFIED_RUNTIME_REPO "https://github.com/PietroGhg/unified-runtime.git")
# commit 659d3f469faa99a886fa680a3d6d20449b109578
# Merge: 192e9404 f94550b4
set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime")
# commit 73d85ef9f48ec4d1f213066a31ed7e4402b5499b
# Merge: e46dc359 7985d3ee
# Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
# Date: Tue Nov 14 16:45:24 2023 +0000
# Merge pull request #1059 from martygrant/martin/moveNativeCPUAdapterToUR
# [NATIVECPU] Move Native CPU adapter to UR.
set(UNIFIED_RUNTIME_TAG pietro/barriers)
# Date: Mon Jan 29 14:26:08 2024 +0000
# Merge pull request #1289 from nrspruit/fix_multiDevice
# [L0] Fix native kernel usage, multi device kernel pointer and WorkSize
set(UNIFIED_RUNTIME_TAG 73d85ef9f48ec4d1f213066a31ed7e4402b5499b)

if(SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO)
set(UNIFIED_RUNTIME_REPO "${SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO}")
Expand Down Expand Up @@ -207,7 +207,7 @@ endif()
if ("native_cpu" IN_LIST SYCL_ENABLE_PLUGINS)
add_dependencies(sycl-runtime-libraries ur_adapter_native_cpu)

option(NATIVECPU_USE_OCK "Use the oneAPI Construction Kit for Native CPU" On)
option(NATIVECPU_USE_OCK "Use the oneAPI Construction Kit for Native CPU" ON)

if(NATIVECPU_USE_OCK)
message(STATUS "Compiling Native CPU adapter with OCK support.")
Expand Down
Loading
You are viewing a condensed version of this merge commit. You can view the full changes here.