Skip to content

[SYCL][New offload model] Add SYCL E2E tests for --offload-new-driver option and fix failing tests #14730

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 29, 2024
Merged
3 changes: 2 additions & 1 deletion clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11066,7 +11066,8 @@ void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,
ArgStringList CmdArgs;

// Pass the CUDA path to the linker wrapper tool.
for (Action::OffloadKind Kind : {Action::OFK_Cuda, Action::OFK_OpenMP}) {
for (Action::OffloadKind Kind :
{Action::OFK_Cuda, Action::OFK_OpenMP, Action::OFK_SYCL}) {
auto TCRange = C.getOffloadToolChains(Kind);
for (auto &I : llvm::make_range(TCRange.first, TCRange.second)) {
const ToolChain *TC = I.second;
Expand Down
8 changes: 5 additions & 3 deletions clang/test/Driver/linker-wrapper-sycl-win.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -90,10 +90,11 @@
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}llvm-link.exe" -only-needed [[FIRSTLLVMLINKOUT]].bc {{.*}}.bc -o [[SECONDLLVMLINKOUT:.*]].bc --suppress-warnings
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}sycl-post-link.exe"{{.*}} SYCL_POST_LINK_OPTIONS -o [[SYCLPOSTLINKOUT:.*]].table [[SECONDLLVMLINKOUT]].bc
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}clang.exe"{{.*}} -o [[CLANGOUT:.*]] --target=nvptx64-nvidia-cuda -march={{.*}}
// CHK-CMDS-AOT-NV-NEXT: offload-wrapper: input: {{.*}}, output: [[WRAPPEROUT:.*]].bc
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}ptxas"{{.*}} --output-file [[PTXASOUT:.*]] [[CLANGOUT]]
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}fatbinary"{{.*}} --create [[FATBINOUT:.*]] --image=profile={{.*}},file=[[CLANGOUT]] --image=profile={{.*}},file=[[PTXASOUT]]
// CHK-CMDS-AOT-NV-NEXT: offload-wrapper: input: [[FATBINOUT]], output: [[WRAPPEROUT:.*]].bc
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}llc.exe" -filetype=obj -o [[LLCOUT:.*]].o [[WRAPPEROUT]].bc
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}ld" -- HOST_LINKER_FLAGS -dynamic-linker HOST_DYN_LIB -o a.out [[LLCOUT]].o HOST_LIB_PATH HOST_STAT_LIB {{.*}}.o

/// Check for list of commands for standalone clang-linker-wrapper run for sycl (AOT for AMD)
// -------
// Generate .o file as linker wrapper input.
Expand All @@ -107,6 +108,7 @@
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}llvm-link.exe" [[FIRSTLLVMLINKIN:.*]].bc -o [[FIRSTLLVMLINKOUT:.*]].bc --suppress-warnings
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}sycl-post-link.exe"{{.*}} SYCL_POST_LINK_OPTIONS -o [[SYCLPOSTLINKOUT:.*]].table [[FIRSTLLVMLINKOUT]].bc
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}clang.exe"{{.*}} -o [[CLANGOUT:.*]] --target=amdgcn-amd-amdhsa -mcpu={{.*}}
// CHK-CMDS-AOT-AMD-NEXT: offload-wrapper: input: {{.*}}, output: [[WRAPPEROUT:.*]].bc
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}clang-offload-bundler.exe"{{.*}} -input=[[CLANGOUT]] -output=[[BUNDLEROUT:.*]]
// CHK-CMDS-AOT-AMD-NEXT: offload-wrapper: input: [[BUNDLEROUT]], output: [[WRAPPEROUT:.*]].bc
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}llc.exe" -filetype=obj -o [[LLCOUT:.*]].o [[WRAPPEROUT]].bc
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}ld" -- HOST_LINKER_FLAGS -dynamic-linker HOST_DYN_LIB -o a.out [[LLCOUT]].o HOST_LIB_PATH HOST_STAT_LIB {{.*}}.o
14 changes: 10 additions & 4 deletions clang/test/Driver/linker-wrapper-sycl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,9 @@
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}llvm-link" -only-needed [[FIRSTLLVMLINKOUT]].bc {{.*}}.bc -o [[SECONDLLVMLINKOUT:.*]].bc --suppress-warnings
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}sycl-post-link"{{.*}} SYCL_POST_LINK_OPTIONS -o [[SYCLPOSTLINKOUT:.*]].table [[SECONDLLVMLINKOUT]].bc
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}clang"{{.*}} -o [[CLANGOUT:.*]] --target=nvptx64-nvidia-cuda -march={{.*}}
// CHK-CMDS-AOT-NV-NEXT: offload-wrapper: input: [[WRAPPERIN:.*]], output: [[WRAPPEROUT:.*]]
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}ptxas"{{.*}} --output-file [[PTXASOUT:.*]] [[CLANGOUT]]
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}fatbinary"{{.*}} --create [[FATBINOUT:.*]] --image=profile={{.*}},file=[[CLANGOUT]] --image=profile={{.*}},file=[[PTXASOUT]]
// CHK-CMDS-AOT-NV-NEXT: offload-wrapper: input: [[FATBINOUT]], output: [[WRAPPEROUT:.*]]
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}llc" -filetype=obj -o [[LLCOUT:.*]] [[WRAPPEROUT]]
// CHK-CMDS-AOT-NV-NEXT: "{{.*}}ld" -- HOST_LINKER_FLAGS -dynamic-linker HOST_DYN_LIB -o a.out [[LLCOUT]] HOST_LIB_PATH HOST_STAT_LIB {{.*}}.o

Expand All @@ -125,7 +127,8 @@
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}llvm-link" [[FIRSTLLVMLINKIN]].bc -o [[FIRSTLLVMLINKOUT:.*]].bc --suppress-warnings
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}sycl-post-link"{{.*}} SYCL_POST_LINK_OPTIONS -o [[SYCLPOSTLINKOUT:.*]].table [[FIRSTLLVMLINKOUT]].bc
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}clang"{{.*}} -o [[CLANGOUT:.*]] --target=amdgcn-amd-amdhsa -mcpu={{.*}}
// CHK-CMDS-AOT-AMD-NEXT: offload-wrapper: input: [[WRAPPERIN:.*]], output: [[WRAPPEROUT:.*]]
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}clang-offload-bundler"{{.*}} -targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx803 -input=/dev/null -input=[[CLANGOUT]] -output=[[BUNDLEROUT:.*]]
// CHK-CMDS-AOT-AMD-NEXT: offload-wrapper: input: [[BUNDLEROUT]], output: [[WRAPPEROUT:.*]]
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}llc" -filetype=obj -o [[LLCOUT:.*]] [[WRAPPEROUT]]
// CHK-CMDS-AOT-AMD-NEXT: "{{.*}}ld" -- HOST_LINKER_FLAGS -dynamic-linker HOST_DYN_LIB -o a.out [[LLCOUT]] HOST_LIB_PATH HOST_STAT_LIB {{.*}}.o

Expand All @@ -150,7 +153,9 @@
// CHK-CMDS-AOT-NV-EMBED-IR-NEXT: offload-wrapper: input: {{.*}}.bc, output: [[WRAPPEROUT1:.*]]
// CHK-CMDS-AOT-NV-EMBED-IR-NEXT: "{{.*}}llc" -filetype=obj -o [[LLCOUT1:.*]] [[WRAPPEROUT1]]
// CHK-CMDS-AOT-NV-EMBED-IR-NEXT: "{{.*}}clang"{{.*}} -o [[CLANGOUT:.*]] --target=nvptx64-nvidia-cuda -march={{.*}}
// CHK-CMDS-AOT-NV-EMBED-IR-NEXT: offload-wrapper: input: [[WRAPPERIN:.*]], output: [[WRAPPEROUT:.*]]
// CHK-CMDS-AOT-NV-EMBED-IR-NEXT: "{{.*}}ptxas"{{.*}} --output-file [[PTXASOUT:.*]] [[CLANGOUT]]
// CHK-CMDS-AOT-NV-EMBED-IR-NEXT: "{{.*}}fatbinary"{{.*}} --create [[FATBINOUT:.*]] --image=profile={{.*}},file=[[CLANGOUT]] --image=profile={{.*}},file=[[PTXASOUT]]
// CHK-CMDS-AOT-NV-EMBED-IR-NEXT: offload-wrapper: input: [[FATBINOUT]], output: [[WRAPPEROUT:.*]]
// CHK-CMDS-AOT-NV-EMBED-IR-NEXT: "{{.*}}llc" -filetype=obj -o [[LLCOUT2:.*]] [[WRAPPEROUT]]
// CHK-CMDS-AOT-NV-EMBED-IR-NEXT: "{{.*}}ld" -- HOST_LINKER_FLAGS -dynamic-linker HOST_DYN_LIB -o a.out [[LLCOUT1]] [[LLCOUT2]] HOST_LIB_PATH HOST_STAT_LIB {{.*}}.o

Expand All @@ -169,6 +174,7 @@
// CHK-CMDS-AOT-AMD-EMBED-IR-NEXT: offload-wrapper: input: {{.*}}.bc, output: [[WRAPPEROUT1:.*]]
// CHK-CMDS-AOT-AMD-EMBED-IR-NEXT: "{{.*}}llc" -filetype=obj -o [[LLCOUT1:.*]] [[WRAPPEROUT1]]
// CHK-CMDS-AOT-AMD-EMBED-IR-NEXT: "{{.*}}clang"{{.*}} -o [[CLANGOUT:.*]] --target=amdgcn-amd-amdhsa -mcpu={{.*}}
// CHK-CMDS-AOT-AMD-EMBED-IR-NEXT: offload-wrapper: input: [[WRAPPERIN:.*]], output: [[WRAPPEROUT2:.*]]
// CHK-CMDS-AOT-AMD-EMBED-IR-NEXT: "{{.*}}clang-offload-bundler"{{.*}} -input=[[CLANGOUT]] -output=[[BUNDLEROUT:.*]]
// CHK-CMDS-AOT-AMD-EMBED-IR-NEXT: offload-wrapper: input: [[BUNDLEROUT]], output: [[WRAPPEROUT2:.*]]
// CHK-CMDS-AOT-AMD-EMBED-IR-NEXT: "{{.*}}llc" -filetype=obj -o [[LLCOUT2:.*]] [[WRAPPEROUT2]]
// CHK-CMDS-AOT-AMD-EMBED-IR-NEXT: "{{.*}}ld" -- HOST_LINKER_FLAGS -dynamic-linker HOST_DYN_LIB -o a.out [[LLCOUT1]] [[LLCOUT2]] HOST_LIB_PATH HOST_STAT_LIB {{.*}}.o
7 changes: 7 additions & 0 deletions clang/test/Driver/sycl-offload-new-driver.c
Original file line number Diff line number Diff line change
Expand Up @@ -188,3 +188,10 @@
// RUN: -Xsycl-target-backend=spir64_gen "-device pvc,bdw" %s 2>&1 \
// RUN: | FileCheck -check-prefix COMMA_FILE %s
// COMMA_FILE: clang-offload-packager{{.*}} "--image=file={{.*}}pvc@bdw{{.*}},triple=spir64_gen-unknown-unknown,arch=pvc,bdw,kind=sycl"

/// Verify that --cuda-path is passed to clang-linker-wrapper for SYCL offload
// RUN: %clangxx -fsycl -### -fsycl-targets=nvptx64-nvidia-cuda \
// RUN: --cuda-gpu-arch=sm_20 --cuda-path=%S/Inputs/CUDA_80/usr/local/cuda %s \
// RUN: --offload-new-driver 2>&1 \
// RUN: | FileCheck -check-prefix NVPTX_CUDA_PATH %s
// NVPTX_CUDA_PATH: clang-linker-wrapper{{.*}} "--cuda-path={{.*}}Inputs/CUDA_80/usr/local/cuda"
84 changes: 77 additions & 7 deletions clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
//
//===---------------------------------------------------------------------===//

#include "clang/Basic/Cuda.h"
#include "clang/Basic/Version.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/BinaryFormat/Magic.h"
Expand Down Expand Up @@ -409,6 +410,46 @@ fatbinary(ArrayRef<std::pair<StringRef, StringRef>> InputFiles,

return *TempFileOrErr;
}

// ptxas binary
Expected<StringRef> ptxas(StringRef InputFile, const ArgList &Args,
StringRef Arch) {
llvm::TimeTraceScope TimeScope("NVPTX ptxas");
// NVPTX uses the ptxas program to process assembly files.
Expected<std::string> PtxasPath =
findProgram("ptxas", {CudaBinaryPath + "/bin"});
if (!PtxasPath)
return PtxasPath.takeError();

llvm::Triple Triple(
Args.getLastArgValue(OPT_host_triple_EQ, sys::getDefaultTargetTriple()));

// Create a new file to write the output to.
auto TempFileOrErr =
createOutputFile(sys::path::filename(ExecutableName), "cubin");
if (!TempFileOrErr)
return TempFileOrErr.takeError();

SmallVector<StringRef, 16> CmdArgs;
CmdArgs.push_back(*PtxasPath);
CmdArgs.push_back(Triple.isArch64Bit() ? "-m64" : "-m32");
// Pass -v to ptxas if it was passed to the driver.
if (Args.hasArg(OPT_verbose))
CmdArgs.push_back("-v");
StringRef OptLevel = Args.getLastArgValue(OPT_opt_level, "O2");
if (Args.hasArg(OPT_debug))
CmdArgs.push_back("-g");
else
CmdArgs.push_back(Args.MakeArgString("-" + OptLevel));
CmdArgs.push_back("--gpu-name");
CmdArgs.push_back(Arch);
CmdArgs.push_back("--output-file");
CmdArgs.push_back(*TempFileOrErr);
CmdArgs.push_back(InputFile);
if (Error Err = executeCommands(*PtxasPath, CmdArgs))
return std::move(Err);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if we are returning, do we need to std::move?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that std::move has been used wherever we are returning Err. I followed the trend.

return *TempFileOrErr;
}
} // namespace nvptx

namespace amdgcn {
Expand Down Expand Up @@ -1240,7 +1281,8 @@ static Expected<StringRef> linkDevice(ArrayRef<StringRef> InputFiles,
} // namespace sycl

namespace generic {
Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args,
bool IsSYCLKind = false) {
llvm::TimeTraceScope TimeScope("Clang");
// Use `clang` to invoke the appropriate device tools.
Expected<std::string> ClangPath =
Expand Down Expand Up @@ -1276,6 +1318,8 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
if (!Triple.isNVPTX())
CmdArgs.push_back("-Wl,--no-undefined");

if (IsSYCLKind && Triple.isNVPTX())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know why this is SYCL specific and not NVPTX specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I am trying to match the behavior of new offloading model with that found in the new offloading model (for SYCL). This particular flow (getting the assembly file from the clang call and then explicitly calling ptxas inside the linker-wrapper tool) matches with what is found in the old offloading model, but does not match the community flow. Hence the SYCL specificity. This surely warrants revisiting at a later time.

CmdArgs.push_back("-S");
for (StringRef InputFile : InputFiles)
CmdArgs.push_back(InputFile);

Expand Down Expand Up @@ -1369,7 +1413,7 @@ Expected<StringRef> linkDevice(ArrayRef<StringRef> InputFiles,
case Triple::ppc64:
case Triple::ppc64le:
case Triple::systemz:
return generic::clang(InputFiles, Args);
return generic::clang(InputFiles, Args, IsSYCLKind);
case Triple::spirv32:
case Triple::spirv64:
case Triple::spir:
Expand Down Expand Up @@ -2078,14 +2122,40 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
return OutputFile.takeError();
WrappedOutput.push_back(*OutputFile);
}

for (size_t I = 0, E = SplitModules.size(); I != E; ++I) {
SmallVector<StringRef> Files = {SplitModules[I].ModuleFilePath};
auto LinkedFileFinalOrErr =
StringRef Arch = LinkerArgs.getLastArgValue(OPT_arch_EQ);
if (Arch.empty())
Arch = "native";
SmallVector<std::pair<StringRef, StringRef>, 4> BundlerInputFiles;
auto ClangOutputOrErr =
linkDevice(Files, LinkerArgs, true /* IsSYCLKind */);
if (!LinkedFileFinalOrErr)
return LinkedFileFinalOrErr.takeError();
SplitModules[I].ModuleFilePath = *LinkedFileFinalOrErr;
if (!ClangOutputOrErr)
return ClangOutputOrErr.takeError();
if (Triple.isNVPTX()) {
auto VirtualArch = StringRef(clang::CudaArchToVirtualArchString(
clang::StringToCudaArch(Arch)));
auto PtxasOutputOrErr =
nvptx::ptxas(*ClangOutputOrErr, LinkerArgs, Arch);
if (!PtxasOutputOrErr)
return PtxasOutputOrErr.takeError();
BundlerInputFiles.emplace_back(*ClangOutputOrErr, VirtualArch);
BundlerInputFiles.emplace_back(*PtxasOutputOrErr, Arch);
auto BundledFileOrErr =
nvptx::fatbinary(BundlerInputFiles, LinkerArgs);
if (!BundledFileOrErr)
return BundledFileOrErr.takeError();
SplitModules[I].ModuleFilePath = *BundledFileOrErr;
} else if (Triple.isAMDGCN()) {
BundlerInputFiles.emplace_back(*ClangOutputOrErr, Arch);
auto BundledFileOrErr =
amdgcn::fatbinary(BundlerInputFiles, LinkerArgs);
if (!BundledFileOrErr)
return BundledFileOrErr.takeError();
SplitModules[I].ModuleFilePath = *BundledFileOrErr;
} else {
SplitModules[I].ModuleFilePath = *ClangOutputOrErr;
}
}
// TODO(NOM7): Remove this call and use community flow for bundle/wrap
auto OutputFile = sycl::runWrapperAndCompile(SplitModules, LinkerArgs);
Expand Down
71 changes: 71 additions & 0 deletions sycl/test-e2e/NewOffloadDriver/Inputs/aot.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
//==--- aot.cpp - Simple vector addition (AOT compilation example) --------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===---------------------------------------------------------------------===//

#include <sycl/detail/core.hpp>

#include <array>
#include <iostream>

constexpr sycl::access::mode sycl_read = sycl::access::mode::read;
constexpr sycl::access::mode sycl_write = sycl::access::mode::write;

template <typename T> class Vadd;

template <typename T, size_t N>
void vadd(const std::array<T, N> &A, const std::array<T, N> &B,
std::array<T, N> &C) {
sycl::queue Queue([](sycl::exception_list ExceptionList) {
for (std::exception_ptr ExceptionPtr : ExceptionList) {
try {
std::rethrow_exception(ExceptionPtr);
} catch (sycl::exception &E) {
std::cerr << E.what();
} catch (...) {
std::cerr << "Unknown async exception was caught." << std::endl;
}
}
});

sycl::range<1> numOfItems{N};
sycl::buffer bufA(A.data(), numOfItems);
sycl::buffer bufB(B.data(), numOfItems);
sycl::buffer bufC(C.data(), numOfItems);

Queue.submit([&](sycl::handler &cgh) {
sycl::accessor accA{bufA, cgh, sycl::read_only};
sycl::accessor accB{bufB, cgh, sycl::read_only};
sycl::accessor accC{bufC, cgh, sycl::write_only};

cgh.parallel_for<Vadd<T>>(numOfItems, [=](sycl::id<1> wiID) {
accC[wiID] = accA[wiID] + accB[wiID];
});
});

Queue.wait_and_throw();
}

int main() {
const size_t array_size = 4;
std::array<int, array_size> A = {{1, 2, 3, 4}}, B = {{1, 2, 3, 4}}, C;
std::array<float, array_size> D = {{1.f, 2.f, 3.f, 4.f}},
E = {{1.f, 2.f, 3.f, 4.f}}, F;
vadd(A, B, C);
vadd(D, E, F);
for (unsigned int i = 0; i < array_size; i++) {
if (C[i] != A[i] + B[i]) {
std::cout << "Incorrect result (element " << i << " is " << C[i] << "!\n";
return 1;
}
if (F[i] != D[i] + E[i]) {
std::cout << "Incorrect result (element " << i << " is " << F[i] << "!\n";
return 1;
}
}
std::cout << "Correct result!\n";
return 0;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#include "split-per-source.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a copyright header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was lifted from here: ./sycl/test-e2e/DeviceCodeSplit/Inputs/split-per-source-second-file.cpp where I do not find a copyright header.


void runKernelsFromFile2() {
sycl::queue Q;
int Data = 0;
{
sycl::buffer<int, 1> Buf(&Data, sycl::range<1>(1));
auto KernelID1 = sycl::get_kernel_id<File2Kern1>();
auto KB = sycl::get_kernel_bundle<sycl::bundle_state::executable>(
Q.get_context(), {KernelID1});
auto Krn = KB.get_kernel(KernelID1);

std::vector<sycl::kernel_id> KernelIDStorage = KB.get_kernel_ids();
assert(KernelIDStorage.size() == 1);
assert(KernelIDStorage[0] == KernelID1);

Q.submit([&](sycl::handler &Cgh) {
auto Acc = Buf.get_access<sycl::access::mode::read_write>(Cgh);
Cgh.single_task<File2Kern1>(Krn, [=]() { Acc[0] = 3; });
});
}
assert(Data == 3);
}
7 changes: 7 additions & 0 deletions sycl/test-e2e/NewOffloadDriver/Inputs/split-per-source.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
#include <sycl/detail/core.hpp>

class File1Kern1;
class File1Kern2;
class File2Kern1;

void runKernelsFromFile2();
6 changes: 6 additions & 0 deletions sycl/test-e2e/NewOffloadDriver/aot-cpu.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
// REQUIRES: opencl-aot, cpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need any-device-is-cpu i think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of this setting. But I can add that here. Again this test was lifted from ./sycl/test-e2e/DeviceCodeSplit/aot-cpu.cpp and I added --offload-new-driver


// Test with `--offload-new-driver`
// RUN: %clangxx -fsycl -fsycl-device-code-split=per_source -fsycl-targets=spir64_x86_64 -I %S/Inputs -o %t.out %S/split-per-source-main.cpp %S/Inputs/split-per-source-second-file.cpp \
// RUN: -fsycl-dead-args-optimization --offload-new-driver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-fsycl-dead-args-optimization seems to be enabled by default, so do we really need to pass it everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have lifted this file again from ./sycl/test-e2e/DeviceCodeSplit/aot-cpu.cpp. May be we can have a cleanup PR later on?

// RUN: %{run} %t.out
13 changes: 13 additions & 0 deletions sycl/test-e2e/NewOffloadDriver/aot-gpu.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
// REQUIRES: ocloc, gpu
// UNSUPPORTED: cuda || hip
// CUDA does neither support device code splitting nor SPIR.
// Test with `--offload-new-driver`
//
// RUN: %clangxx -fsycl -fsycl-device-code-split=per_source \
// RUN: -fsycl-targets=spir64_gen \
// RUN: -Xsycl-target-backend=spir64_gen \
// RUN: "-device tgllp" -I %S/Inputs -o %t.out \
// RUN: %S/split-per-source-main.cpp \
// RUN: %S/Inputs/split-per-source-second-file.cpp \
// RUN: -fsycl-dead-args-optimization --offload-new-driver
// RUN: %{run} %t.out
20 changes: 20 additions & 0 deletions sycl/test-e2e/NewOffloadDriver/cpu.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
//==--- cpu.cpp - AOT compilation for cpu devices using opencl-aot --------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===---------------------------------------------------------------------===//

// REQUIRES: opencl-aot, cpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need any-device-is-cpu here too i think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above comments. Thanks


// Test with `--offload-new-driver`
// RUN: %clangxx -fsycl -fsycl-targets=spir64_x86_64 --offload-new-driver %S/Inputs/aot.cpp -o %t.out
// RUN: %{run} %t.out

// Test that opencl-aot can handle multiple build options.
// RUN: %clangxx -fsycl -fsycl-targets=spir64_x86_64 --offload-new-driver %S/Inputs/aot.cpp -Xsycl-target-backend "--bo=-g" -Xsycl-target-backend "--bo=-cl-opt-disable" -o %t2.out

// Test that opencl-aot can handle march option.
// RUN: %clangxx -fsycl -fsycl-targets=spir64_x86_64 --offload-new-driver %S/Inputs/aot.cpp -Xsycl-target-backend "--march=avx512"
// RUN: %clangxx -fsycl -fsycl-targets=spir64_x86_64 --offload-new-driver %S/Inputs/aot.cpp -Xsycl-target-backend "--march=wsm"
Loading
Loading