-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Static linking support in Level Zero plugin #5216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The Level Zero driver now has "static linking" support, the ability to create a single Level Zero module from a list of SPIR-V modules. Use this feature to implement `piProgramLink`, replacing the previous implementation that relied on "dynamic linking" of Level Zero modules. There are two main improvements: * Static linking provides more optimizations because the compiler sees the SPIR-V for all modules, which enables cross-module optimizations. * The previous implementation was mostly using mock driver APIs, and it didn't work in the general case when `piProgramLink` tried to link more than one input module together. This is mostly an infrastructure improvement, which does not provide any new features or bug fixes. Since there is no Level Zero interop API that can create a `kernel_bundle` in `object` or `input` state, there is no need currently to link multiple Level Zero modules together with `piProgramLink`. However, this commit provides the infrastructure we need to enable online linking of the "device library" into device code. That feature will be enabled in a future commit. Since this change required a large restructure to the `pi_program` handling in the Level Zero plugin, I also made a few other improvements: * An error is now diagnosed from `piProgramBuild` and `piProgramLink` if the built program has any unresolved symbols. Moreover, the build log includes a list of the unresolved symbols, and this log is incorporated into the `what` string of the exception that the runtime throws. As a result, the user will see a list of the unresolved symbols when the exception terminates the application. Previously, no error was diagnosed from `piProgramBuild` or `piProgramLink` when there were unresolved symbols, but the first call to `piKernelCreate` would fail (without any indication about the cause of the failure). * The `piProgramLink` function now diagnoses an error if there are any command line options passed to the online linker. Previously, any such options were silently ignored. The Level Zero linker does not support any way to pass options to the linker currently. (It does support options for the compiler, and the plugin does support these.) * The `piProgramGetBuildInfo` function now returns the program's compilation or build options for the `CL_PROGRAM_BUILD_OPTIONS` query. * Locking has been improved for Level Zero operations on `pi_program` objects. Multiple threads may now access the same `pi_program` without danger of getting corrupt data.
The previous commit adds new logic to the Level Zero implementation of `piProgramLink`, but this logic is only used currently if the SYCL application explicitly calls `sycl::link`. When an application merely executes a kernel, the SYCL runtime generally calls `piProgramBuild` instead of `piProgramCompile` / `piProgramLink`. In order to enable wider testing of the new `piProgramLink` logic, this commit adds a temporary environment variable which forces the SYCL runtime to build all programs by calling `piProgramCompile` / `piProgramLink`. This is usually safe, though there are a few cases where it is unsafe and results in false errors. Therefore, use this environment variable with caution. Usage is like: ``` SYCL_FORCE_LINK=[0|1] ``` When running tests with `llvm-lit`, you can set this environment variable by passing `--param extra_environment=SYCL_FORCE_LINK=1`: ``` $ llvm-lit --param extra_environment=SYCL_FORCE_LINK=1 \ --param sycl_be=level_zero --param target_devices=gpu \ -j12 -v -s <test> ``` Here are the known cases when specifying `SYCL_FORCE_LINK=1` results in a false error: * Tests using AOT (ahead-of-time compilation) generally fail because a `pi_program` that is created from native code cannot be compiled with `piProgramCompile`. * Some tests enable tracing with `SYCL_PI_TRACE` and explicitly check for a call to `piProgramBuild`. These tests fail because the runtime calls `piProgramCompile` / `piProgramLink` instead of calling `piProgramBuild`.
// Module linking in Level Zero is quite different from program linking in | ||
// OpenCL. OpenCL statically links several program objects together to | ||
// form a new program that contains the linked result. Level Zero is more | ||
// similar to shared libraries. When several Level Zero modules are linked | ||
// together, each module is modified "in place" such that external | ||
// references from one are linked to external definitions in another. | ||
// Linking in Level Zero does not produce a new Level Zero module that | ||
// represents the linked result, therefore a program in LinkedExe state | ||
// holds a list of all the pi_programs that were linked together. Queries | ||
// about the linked program need to query all the pi_programs in this list. | ||
LinkedExe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we retain this capability (dynamic linking) in the plugin in case any future uses of it arise?
Can this be at least separated from this PR? |
A general ask would be to break this into multiple different change-sets. |
I have split this PR up into a series of three PRs: |
@gmlueck, can we close this PR? |
The Level Zero driver now has "static linking" support, the ability to
create a single Level Zero module from a list of SPIR-V modules. Use
this feature to implement
piProgramLink
, replacing the previousimplementation that relied on "dynamic linking" of Level Zero modules.
There are two main improvements:
Static linking provides more optimizations because the compiler sees
the SPIR-V for all modules, which enables cross-module optimizations.
The previous implementation was mostly using mock driver APIs, and it
didn't work in the general case when
piProgramLink
tried to linkmore than one input module together.
This is mostly an infrastructure improvement, which does not provide
any new features or bug fixes. Since there is no Level Zero interop
API that can create a
kernel_bundle
inobject
orinput
state,there is no need currently to link multiple Level Zero modules together
with
piProgramLink
.However, this commit provides the infrastructure we need to enable
online linking of the "device library" into device code. That feature
will be enabled in a future commit.
Since this change required a large restructure to the
pi_program
handling in the Level Zero plugin, I also made a few other
improvements:
An error is now diagnosed from
piProgramBuild
andpiProgramLink
if the built program has any unresolved symbols. Moreover, the build
log includes a list of the unresolved symbols, and this log is
incorporated into the
what
string of the exception that the runtimethrows. As a result, the user will see a list of the unresolved
symbols when the exception terminates the application.
Previously, no error was diagnosed from
piProgramBuild
orpiProgramLink
when there were unresolved symbols, but the firstcall to
piKernelCreate
would fail (without any indication about thecause of the failure).
The
piProgramLink
function now diagnoses an error if there are anycommand line options passed to the online linker. Previously, any
such options were silently ignored. The Level Zero linker does not
support any way to pass options to the linker currently. (It does
support options for the compiler, and the plugin does support these.)
The
piProgramGetBuildInfo
function now returns the program'scompilation or build options for the
CL_PROGRAM_BUILD_OPTIONS
query.Locking has been improved for Level Zero operations on
pi_program
objects. Multiple threads may now access the same
pi_program
without danger of getting corrupt data.