Skip to content

[SYCL] Static linking support in Level Zero plugin #5216

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

gmlueck
Copy link
Contributor

@gmlueck gmlueck commented Dec 23, 2021

The Level Zero driver now has "static linking" support, the ability to
create a single Level Zero module from a list of SPIR-V modules. Use
this feature to implement piProgramLink, replacing the previous
implementation that relied on "dynamic linking" of Level Zero modules.
There are two main improvements:

  • Static linking provides more optimizations because the compiler sees
    the SPIR-V for all modules, which enables cross-module optimizations.

  • The previous implementation was mostly using mock driver APIs, and it
    didn't work in the general case when piProgramLink tried to link
    more than one input module together.

This is mostly an infrastructure improvement, which does not provide
any new features or bug fixes. Since there is no Level Zero interop
API that can create a kernel_bundle in object or input state,
there is no need currently to link multiple Level Zero modules together
with piProgramLink.

However, this commit provides the infrastructure we need to enable
online linking of the "device library" into device code. That feature
will be enabled in a future commit.

Since this change required a large restructure to the pi_program
handling in the Level Zero plugin, I also made a few other
improvements:

  • An error is now diagnosed from piProgramBuild and piProgramLink
    if the built program has any unresolved symbols. Moreover, the build
    log includes a list of the unresolved symbols, and this log is
    incorporated into the what string of the exception that the runtime
    throws. As a result, the user will see a list of the unresolved
    symbols when the exception terminates the application.

    Previously, no error was diagnosed from piProgramBuild or
    piProgramLink when there were unresolved symbols, but the first
    call to piKernelCreate would fail (without any indication about the
    cause of the failure).

  • The piProgramLink function now diagnoses an error if there are any
    command line options passed to the online linker. Previously, any
    such options were silently ignored. The Level Zero linker does not
    support any way to pass options to the linker currently. (It does
    support options for the compiler, and the plugin does support these.)

  • The piProgramGetBuildInfo function now returns the program's
    compilation or build options for the CL_PROGRAM_BUILD_OPTIONS query.

  • Locking has been improved for Level Zero operations on pi_program
    objects. Multiple threads may now access the same pi_program
    without danger of getting corrupt data.

The Level Zero driver now has "static linking" support, the ability to
create a single Level Zero module from a list of SPIR-V modules.  Use
this feature to implement `piProgramLink`, replacing the previous
implementation that relied on "dynamic linking" of Level Zero modules.
There are two main improvements:

* Static linking provides more optimizations because the compiler sees
  the SPIR-V for all modules, which enables cross-module optimizations.

* The previous implementation was mostly using mock driver APIs, and it
  didn't work in the general case when `piProgramLink` tried to link
  more than one input module together.

This is mostly an infrastructure improvement, which does not provide
any new features or bug fixes.  Since there is no Level Zero interop
API that can create a `kernel_bundle` in `object` or `input` state,
there is no need currently to link multiple Level Zero modules together
with `piProgramLink`.

However, this commit provides the infrastructure we need to enable
online linking of the "device library" into device code.  That feature
will be enabled in a future commit.

Since this change required a large restructure to the `pi_program`
handling in the Level Zero plugin, I also made a few other
improvements:

* An error is now diagnosed from `piProgramBuild` and `piProgramLink`
  if the built program has any unresolved symbols.  Moreover, the build
  log includes a list of the unresolved symbols, and this log is
  incorporated into the `what` string of the exception that the runtime
  throws.  As a result, the user will see a list of the unresolved
  symbols when the exception terminates the application.

  Previously, no error was diagnosed from `piProgramBuild` or
  `piProgramLink` when there were unresolved symbols, but the first
  call to `piKernelCreate` would fail (without any indication about the
  cause of the failure).

* The `piProgramLink` function now diagnoses an error if there are any
  command line options passed to the online linker.  Previously, any
  such options were silently ignored.  The Level Zero linker does not
  support any way to pass options to the linker currently.  (It does
  support options for the compiler, and the plugin does support these.)

* The `piProgramGetBuildInfo` function now returns the program's
  compilation or build options for the `CL_PROGRAM_BUILD_OPTIONS` query.

* Locking has been improved for Level Zero operations on `pi_program`
  objects.  Multiple threads may now access the same `pi_program`
  without danger of getting corrupt data.
The previous commit adds new logic to the Level Zero implementation of
`piProgramLink`, but this logic is only used currently if the SYCL
application explicitly calls `sycl::link`.  When an application merely
executes a kernel, the SYCL runtime generally calls `piProgramBuild`
instead of `piProgramCompile` / `piProgramLink`.

In order to enable wider testing of the new `piProgramLink` logic, this
commit adds a temporary environment variable which forces the SYCL
runtime to build all programs by calling `piProgramCompile` /
`piProgramLink`.  This is usually safe, though there are a few cases
where it is unsafe and results in false errors.  Therefore, use this
environment variable with caution.

Usage is like:

```
SYCL_FORCE_LINK=[0|1]
```

When running tests with `llvm-lit`, you can set this environment
variable by passing `--param extra_environment=SYCL_FORCE_LINK=1`:

```
$ llvm-lit --param extra_environment=SYCL_FORCE_LINK=1  \
  --param sycl_be=level_zero --param target_devices=gpu \
  -j12 -v -s <test>
```

Here are the known cases when specifying `SYCL_FORCE_LINK=1` results in
a false error:

* Tests using AOT (ahead-of-time compilation) generally fail because
  a `pi_program` that is created from native code cannot be compiled
  with `piProgramCompile`.

* Some tests enable tracing with `SYCL_PI_TRACE` and explicitly check
  for a call to `piProgramBuild`.  These tests fail because the runtime
  calls `piProgramCompile` / `piProgramLink` instead of calling
  `piProgramBuild`.
Comment on lines -1022 to -1032
// Module linking in Level Zero is quite different from program linking in
// OpenCL. OpenCL statically links several program objects together to
// form a new program that contains the linked result. Level Zero is more
// similar to shared libraries. When several Level Zero modules are linked
// together, each module is modified "in place" such that external
// references from one are linked to external definitions in another.
// Linking in Level Zero does not produce a new Level Zero module that
// represents the linked result, therefore a program in LinkedExe state
// holds a list of all the pi_programs that were linked together. Queries
// about the linked program need to query all the pi_programs in this list.
LinkedExe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we retain this capability (dynamic linking) in the plugin in case any future uses of it arise?

@smaslov-intel
Copy link
Contributor

  • Locking has been improved for Level Zero operations on pi_program
    objects. Multiple threads may now access the same pi_program
    without danger of getting corrupt data.

Can this be at least separated from this PR?
@againull is working on a general thread-safety improvement, which should cover it, I expect.

@smaslov-intel
Copy link
Contributor

A general ask would be to break this into multiple different change-sets.

@gmlueck
Copy link
Contributor Author

gmlueck commented Jan 6, 2022

@bader
Copy link
Contributor

bader commented Jan 14, 2022

@gmlueck, can we close this PR?

@gmlueck gmlueck closed this Jan 14, 2022
@gmlueck gmlueck deleted the gmlueck/l0-static-linking branch January 18, 2022 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants