Skip to content

[SYCL] Introduce the Level Zero plugin #1718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Jun 8, 2020
Merged

[SYCL] Introduce the Level Zero plugin #1718

merged 29 commits into from
Jun 8, 2020

Conversation

againull
Copy link
Contributor

Plugin itself consists of the header and the source file plus cmake
file to build the plugin. Also the following changes were made to
suport the Level Zero plugin in SYCL RT:
* New level0 value was added to the backend enum
* New PI_LEVEL0 value support was added to the SYCL_BE config.
* Docs were updated. Mentioned Level Zero backend and provided the
link to the Level Zero runtime for Intel GPU.
* Changes in sycl cmake file to build Level Zero plugin by default and to
install it with sycl toolchain.

LIT testing with PI_LEVEL0 backend will be enabled in the following
commits. This commits introduces the plugin and makes it buildable.

Signed-off-by: Artur Gainullin artur.gainullin@intel.com

againull added 2 commits May 19, 2020 23:39
Plugin itself consists of the header and the source file plus cmake
file to build the plugin. Also the following changes were made to
suport the Level Zero plugin in SYCL RT:
* New level0 value was added to backend enum
* New PI_LEVEL0 value support was added to SYCL_BE config.
* Docs were updated. Mentioned Level Zero backend and provided the
link to the Level Zero runtime for Intel GPU.
* Changes in sycl cmake file to build level0 plugin by default and to
install it with sycl toolchain.

LIT testing with PI_LEVEL0 backend will be enabled in the following
commits. This commits introduces the plugin and makes it buildable.

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
L0 plugin doesn't support piProgramCompile/piProgramLink commands,
program is built during piProgramCreate.

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
@againull againull requested review from kbobrovs, pvchupin, smaslov-intel and a team as code owners May 19, 2020 20:48
@againull againull requested a review from s-kanaev May 19, 2020 20:48
@againull
Copy link
Contributor Author

Please don't review the fist commit "[SYCL] Untie PI functions from OpenCL" which is reviewed separately #1717

* Fix clang-format issue
* Fix using statement

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
Please, refer to [the Release Notes](../ReleaseNotes.md) for recommended Intel
runtime versions.

The `GPU` runtime that is needed to run DPC++ application on Intel `GPU` devices
can be downloaded from the following web pages:
To run DPC++ application on Intel `GPU` devices the OpenCL `GPU` runtime or the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1699 have this wording changed a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, according to new wording I just need to specify the version of the L0 runtime in buildbot/dependency.conf. I am not sure if the purpose of this file is just to provide information about version or if it used in some scripts. I will ask people working on CI and then will update this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildbot/dependency.conf is used in CI scripts. Added comment to this file explaining that there is a same driver for Level Zero and OpenCL,

againull added 3 commits May 20, 2020 03:23
Align with changes made for OpenCL plugin in
intel#1638:

* Fix build on Windows by defining __SYCL_BUILD_SYCL_DLL in cmake file
* Export only pi* symbols in libpi_level0.so
* Add test to check exported symbols

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
* Add link to the Level Zero specification
* Revert changes in the section providing links to download low level
runtimes. Links are supposed to be provided in buildbot/dependency.conf
according to upcoming changes intel#1699

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
@romanovvlad
Copy link
Contributor

@againull Could you please PR your patch even more? Some ideas: all changes not in Level0 dir go to a separate PRs; group PI API implementations(queue related, program related) and create separate PRs for each of such group.

@againull
Copy link
Contributor Author

againull commented May 20, 2020

@againull Could you please PR your patch even more? Some ideas: all changes not in Level0 dir go to a separate PRs; group PI API implementations(queue related, program related) and create separate PRs for each of such group.

@romanovvlad did you want to write this comment to #1723 ?

Copy link
Contributor

@Ruyk Ruyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, nice to see more non-opencl plugins added :-)

break;
case ZE_RESULT_ERROR_UNKNOWN:
ErrorString = "ZE_RESULT_ERROR_UNKNOWN";
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier to maintain with a macro for each case, e.g.

switch(ZeError) {
#define ZE_ERRCASE(ERR)\
case ERR:\
   ErrorString = ""#ERR;\
   break;

ZE_ERRCASE(ZE_RESULT_ERROR_OVERLAPPING_REGIONS);
ZE_ERRCASE(ZE_RESULT_ERROR_UNKNOWN);
// ...
#undef ZE_ERRCASE
} // switch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the code, fixed.

void (*PFnNotify)(pi_event Event,
pi_int32 EventCommandStatus,
void *UserData),
void *UserData) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callbacks are not used by SYCL-RT anymore. Among other problems, they caused performance issues on the CUDA backend. This code is not used, could be removed - we are going to mark it as not implemented in the CUDA backend soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the body. I think this functions should be removed from PI in the separate commit as they are not used in SYCL RT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

return PI_SUCCESS;
}

pi_result piEventSetStatus(pi_event Event, pi_int32 ExecutionStatus) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used by SYCL RT anymore (thanks @s-kanaev !)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made same change as for piEventSetCallback.

&ZeSamplerDesc, // TODO: translate properties
&ZeSampler));

*RetSampler = new _pi_sampler(ZeSampler);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On error or bad_alloc this should return error code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

if (Res != PI_SUCCESS)
return Res;

piEventCreate(Queue->Context, Event);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is piEventCreate twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

assert(Flags == PI_MEM_FLAGS_ACCESS_RW);
assert(BufferCreateType == PI_BUFFER_CREATE_TYPE_REGION);
assert(!(static_cast<_pi_buffer *>(Buffer))->isSubBuffer() &&
"Sub-buffer cannot be partitioned");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a level0 restriction or just keeping the existing SYCL/OpenCL limitations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, just keeping the existing SYCL/OpenCL limitations.


// L0 doesn't do the reference counting, so we have to do.
// Must be atomic to prevent data race when incrementing/decrementing.
std::atomic<pi_uint32> RefCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a problem/question we have on the CUDA plugin: Is reference counting required by SYCL-RT? seems from what I understand of the SYCL-RT code, Increment/Decrement of all PI objects is only used for interoperability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand we need to do reference counting to properly release the allocated resources.
@smaslov-intel Could you please give your input here.

againull added 4 commits May 20, 2020 20:25
Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
* Use macro in zeParseError function which is easier to maintain
* Fix env variable naming
* Remove implementation of piEventSetCallback and piEventSetStatus,
make them deprecated. These functions should be removed from PI in the
separate patch because they are not used in the SYCL RT.

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
usm_device_allocations = PI_USM_DEVICE_SUPPORT,
usm_host_allocations = PI_USM_HOST_SUPPORT,
usm_shared_allocations = PI_USM_SINGLE_SHARED_SUPPORT,
usm_device_allocations = PI_USM_DEVICE_SUPPORT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting changes unrelated to this PR

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
@againull againull requested a review from bader as a code owner May 20, 2020 20:15
againull added 2 commits May 20, 2020 15:32
* Added license header to pi_level0.cpp/pi_level0.hpp
* use unordered_map instead of map where possible
* Refactor _pi_mem::addMapping according to suggestion
* Add sanity checks to piProgramCompile and piPrgoramBuild
* Narrowed extern "C" to exported functions.
* Other minor fixes

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting.
A few questions and nitpicks.

class ZeCall {
private:
// The global mutex that is used for total serialization of L0 calls.
static std::mutex GlobalLock;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this efficient?

// Controls L0 calls tracing in zePrint.
static bool ZeDebug = false;

static void zePrint(const char *Format, ...) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to provide old unsafe C varargs interface in a C++17 code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction. Library source code is still uses C++14.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, does it requires to go down to C99?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I do not remember the real reason for C++14 only if SYCL goes to C++17?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, a few SYCL features requiring C++17 are implemented in "header-only" style, so they are not requiring the runtime library to be built with C++17 enabled.
We keep using C++14 features in the library to keep the libstdc++ requirement low and make the implementation more portable.

// write (as it is OK for multiple threads to read the map without locking it).

pi_result _pi_mem::addMapping(void *MappedTo, size_t Offset, size_t Size) {
std::lock_guard<std::mutex> Lock(MappingsMutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::lock_guard<std::mutex> Lock(MappingsMutex);
std::lock_guard Lock { MappingsMutex };

C++17

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keryell The library is compiled with C++14

}

pi_result _pi_mem::removeMapping(void *MappedTo, Mapping &MapInfo) {
std::lock_guard<std::mutex> Lock(MappingsMutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++17...

size_t Size;
};

virtual ~_pi_mem() = default;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this at the end of the class.
Because my first comment here was "Why do you need this?" :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thx.


// L0 doesn't do the reference counting, so we have to do.
// Must be atomic to prevent data race when incrementing/decrementing.
std::atomic<pi_uint32> RefCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you could move this into a (CRTP?) mixin class to avoid all these copy-pastes in every class or so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thx.

@@ -0,0 +1,81 @@
# PI Level0 plugin library

message(STATUS "Download Level Zero loader and headers from github.com")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have an option to provide local version of the L0 headers and libraries like it is done for OpenCl dependencies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this. But I have a question about how it is done for OpenCL. In configure.py:

 21     ocl_header_dir = os.path.join(abs_obj_dir, "OpenCL-Headers")
 22     icd_loader_lib = os.path.join(abs_obj_dir, "OpenCL-ICD-Loader", "build")

These paths are used if we pass --system-ocl to configure. But they are not system paths.
So this option/behavior is a little confusing. Shouldn't we look system paths (like /usr/local/lib or smth) if such option is provided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added ability to provide local version of headers and loader.

DEPENDS ocl-headers
BUILD_BYPRODUCTS ${L0_LIBRARY}
)
ExternalProject_Add_Step(l0-loader llvminstall
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like l0-loader is not added to deploy-sycl-toolchain. So it will not be available after recommended build process. So if we build compiler and runtime on machine with GPU it might be non-functional due to missed dependency library (l0-loader)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladimirlaz I have a question, do we deploy OpenCL.so? I tried to do ninja install and don't see ICD in the installation directory. How this works for OpenCL?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@againull you need to download it on build time (rather than using pre-installed) and specify a special flag on cmake configuration. See fd64c8f#diff-ad9ca624b2a795a5f24145a17c806d1f

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added l0-loader to deploy-sycl-toolchain. One question: how the scenario described by Vladimir will work for OpenCL if ICD is added to deploy-sycl-toolchain only under special option?

vladimirlaz
vladimirlaz previously approved these changes Jun 2, 2020
pvchupin
pvchupin previously approved these changes Jun 2, 2020
kbobrovs
kbobrovs previously approved these changes Jun 2, 2020
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
@againull
Copy link
Contributor Author

againull commented Jun 2, 2020

Conflict with master branch (in dependency.conf) appeared right just after all approvals. Merging master branch with conflict resolving dismissed all of them:(

pvchupin
pvchupin previously approved these changes Jun 2, 2020
@pvchupin
Copy link
Contributor

pvchupin commented Jun 2, 2020

I think you've got all approvals, so let's wait for clean testing and merge.
Me or @bader can merge I think.

@againull
Copy link
Contributor Author

againull commented Jun 5, 2020

Blocked by #1816 which is going to be fixed in #1824

@againull
Copy link
Contributor Author

againull commented Jun 8, 2020

There were no changes made after approvals expect merging intel/sycl. Problem on cuda target fixed. Can we merge this PR?

@bader
Copy link
Contributor

bader commented Jun 8, 2020

Can I use the description as a commit message or do you want to update it?

@againull
Copy link
Contributor Author

againull commented Jun 8, 2020

Can I use the description as a commit message or do you want to update it?

Yes, description is ok for commit message.

@bader bader merged commit d32da99 into intel:sycl Jun 8, 2020
@againull againull deleted the l0_plugin branch December 3, 2022 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.