-
Notifications
You must be signed in to change notification settings - Fork 771
[SYCL][ESIMD][EMU] Memory intrinsic updates for ESIMD_EMU plugin #4748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][ESIMD][EMU] Memory intrinsic updates for ESIMD_EMU plugin #4748
Conversation
dongkyunahn-intel
commented
Oct 11, 2021
- Intrinsic implementations for host device are replaced with implementation for ESIMD_EMU
- Intrinsic implementations for host device are replaced with implementation for ESIMD_EMU
- media_ld/st host-mode implementation is replaced with ESIMD_EMU support implementation
sycl/include/sycl/ext/intel/experimental/esimd/detail/memory_intrin.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/intel/experimental/esimd/detail/memory_intrin.hpp
Outdated
Show resolved
Hide resolved
- + Relocating SLM_BTI definition
- and typo fix in memory.hpp
Due to changes from memory intrinsic refactoring (#4720), the ESIMD Device interface used in this PR needs to be changed. I'll resume this PR as soon as my other high priority PRs are merged to the origin/sycl branch. |
…emory_intrinsic
…emory_intrinsic
- In order to use surface-index including SLM_BTI
sycl/include/sycl/ext/intel/experimental/esimd/emu/detail/esimd_emulator_device_interface.hpp
Outdated
Show resolved
Hide resolved
…emory_intrinsic
- Kernel build fails because of assertion from 'translateSLMInit()' - +Typo fix in error log message
- With Version string info enclosed in device struct
- as ESIMD emulator is not productized yet
- lower_intrins.ll test fails because of '__esimd_slm_init' lowering - Failing lines are removed
…emory_intrinsic
…emory_intrinsic
@@ -116,6 +116,18 @@ static bool PrintPiTrace = false; | |||
// Sycl RT calls piTearDown(). | |||
static sycl::detail::ESIMDEmuPluginOpaqueData *PiESimdDeviceAccess; | |||
|
|||
// Single-entry cache for piPlatformsGet call. | |||
static pi_platform PiPlatformCache; | |||
static sycl::detail::SpinLock *PiPlatformCacheMutex = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a TODO comment that this is a memory leak, and then address in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Konst : Follow level_zero implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, fix this as soon as possible.
// Mapping between surface index and CM-managed surface | ||
static std::unordered_map<unsigned int, _pi_mem *> *PiESimdSurfaceMap = | ||
new std::unordered_map<unsigned int, _pi_mem *>; | ||
static sycl::detail::SpinLock *PiESimdSurfaceMapLock = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
static pi_platform PiPlatformCache; | ||
static sycl::detail::SpinLock *PiPlatformCacheMutex = | ||
new sycl::detail::SpinLock; | ||
static bool PiPlatformCachePopulated = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make it local static in the function where it is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- Removing unnecessary 'cm_rt.h' - TODO comments for memory leaks associated with SpinLocks - Making a static variable local one for minimized variable scope - Removing comments related to removed 'v2.h' file
…emory_intrinsic
new std::unordered_map<unsigned int, _pi_mem *>; | ||
// TODO/FIXME : Memory leak. Handle with 'piTearDown'. | ||
static sycl::detail::SpinLock *PiESimdSurfaceMapLock = | ||
new sycl::detail::SpinLock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. Question: what's the purpose of using dynamic allocation here and above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part was imported from level_zero plug-in implementation. We have to follow level_zero implementation in general. I didn't have chance to ask why to level_zero implementation authors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smaslov-intel , could you comment on this - why level_zero use dynamic allocation for such global locks? -
static sycl::detail::SpinLock *PiPlatformsCacheMutex = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a guess: it's one of ways to deal with static objects destruction order trouble (when you use a static/global object on destruction of other global/static objects and it may be already destroyed corresponding to the order of their initialization). (also this method doesn't help with static initialization order problem).
But. The big question is: is it really needed in this particular place: are those mutexes and data structures to be used on destruction of other static/global objects indirectly out of this program module (?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lsatanov, @smaslov-intel, @romanovvlad, @kbobrovs , I created an issue ticket and assigned it to me in order to keep track of this issue.
/build |
/summary:run |
@romanovvlad / @intel/llvm-reviewers-runtime , would you review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments. I'm OK if they are resolved in a separate PR.
std::cerr << "ESIMD EMU plugin error or not loaded - try setting " | ||
"SYCL_DEVICE_FILTER=esimd_emulator:gpu environment variable" | ||
<< std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, do not print messages unconditionally. This message can be in the exception which is thrown below, so a user can read it by calling .what() . Or it can be done under pi trace check, see https://github.com/intel/llvm/blob/sycl/sycl/source/device_selector.cpp#L48
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change.
@@ -116,6 +116,18 @@ static bool PrintPiTrace = false; | |||
// Sycl RT calls piTearDown(). | |||
static sycl::detail::ESIMDEmuPluginOpaqueData *PiESimdDeviceAccess; | |||
|
|||
// Single-entry cache for piPlatformsGet call. | |||
static pi_platform PiPlatformCache; | |||
static sycl::detail::SpinLock *PiPlatformCacheMutex = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, fix this as soon as possible.
@@ -401,9 +462,15 @@ pi_result piPlatformsGet(pi_uint32 NumEntries, pi_platform *Platforms, | |||
return PI_INVALID_VALUE; | |||
} | |||
|
|||
const std::lock_guard<sycl::detail::SpinLock> Lock{*PiPlatformCacheMutex}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not safe to access this var here as it might not be constructed yet. Hopefully it should be solved when all global variables are allocated(new
) in piInitialize
and deallocated in piTearDown
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like most of atomicity controls in this file, PiPlatformCacheMutex
was imported from level_zero implementation. I'll create a PR designated for atomicity issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romanovvlad, I created an issue ticket and assigned it to me in order to keep track of this issue.
@@ -401,9 +462,15 @@ pi_result piPlatformsGet(pi_uint32 NumEntries, pi_platform *Platforms, | |||
return PI_INVALID_VALUE; | |||
} | |||
|
|||
const std::lock_guard<sycl::detail::SpinLock> Lock{*PiPlatformCacheMutex}; | |||
if (!PiPlatformCachePopulated) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, we could check PiPlatformCache == nullptr
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will apply in the PR designated for atomicity as mentioned above.
…el#4748) Intrinsic implementations for host device are replaced with implementation for ESIMD_EMU
…el#4748) Intrinsic implementations for host device are replaced with implementation for ESIMD_EMU
* upstream/sycl: (2757 commits) [SYCL][Doc] Fixing incorrect merge of community Readme.md with our version (intel#5636) [SYCL] Change USM pooling parameters. (intel#5457) [CI] Fix cache location on Windows (intel#5603) [SYCL][NFC] Fix a warning about uninitialized struct members (intel#5610) [Buildbot] Update Windows GPU version to 101.1340 (intel#5620) Fix SPIRV -> OCL barrier call argument attributes Move SPV_INTEL_memory_access_aliasing tokens from spirv_internal [SYCL][ESIMD] Add support for named barrier APIs (intel#5583) [SYCL][L0] Remove ZeModule when program build failed (intel#5541) [SYCL] Silence "unknown attribute" warning for `device_indirectly_callable` (intel#5591) [SYCL][DOC] Introductory material for extensions (intel#5605) [SYCL][DOC] Change extension names to lower case (intel#5607) [SYCL] Improve get_kernel_bundle performance (intel#5496) [SYCL] Do not build device code for sub-devices (intel#5240) [sycl-post-link] Fix a crash during spec-constant properties generation (intel#5538) [SYCL][DOC] Move SPIR-V and OpenCL extensions (intel#5578) [SYCL][ESIMD][EMU] Update memory intrinsics for ESIMD_EMU plugin (intel#4748) [CI] Allow stale issue bot to analyze more issues (intel#5602) [SYCL][L0] Honor property::queue::enable_profiling (intel#5543) [OpenMP] Properly save strings when doing LTO ...