Skip to content

[SYCL][ESIMD][EMU] Memory intrinsic updates for ESIMD_EMU plugin #4748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 50 commits into from
Feb 17, 2022

Conversation

dongkyunahn-intel
Copy link
Contributor

  • Intrinsic implementations for host device are replaced with implementation for ESIMD_EMU

- Intrinsic implementations for host device are replaced with
implementation for ESIMD_EMU
- media_ld/st host-mode implementation is replaced with ESIMD_EMU
support implementation
- + Relocating SLM_BTI definition
@dongkyunahn-intel
Copy link
Contributor Author

dongkyunahn-intel commented Oct 25, 2021

Due to changes from memory intrinsic refactoring (#4720), the ESIMD Device interface used in this PR needs to be changed. I'll resume this PR as soon as my other high priority PRs are merged to the origin/sycl branch.

dongkyunahn-intel added a commit to dongkyunahn-intel/llvm that referenced this pull request Nov 1, 2021
@dongkyunahn-intel dongkyunahn-intel requested a review from a team as a code owner January 24, 2022 18:49
- With Version string info enclosed in device struct
- as ESIMD emulator is not productized yet
- lower_intrins.ll test fails because of '__esimd_slm_init' lowering
- Failing lines are removed
@@ -116,6 +116,18 @@ static bool PrintPiTrace = false;
// Sycl RT calls piTearDown().
static sycl::detail::ESIMDEmuPluginOpaqueData *PiESimdDeviceAccess;

// Single-entry cache for piPlatformsGet call.
static pi_platform PiPlatformCache;
static sycl::detail::SpinLock *PiPlatformCacheMutex =
Copy link
Contributor

@kbobrovs kbobrovs Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a TODO comment that this is a memory leak, and then address in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Konst : Follow level_zero implementation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, fix this as soon as possible.

// Mapping between surface index and CM-managed surface
static std::unordered_map<unsigned int, _pi_mem *> *PiESimdSurfaceMap =
new std::unordered_map<unsigned int, _pi_mem *>;
static sycl::detail::SpinLock *PiESimdSurfaceMapLock =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

static pi_platform PiPlatformCache;
static sycl::detail::SpinLock *PiPlatformCacheMutex =
new sycl::detail::SpinLock;
static bool PiPlatformCachePopulated = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it local static in the function where it is used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- Removing unnecessary 'cm_rt.h'
- TODO comments for memory leaks associated with SpinLocks
- Making a static variable local one for minimized variable scope

- Removing comments related to removed 'v2.h' file
new std::unordered_map<unsigned int, _pi_mem *>;
// TODO/FIXME : Memory leak. Handle with 'piTearDown'.
static sycl::detail::SpinLock *PiESimdSurfaceMapLock =
new sycl::detail::SpinLock;
Copy link
Contributor

@lsatanov lsatanov Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. Question: what's the purpose of using dynamic allocation here and above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part was imported from level_zero plug-in implementation. We have to follow level_zero implementation in general. I didn't have chance to ask why to level_zero implementation authors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smaslov-intel , could you comment on this - why level_zero use dynamic allocation for such global locks? -

static sycl::detail::SpinLock *PiPlatformsCacheMutex =

Copy link
Contributor

@lsatanov lsatanov Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a guess: it's one of ways to deal with static objects destruction order trouble (when you use a static/global object on destruction of other global/static objects and it may be already destroyed corresponding to the order of their initialization). (also this method doesn't help with static initialization order problem).

But. The big question is: is it really needed in this particular place: are those mutexes and data structures to be used on destruction of other static/global objects indirectly out of this program module (?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lsatanov, @smaslov-intel, @romanovvlad, @kbobrovs , I created an issue ticket and assigned it to me in order to keep track of this issue.

#5599

kbobrovs
kbobrovs previously approved these changes Feb 15, 2022
@dongkyunahn-intel
Copy link
Contributor Author

/build

@dongkyunahn-intel
Copy link
Contributor Author

/summary:run

@dongkyunahn-intel
Copy link
Contributor Author

@romanovvlad / @intel/llvm-reviewers-runtime , would you review this PR?

Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments. I'm OK if they are resolved in a separate PR.

Comment on lines +36 to +38
std::cerr << "ESIMD EMU plugin error or not loaded - try setting "
"SYCL_DEVICE_FILTER=esimd_emulator:gpu environment variable"
<< std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, do not print messages unconditionally. This message can be in the exception which is thrown below, so a user can read it by calling .what() . Or it can be done under pi trace check, see https://github.com/intel/llvm/blob/sycl/sycl/source/device_selector.cpp#L48

Copy link
Contributor Author

@dongkyunahn-intel dongkyunahn-intel Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change.

@@ -116,6 +116,18 @@ static bool PrintPiTrace = false;
// Sycl RT calls piTearDown().
static sycl::detail::ESIMDEmuPluginOpaqueData *PiESimdDeviceAccess;

// Single-entry cache for piPlatformsGet call.
static pi_platform PiPlatformCache;
static sycl::detail::SpinLock *PiPlatformCacheMutex =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, fix this as soon as possible.

@@ -401,9 +462,15 @@ pi_result piPlatformsGet(pi_uint32 NumEntries, pi_platform *Platforms,
return PI_INVALID_VALUE;
}

const std::lock_guard<sycl::detail::SpinLock> Lock{*PiPlatformCacheMutex};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not safe to access this var here as it might not be constructed yet. Hopefully it should be solved when all global variables are allocated(new) in piInitialize and deallocated in piTearDown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like most of atomicity controls in this file, PiPlatformCacheMutex was imported from level_zero implementation. I'll create a PR designated for atomicity issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanovvlad, I created an issue ticket and assigned it to me in order to keep track of this issue.

#5599

@@ -401,9 +462,15 @@ pi_result piPlatformsGet(pi_uint32 NumEntries, pi_platform *Platforms,
return PI_INVALID_VALUE;
}

const std::lock_guard<sycl::detail::SpinLock> Lock{*PiPlatformCacheMutex};
if (!PiPlatformCachePopulated) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, we could check PiPlatformCache == nullptr instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will apply in the PR designated for atomicity as mentioned above.

@romanovvlad romanovvlad merged commit 1a8f501 into intel:sycl Feb 17, 2022
maximdimakov pushed a commit to maximdimakov/llvm that referenced this pull request Feb 17, 2022
…el#4748)

Intrinsic implementations for host device are replaced with
implementation for ESIMD_EMU
smaslov-intel pushed a commit to smaslov-intel/llvm that referenced this pull request Feb 19, 2022
…el#4748)

Intrinsic implementations for host device are replaced with
implementation for ESIMD_EMU
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Feb 23, 2022
* upstream/sycl: (2757 commits)
  [SYCL][Doc] Fixing incorrect merge of community Readme.md with our version (intel#5636)
  [SYCL] Change USM pooling parameters. (intel#5457)
  [CI] Fix cache location on Windows (intel#5603)
  [SYCL][NFC] Fix a warning about uninitialized struct members (intel#5610)
  [Buildbot] Update Windows GPU version to 101.1340 (intel#5620)
  Fix SPIRV -> OCL barrier call argument attributes
  Move SPV_INTEL_memory_access_aliasing tokens from spirv_internal
  [SYCL][ESIMD] Add support for named barrier APIs (intel#5583)
  [SYCL][L0] Remove ZeModule when program build failed (intel#5541)
  [SYCL] Silence "unknown attribute" warning for `device_indirectly_callable` (intel#5591)
  [SYCL][DOC] Introductory material for extensions (intel#5605)
  [SYCL][DOC] Change extension names to lower case (intel#5607)
  [SYCL] Improve get_kernel_bundle performance (intel#5496)
  [SYCL] Do not build device code for sub-devices (intel#5240)
  [sycl-post-link] Fix a crash during spec-constant properties generation (intel#5538)
  [SYCL][DOC] Move SPIR-V and OpenCL extensions (intel#5578)
  [SYCL][ESIMD][EMU] Update memory intrinsics for ESIMD_EMU plugin (intel#4748)
  [CI] Allow stale issue bot to analyze more issues (intel#5602)
  [SYCL][L0] Honor property::queue::enable_profiling (intel#5543)
  [OpenMP] Properly save strings when doing LTO
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants