[SYCL][XPTI] Report memory allocation info from SYCL runtime #5172

Merged
merged 7 commits into from
Dec 22, 2021
9 changes: 9 additions & 0 deletions sycl/doc/SYCLInstrumentationUsingXPTI.md
@@ -256,3 +256,12 @@ All trace point types in bold provide semantic information about the graph, node
| `wait_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::wait_end` that marks the end of the wait on an `event`</li> <li> **parent**: `nullptr`</li> <li> **event**: The event ID will reflect the ID of the command group object submission that created this event or a new event based on the combination of the string "queue.wait" and the address of the event. </li> <li> **instance**: Unique ID to allow the correlation of the `wait_begin` event with the `wait_end` event. </li> <li> **user_data**: String indicating `queue.wait` and the address of the event as `const char *` </li></div> | **`sycl_device`**, `sym_function_name`, `sym_source_file_name`, `sym_line_no` |
| `barrier_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_begin` that marks the beginning of a barrier while enqueuing a command group object</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |
| `barrier_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_end` that marks the end of the barrier that is encountered during enqueue.</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |

## Level Zero Plugin Stream `"oneapi.level_zero.experimental.mem_alloc"` Notification Signatures

| Trace Point Type | Parameter Description | Metadata |
| :------------------------: | :-------------------- | :------- |
| `mem_alloc_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_alloc_begin` that marks the beginning of memory allocation process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_alloc_begin` event with the `mem_alloc_end` event. </li> <li> **user_data**: A pointer to `mem_alloc_data_t` object, that includes memory object ID (if any), allocation size, and guard zone size (if any). </li></div> | None |
| `mem_alloc_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_alloc_end` that marks the end of memory allocation process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_alloc_begin` event with the `mem_alloc_end` event. This value is guaranteed to be the same value received by the trace event for the corresponding `mem_alloc_begin`.</li> <li> **user_data**: A pointer to `mem_alloc_data_t` object, that includes memory object ID (if any), allocated pointer, allocation size, and guard zone size (if any). </li></div> | None |
| `mem_release_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_release_begin` that marks the beginning of memory release process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_release_begin` event with the `mem_release_end` event. </li> <li> **user_data**: A pointer to `mem_alloc_data_t` object, that includes memory object ID (if any) and released pointer. </li></div> | None |
| `mem_release_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_release_end` that marks the end of memory release process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_release_begin` event with the `mem_release_end` event. This value is guaranteed to be the same value received by the trace event for the corresponding `mem_release_begin`.</li> <li> **user_data**: A pointer to `mem_alloc_data_t` object, that includes memory object ID (if any) and released pointer. </li></div> | None |
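
As a rough illustration of how a subscriber might consume this stream, the sketch below pairs each `*_begin` notification with its `*_end` counterpart through the **instance** value and tracks the live allocations. It does not use the real XPTI headers: `MemAllocData`, `TracePoint`, and `AllocTracker` are hypothetical stand-ins, and the field names are assumptions based only on the parameter descriptions in the table above.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for xpti::mem_alloc_data_t. Field names are assumed;
// the contents follow the user_data description in the table.
struct MemAllocData {
  std::uintptr_t MemObjHandle; // memory object ID, 0 if none
  std::uintptr_t AllocPtr;     // allocated or released pointer
  std::size_t AllocSize;       // allocation size in bytes
  std::size_t GuardZoneSize;   // guard zone size in bytes, 0 if none
};

// Hypothetical stand-in for the four trace point types of this stream.
enum class TracePoint {
  MemAllocBegin,
  MemAllocEnd,
  MemReleaseBegin,
  MemReleaseEnd
};

class AllocTracker {
public:
  // Called once per notification; Instance is the correlation ID.
  void notify(TracePoint Type, std::uint64_t Instance,
              const MemAllocData &Data) {
    switch (Type) {
    case TracePoint::MemAllocBegin:
    case TracePoint::MemReleaseBegin:
      // Remember which operation this correlation ID belongs to.
      Pending[Instance] = Type;
      break;
    case TracePoint::MemAllocEnd:
      // Only the end event carries the allocated pointer.
      if (matches(Instance, TracePoint::MemAllocBegin))
        Live[Data.AllocPtr] = Data.AllocSize;
      break;
    case TracePoint::MemReleaseEnd:
      if (matches(Instance, TracePoint::MemReleaseBegin))
        Live.erase(Data.AllocPtr);
      break;
    }
  }

  // Total size of allocations that were reported but not yet released.
  std::size_t liveBytes() const {
    std::size_t Total = 0;
    for (const auto &Entry : Live)
      Total += Entry.second;
    return Total;
  }

  // Number of begin events still waiting for their end event.
  std::size_t pendingCount() const { return Pending.size(); }

private:
  // Consumes the pending begin event for Instance if it has the expected type.
  bool matches(std::uint64_t Instance, TracePoint Expected) {
    auto It = Pending.find(Instance);
    if (It == Pending.end() || It->second != Expected)
      return false;
    Pending.erase(It);
    return true;
  }

  std::unordered_map<std::uint64_t, TracePoint> Pending;
  std::unordered_map<std::uintptr_t, std::size_t> Live;
};
```

In a real tool, `notify` would be driven from a callback registered with the XPTI framework for this stream; end events without a matching begin are ignored here, which is one possible policy rather than a requirement.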
11 changes: 6 additions & 5 deletions sycl/source/detail/device_image_impl.hpp
@@ -17,6 +17,7 @@
#include <detail/context_impl.hpp>
#include <detail/device_impl.hpp>
#include <detail/kernel_id_impl.hpp>
#include <detail/mem_alloc_helper.hpp>
#include <detail/plugin.hpp>
#include <detail/program_manager/program_manager.hpp>

@@ -185,11 +186,11 @@ class device_image_impl {
std::lock_guard<std::mutex> Lock{MSpecConstAccessMtx};
if (nullptr == MSpecConstsBuffer && !MSpecConstsBlob.empty()) {
const detail::plugin &Plugin = getSyclObjImpl(MContext)->getPlugin();
Plugin.call<PiApiKind::piMemBufferCreate>(
detail::getSyclObjImpl(MContext)->getHandleRef(),
PI_MEM_FLAGS_ACCESS_RW | PI_MEM_FLAGS_HOST_PTR_USE,
MSpecConstsBlob.size(), MSpecConstsBlob.data(), &MSpecConstsBuffer,
nullptr);
memBufferCreateHelper(Plugin,
detail::getSyclObjImpl(MContext)->getHandleRef(),
PI_MEM_FLAGS_ACCESS_RW | PI_MEM_FLAGS_HOST_PTR_USE,
MSpecConstsBlob.size(), MSpecConstsBlob.data(),
&MSpecConstsBuffer, nullptr);
}
return MSpecConstsBuffer;
}
32 changes: 32 additions & 0 deletions sycl/source/detail/mem_alloc_helper.hpp
@@ -0,0 +1,32 @@
//==-------- mem_alloc_helper.hpp - SYCL mem alloc helper ------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#pragma once

#include <CL/sycl/detail/pi.h>

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace detail {
void memBufferCreateHelper(const plugin &Plugin, pi_context Ctx,
pi_mem_flags Flags, size_t Size, void *HostPtr,
pi_mem *RetMem,
const pi_mem_properties *Props = nullptr);
void memReleaseHelper(const plugin &Plugin, pi_mem Mem);
void memBufferMapHelper(const plugin &Plugin, pi_queue command_queue,
pi_mem buffer, pi_bool blocking_map,
pi_map_flags map_flags, size_t offset, size_t size,
pi_uint32 num_events_in_wait_list,
const pi_event *event_wait_list, pi_event *event,
void **ret_map);
void memUnmapHelper(const plugin &Plugin, pi_queue command_queue, pi_mem memobj,
void *mapped_ptr, pi_uint32 num_events_in_wait_list,
const pi_event *event_wait_list, pi_event *event);
} // namespace detail
} // namespace sycl
} // __SYCL_INLINE_NAMESPACE(cl)
199 changes: 188 additions & 11 deletions sycl/source/detail/memory_manager.cpp
@@ -9,17 +9,106 @@
#include <CL/sycl/detail/memory_manager.hpp>
#include <detail/context_impl.hpp>
#include <detail/event_impl.hpp>
#include <detail/mem_alloc_helper.hpp>
#include <detail/queue_impl.hpp>

#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

#ifdef XPTI_ENABLE_INSTRUMENTATION
#include <xpti/xpti_data_types.h>
#include <xpti/xpti_trace_framework.hpp>
#endif

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace detail {

#ifdef XPTI_ENABLE_INSTRUMENTATION
uint8_t GMemAllocStreamID;
xpti::trace_event_data_t *GMemAllocEvent;
#endif

uint64_t emitMemAllocBeginTrace(uintptr_t ObjHandle, size_t AllocSize,
size_t GuardZone) {
(void)ObjHandle;
(void)AllocSize;
(void)GuardZone;
uint64_t CorrelationID = 0;
#ifdef XPTI_ENABLE_INSTRUMENTATION
if (xptiTraceEnabled()) {
xpti::mem_alloc_data_t MemAlloc{ObjHandle, 0 /* alloc ptr */, AllocSize,
GuardZone};

CorrelationID = xptiGetUniqueId();
xptiNotifySubscribers(
GMemAllocStreamID,
static_cast<uint16_t>(xpti::trace_point_type_t::mem_alloc_begin),
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
}
#endif
return CorrelationID;
}

void emitMemAllocEndTrace(uintptr_t ObjHandle, uintptr_t AllocPtr,
size_t AllocSize, size_t GuardZone,
uint64_t CorrelationID) {
(void)ObjHandle;
(void)AllocPtr;
(void)AllocSize;
(void)GuardZone;
(void)CorrelationID;
#ifdef XPTI_ENABLE_INSTRUMENTATION
if (xptiTraceEnabled()) {
xpti::mem_alloc_data_t MemAlloc{ObjHandle, AllocPtr, AllocSize, GuardZone};

xptiNotifySubscribers(
GMemAllocStreamID,
static_cast<uint16_t>(xpti::trace_point_type_t::mem_alloc_end),
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
}
#endif
}

uint64_t emitMemReleaseBeginTrace(uintptr_t ObjHandle, uintptr_t AllocPtr) {
(void)ObjHandle;
(void)AllocPtr;
uint64_t CorrelationID = 0;
#ifdef XPTI_ENABLE_INSTRUMENTATION
if (xptiTraceEnabled()) {
xpti::mem_alloc_data_t MemAlloc{ObjHandle, AllocPtr, 0 /* alloc size */,
0 /* guard zone */};

CorrelationID = xptiGetUniqueId();
xptiNotifySubscribers(
GMemAllocStreamID,
static_cast<uint16_t>(xpti::trace_point_type_t::mem_release_begin),
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
}
#endif
return CorrelationID;
}

void emitMemReleaseEndTrace(uintptr_t ObjHandle, uintptr_t AllocPtr,
uint64_t CorrelationID) {
(void)ObjHandle;
(void)AllocPtr;
(void)CorrelationID;
#ifdef XPTI_ENABLE_INSTRUMENTATION
if (xptiTraceEnabled()) {
xpti::mem_alloc_data_t MemAlloc{ObjHandle, AllocPtr, 0 /* alloc size */,
0 /* guard zone */};

xptiNotifySubscribers(
GMemAllocStreamID,
static_cast<uint16_t>(xpti::trace_point_type_t::mem_release_end),
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
}
#endif
}
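
The emit functions above share one pattern: parameters are cast to `void` so that builds without instrumentation do not warn about unused arguments, and the traced work sits inside a preprocessor guard. A minimal sketch of that pattern follows. `DEMO_ENABLE_TRACING`, `NextID`, and `emitBeginTrace` are hypothetical names used only for illustration; they stand in for `XPTI_ENABLE_INSTRUMENTATION`, `xptiGetUniqueId()`, and the real emit functions.

```cpp
#include <cstdint>

// Hypothetical macro standing in for XPTI_ENABLE_INSTRUMENTATION.
// Leave it undefined to model a build without instrumentation.
// #define DEMO_ENABLE_TRACING

#ifdef DEMO_ENABLE_TRACING
// Stand-in for xptiGetUniqueId(); not thread-safe, illustration only.
static std::uint64_t NextID = 1;
#endif

std::uint64_t emitBeginTrace(std::uintptr_t ObjHandle) {
  // Silences unused-parameter warnings when tracing is compiled out.
  (void)ObjHandle;
  // Declared outside the guard so the return statement is valid in both
  // configurations.
  std::uint64_t CorrelationID = 0;
#ifdef DEMO_ENABLE_TRACING
  CorrelationID = NextID++;
#endif
  return CorrelationID;
}
```

With the macro undefined the function compiles to a plain `return 0`, so callers can pass the result to the matching end function without any conditional code of their own.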

static void waitForEvents(const std::vector<EventImplPtr> &Events) {
// Assuming all events will be on the same device or
// devices associated with the same Backend.
@@ -34,6 +123,97 @@ static void waitForEvents(const std::vector<EventImplPtr> &Events) {
}
}

void memBufferCreateHelper(const plugin &Plugin, pi_context Ctx,
pi_mem_flags Flags, size_t Size, void *HostPtr,
pi_mem *RetMem, const pi_mem_properties *Props) {
uint64_t CorrID = 0;
// We only want to instrument piMemBufferCreate
{
CorrID =
emitMemAllocBeginTrace(0 /* mem object */, Size, 0 /* guard zone */);
xpti::utils::finally _{[&] {
// C-style cast is required for MSVC
uintptr_t MemObjID = (uintptr_t)(*RetMem);
pi_native_handle Ptr = 0;
// Always use call_nocheck here, because call may throw an exception,
// and this lambda will be called from destructor, which in combination
// rewards us with UB.
Plugin.call_nocheck<PiApiKind::piextMemGetNativeHandle>(*RetMem, &Ptr);
emitMemAllocEndTrace(MemObjID, (uintptr_t)(Ptr), Size, 0 /* guard zone */,
CorrID);
}};
Plugin.call<PiApiKind::piMemBufferCreate>(Ctx, Flags, Size, HostPtr, RetMem,
Props);
}
}

void memReleaseHelper(const plugin &Plugin, pi_mem Mem) {
// FIXME: piMemRelease does not guarantee that memory is released. That only
// happens when the reference counter is 1. However, the SYCL runtime currently
// calls piMemRetain only for OpenCL interop.
uint64_t CorrID = 0;
// C-style cast is required for MSVC
uintptr_t MemObjID = (uintptr_t)(Mem);
uintptr_t Ptr = 0;
// Do not make unnecessary PI calls without instrumentation enabled
if (xptiTraceEnabled()) {
pi_native_handle PtrHandle = 0;
Plugin.call<PiApiKind::piextMemGetNativeHandle>(Mem, &PtrHandle);
Ptr = (uintptr_t)(PtrHandle);
}
// We only want to instrument piMemRelease
{
CorrID = emitMemReleaseBeginTrace(MemObjID, Ptr);
xpti::utils::finally _{
[&] { emitMemReleaseEndTrace(MemObjID, Ptr, CorrID); }};
Plugin.call<PiApiKind::piMemRelease>(Mem);
}
}

void memBufferMapHelper(const plugin &Plugin, pi_queue Queue, pi_mem Buffer,
pi_bool Blocking, pi_map_flags Flags, size_t Offset,
size_t Size, pi_uint32 NumEvents,
const pi_event *WaitList, pi_event *Event,
void **RetMap) {
uint64_t CorrID = 0;
uintptr_t MemObjID = (uintptr_t)(Buffer);
// We only want to instrument piEnqueueMemBufferMap
{
CorrID = emitMemAllocBeginTrace(MemObjID, Size, 0 /* guard zone */);
xpti::utils::finally _{[&] {
emitMemAllocEndTrace(MemObjID, (uintptr_t)(*RetMap), Size,
0 /* guard zone */, CorrID);
}};
Plugin.call<PiApiKind::piEnqueueMemBufferMap>(
Queue, Buffer, Blocking, Flags, Offset, Size, NumEvents, WaitList,
Event, RetMap);
}
}

void memUnmapHelper(const plugin &Plugin, pi_queue Queue, pi_mem Mem,
void *MappedPtr, pi_uint32 NumEvents,
const pi_event *WaitList, pi_event *Event) {
uint64_t CorrID = 0;
uintptr_t MemObjID = (uintptr_t)(Mem);
uintptr_t Ptr = (uintptr_t)(MappedPtr);
// We only want to instrument piEnqueueMemUnmap
{
CorrID = emitMemReleaseBeginTrace(MemObjID, Ptr);
xpti::utils::finally _{[&] {
// There's no way for SYCL to know, when the pointer is freed, so we have
// to explicitly wait for the end of data transfers here in order to
// report correct events.
// Always use call_nocheck here, because call may throw an exception,
// and this lambda will be called from destructor, which in combination
// rewards us with UB.
Plugin.call_nocheck<PiApiKind::piEventsWait>(1, Event);
emitMemReleaseEndTrace(MemObjID, Ptr, CorrID);
}};
Plugin.call<PiApiKind::piEnqueueMemUnmap>(Queue, Mem, MappedPtr, NumEvents,
WaitList, Event);
}
}
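
The helpers above rely on a scope guard so that the end notification is emitted even when the wrapped PI call throws. The sketch below shows the general idea with a hand-written guard. `ScopeGuard` and `tracedCall` are hypothetical names; this is not the implementation of `xpti::utils::finally`, only an assumed equivalent for illustration.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal scope guard: runs the stored callable on destruction.
class ScopeGuard {
public:
  explicit ScopeGuard(std::function<void()> Fn) : MFn(std::move(Fn)) {}
  ScopeGuard(const ScopeGuard &) = delete;
  ScopeGuard &operator=(const ScopeGuard &) = delete;
  // The callable is expected not to throw, which is why the helpers above
  // use call_nocheck inside their lambdas.
  ~ScopeGuard() { MFn(); }

private:
  std::function<void()> MFn;
};

// Records "begin", runs Call, and records "end" on every exit path,
// including when Call throws.
void tracedCall(std::vector<std::string> &Log,
                const std::function<void()> &Call) {
  Log.push_back("begin");
  ScopeGuard Guard{[&Log] { Log.push_back("end"); }};
  Call();
}
```

If `Call` throws, the guard's destructor still runs during stack unwinding, so every "begin" entry is followed by an "end" entry before the exception reaches the caller.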

void MemoryManager::release(ContextImplPtr TargetContext, SYCLMemObjI *MemObj,
void *MemAllocation,
std::vector<EventImplPtr> DepEvents,
@@ -67,7 +247,7 @@ void MemoryManager::releaseMemObj(ContextImplPtr TargetContext,
}

const detail::plugin &Plugin = TargetContext->getPlugin();
Plugin.call<PiApiKind::piMemRelease>(pi::cast<RT::PiMem>(MemAllocation));
memReleaseHelper(Plugin, pi::cast<RT::PiMem>(MemAllocation));
}

void *MemoryManager::allocate(ContextImplPtr TargetContext, SYCLMemObjI *MemObj,
@@ -165,9 +345,8 @@ MemoryManager::allocateBufferObject(ContextImplPtr TargetContext, void *UserPtr,

RT::PiMem NewMem = nullptr;
const detail::plugin &Plugin = TargetContext->getPlugin();
Plugin.call<PiApiKind::piMemBufferCreate>(TargetContext->getHandleRef(),
CreationFlags, Size, UserPtr,
&NewMem, nullptr);
memBufferCreateHelper(Plugin, TargetContext->getHandleRef(), CreationFlags,
Size, UserPtr, &NewMem, nullptr);
return NewMem;
}

@@ -623,10 +802,9 @@ void *MemoryManager::map(SYCLMemObjI *, void *Mem, QueueImplPtr Queue,
void *MappedPtr = nullptr;
const size_t BytesToMap = AccessRange[0] * AccessRange[1] * AccessRange[2];
const detail::plugin &Plugin = Queue->getPlugin();
Plugin.call<PiApiKind::piEnqueueMemBufferMap>(
Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem), CL_FALSE, Flags,
AccessOffset[0], BytesToMap, DepEvents.size(), DepEvents.data(),
&OutEvent, &MappedPtr);
memBufferMapHelper(Plugin, Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem),
CL_FALSE, Flags, AccessOffset[0], BytesToMap,
DepEvents.size(), DepEvents.data(), &OutEvent, &MappedPtr);
return MappedPtr;
}

@@ -639,9 +817,8 @@ void MemoryManager::unmap(SYCLMemObjI *, void *Mem, QueueImplPtr Queue,
// Using the plugin of the Queue.

const detail::plugin &Plugin = Queue->getPlugin();
Plugin.call<PiApiKind::piEnqueueMemUnmap>(
Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem), MappedPtr,
DepEvents.size(), DepEvents.data(), &OutEvent);
memUnmapHelper(Plugin, Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem),
MappedPtr, DepEvents.size(), DepEvents.data(), &OutEvent);
}

void MemoryManager::copy_usm(const void *SrcMem, QueueImplPtr SrcQueue,