
[SYCL] Fix event profiling for command_submit in L0 and other backends #7526


Merged. 62 commits, merged on Jan 10, 2023.

Commits (62):
7973c58
[SYCL] Implement command_submit L0
raaiq1 Nov 24, 2022
2dd3761
Add PI API extension piGetDeviceAndHostTimer
raaiq1 Dec 12, 2022
bf53daa
Merge branch 'sycl' into another_design
raaiq1 Dec 12, 2022
f73197d
Formatting
raaiq1 Dec 12, 2022
d25bc5d
Moe formatting
raaiq1 Dec 12, 2022
c7d76cc
E
raaiq1 Dec 13, 2022
1c465da
Add piGetDeviceAndHostTimer
raaiq1 Dec 13, 2022
ecfc6b1
Dummy
raaiq1 Dec 13, 2022
3e40b44
Merge branch 'sycl' into another_design
raaiq1 Dec 13, 2022
375ff05
Apply review suggestions
raaiq1 Dec 13, 2022
23ec955
Fix build errors
raaiq1 Dec 14, 2022
713430e
Add piGetDeviceAndHostTimer for CUDA and HIP
raaiq1 Dec 15, 2022
dc509a9
Formatting
raaiq1 Dec 15, 2022
82fbb3b
Fix issues
raaiq1 Dec 15, 2022
6cacfff
Formatting
raaiq1 Dec 15, 2022
810cbe0
Added documentation
raaiq1 Dec 16, 2022
6bbc64f
Merge branch 'another_design' of https://github.com/raaiq1/llvm into …
raaiq1 Dec 16, 2022
9645592
Formatting
raaiq1 Dec 16, 2022
474fa17
Apply suggestions from code review
raaiq1 Dec 19, 2022
a7b3b60
Added unittests
raaiq1 Dec 19, 2022
1d4d355
Formatting
raaiq1 Dec 19, 2022
a32b8e8
Merge branch 'sycl' into another_design
raaiq1 Dec 19, 2022
d086af8
More formatting
raaiq1 Dec 19, 2022
1e74912
Merge branch 'another_design' of https://github.com/raaiq1/llvm into …
raaiq1 Dec 19, 2022
5d01757
Fix HIP fail
raaiq1 Dec 19, 2022
26ca040
Apply suggestions from code review
raaiq1 Dec 20, 2022
12a0536
Merge branch 'another_design' of https://github.com/raaiq1/llvm into …
raaiq1 Dec 20, 2022
ba6ccc8
Add review suggestions,fix HIP issues and handle host platform
raaiq1 Dec 20, 2022
bfcc33e
[SYCL] Implement command_submit L0
raaiq1 Nov 24, 2022
d353ae8
Add PI API extension piGetDeviceAndHostTimer
raaiq1 Dec 12, 2022
d871187
Formatting
raaiq1 Dec 12, 2022
6361bc0
Moe formatting
raaiq1 Dec 12, 2022
e2fc03a
E
raaiq1 Dec 13, 2022
0473ed4
Add piGetDeviceAndHostTimer
raaiq1 Dec 13, 2022
0d9021f
Dummy
raaiq1 Dec 13, 2022
4be8dc0
Apply review suggestions
raaiq1 Dec 13, 2022
3700114
Fix build errors
raaiq1 Dec 14, 2022
5ee56ef
Add piGetDeviceAndHostTimer for CUDA and HIP
raaiq1 Dec 15, 2022
ba0b2db
Formatting
raaiq1 Dec 15, 2022
8268fdf
Fix issues
raaiq1 Dec 15, 2022
5e2949f
Added documentation
raaiq1 Dec 16, 2022
19cd2e9
Formatting
raaiq1 Dec 15, 2022
82ff6c7
Formatting
raaiq1 Dec 16, 2022
451de35
Apply suggestions from code review
raaiq1 Dec 19, 2022
2d6a636
Added unittests
raaiq1 Dec 19, 2022
b9c4171
Formatting
raaiq1 Dec 19, 2022
ac695db
More formatting
raaiq1 Dec 19, 2022
89ffa97
Fix HIP fail
raaiq1 Dec 19, 2022
554fde5
Apply suggestions from code review
raaiq1 Dec 20, 2022
228b22e
Add review suggestions,fix HIP issues and handle host platform
raaiq1 Dec 20, 2022
f6426f3
Fix ESIMD fails
raaiq1 Dec 20, 2022
a891d2c
Merge branch 'another_design' of https://github.com/raaiq1/llvm into …
raaiq1 Dec 20, 2022
26319bf
Fix ESIMD fails
raaiq1 Dec 20, 2022
d5760dd
Merge branch 'another_design' of https://github.com/raaiq1/llvm into …
raaiq1 Dec 20, 2022
843bf6c
Fix command submit query placement
raaiq1 Dec 20, 2022
3337ecb
Merge branch 'another_design' of https://github.com/raaiq1/llvm into …
raaiq1 Dec 20, 2022
eda7e39
Fix test fails
raaiq1 Dec 21, 2022
43b24e7
Fix CUDA fails again
raaiq1 Dec 21, 2022
e834879
Formatting
raaiq1 Dec 21, 2022
37f2e53
Remove removal comments
raaiq1 Dec 22, 2022
2df30af
Added TODO comment
raaiq1 Dec 22, 2022
dc18419
Remove bad comment
raaiq1 Dec 22, 2022
sycl/include/sycl/detail/pi.def (2 additions, 0 deletions)
@@ -141,8 +141,10 @@ _PI_API(piPluginGetLastError)

_PI_API(piTearDown)


_PI_API(piextUSMEnqueueFill2D)
_PI_API(piextUSMEnqueueMemset2D)
_PI_API(piextUSMEnqueueMemcpy2D)

_PI_API(piGetDeviceAndHostTimer)
#undef _PI_API
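`pi.def` is an X-macro list: runtime code defines `_PI_API(api)` before including it, and the file's trailing `#undef _PI_API` cleans up, so the single added line registers `piGetDeviceAndHostTimer` wherever the list is expanded. The following is a hypothetical, self-contained illustration of that pattern, not the runtime's actual consumer code: `DEMO_PI_API_LIST`, `DemoPiApiKind` and `demoApiName` are invented names, and the list is a macro instead of an included file so the example compiles on its own. It generates an enum and a name table from one list, which keeps both in the same order by construction.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for pi.def. The real file is #included after the
// consumer defines _PI_API(api); here the list is a macro so the sketch is
// self-contained.
#define DEMO_PI_API_LIST(ENTRY)                                                \
  ENTRY(piPluginGetLastError)                                                  \
  ENTRY(piTearDown)                                                            \
  ENTRY(piGetDeviceAndHostTimer)

// Expansion 1: one enumerator per API, plus a trailing count.
enum class DemoPiApiKind {
#define DEMO_ENUM_ENTRY(api) api,
  DEMO_PI_API_LIST(DEMO_ENUM_ENTRY)
#undef DEMO_ENUM_ENTRY
  Count
};

// Expansion 2: a name table generated from the same list, so it cannot drift
// out of order with the enum.
constexpr std::string_view DemoPiApiNames[] = {
#define DEMO_NAME_ENTRY(api) #api,
    DEMO_PI_API_LIST(DEMO_NAME_ENTRY)
#undef DEMO_NAME_ENTRY
};

static_assert(sizeof(DemoPiApiNames) / sizeof(DemoPiApiNames[0]) ==
                  static_cast<std::size_t>(DemoPiApiKind::Count),
              "enum and name table must have one entry per API");

// Looks up the API name for an enumerator.
inline std::string_view demoApiName(DemoPiApiKind Kind) {
  auto Index = static_cast<std::size_t>(Kind);
  assert(Index < static_cast<std::size_t>(DemoPiApiKind::Count));
  return DemoPiApiNames[Index];
}
```

For example, `demoApiName(DemoPiApiKind::piGetDeviceAndHostTimer)` yields `"piGetDeviceAndHostTimer"`, and adding a line to the list updates both expansions at once.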
sycl/include/sycl/detail/pi.h (18 additions, 2 deletions)
@@ -74,9 +74,10 @@
// PI_EXT_ONEAPI_CONTEXT_INFO_USM_MEMSET2D_SUPPORT, and
// PI_EXT_ONEAPI_CONTEXT_INFO_USM_MEMCPY2D_SUPPORT context info query
// descriptors.
// 12.22 Add piGetDeviceAndHostTimer to query device wall-clock timestamp

#define _PI_H_VERSION_MAJOR 12
-#define _PI_H_VERSION_MINOR 21
+#define _PI_H_VERSION_MINOR 22

#define _PI_STRING_HELPER(a) #a
#define _PI_CONCAT(a, b) _PI_STRING_HELPER(a.b)
@@ -1898,9 +1899,24 @@ __SYCL_EXPORT pi_result piTearDown(void *PluginParameter);
///
/// \return PI_SUCCESS if plugin is indicating non-fatal warning. Any other
/// error code indicates that plugin considers this to be a fatal error and the
/// runtime must handle it or end the application.
/// Returns the global timestamp from \param device, and synchronized host
/// timestamp
__SYCL_EXPORT pi_result piPluginGetLastError(char **message);

/// Queries the device for its global timestamp in nanoseconds, and updates
/// HostTime with the value of the host timer at the closest possible point in
/// time to that at which DeviceTime was returned.
///
/// \param Device device to query for timestamp
/// \param DeviceTime pointer to store device timestamp in nanoseconds. Optional
/// argument, can be nullptr
/// \param HostTime pointer to store host timestamp in
/// nanoseconds. Optional argument, can be nullptr in which case timestamp will
/// not be written
__SYCL_EXPORT pi_result piGetDeviceAndHostTimer(pi_device Device,
uint64_t *DeviceTime,
uint64_t *HostTime);

struct _pi_plugin {
// PI version supported by host passed to the plugin. The Plugin
// checks and writes the appropriate Function Pointers in
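The `pi.h` comment defines the contract: a device timestamp plus a host timestamp taken as close to it as possible, both in nanoseconds, with either output pointer allowed to be null. One way a caller could use such a pair is to map a later host-clock reading (for example, the moment a command is handed to the backend) onto the device timeline. The class below is a hypothetical sketch of that idea, not code from this PR: `DemoTimerFn` stands in for a call to `piGetDeviceAndHostTimer`, and the mapping assumes both clocks advance at the same rate after the sync point, which real hardware only approximates.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical callback with the same out-parameter shape as
// piGetDeviceAndHostTimer (minus the pi_device argument): either pointer may
// be null, and non-null pointers receive nanosecond timestamps.
using DemoTimerFn =
    std::function<void(uint64_t *DeviceTime, uint64_t *HostTime)>;

// Records one (device, host) timestamp pair and uses it to map later host
// timestamps onto the device timeline. Assumes no clock drift; a real runtime
// would need to resync periodically.
class DemoTimestampMapper {
public:
  explicit DemoTimestampMapper(const DemoTimerFn &Timer) {
    Timer(&SyncDeviceNs, &SyncHostNs);
  }

  // Converts a host timestamp (ns) taken after the sync point into device ns.
  uint64_t hostToDevice(uint64_t HostNs) const {
    assert(HostNs >= SyncHostNs && "host time precedes the sync point");
    return SyncDeviceNs + (HostNs - SyncHostNs);
  }

private:
  uint64_t SyncDeviceNs = 0;
  uint64_t SyncHostNs = 0;
};

// Host clock in ns, read the same way as in the CUDA and HIP plugins.
inline uint64_t demoHostNowNs() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch())
      .count();
}
```

With a fake timer that reports a device time of 5000 ns and a host time of 1,000,000 ns, `hostToDevice(1000250)` returns 5250.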
sycl/plugins/cuda/pi_cuda.cpp (32 additions, 1 deletion)
@@ -18,6 +18,7 @@

#include <algorithm>
#include <cassert>
#include <chrono>
#include <cuda.h>
#include <cuda_device_runtime_api.h>
#include <limits>
@@ -2134,7 +2135,6 @@ pi_result cuda_piContextCreate(const pi_context_properties *properties,
piContextPtr = std::unique_ptr<_pi_context>(new _pi_context{
_pi_context::kind::user_defined, newContext, *devices});
}

static std::once_flag initFlag;
std::call_once(
initFlag,
@@ -3889,6 +3889,7 @@ pi_result cuda_piEventGetProfilingInfo(pi_event event,
switch (param_name) {
case PI_PROFILING_INFO_COMMAND_QUEUED:
case PI_PROFILING_INFO_COMMAND_SUBMIT:
// Note: No users for this case
return getInfo<pi_uint64>(param_value_size, param_value,
param_value_size_ret, event->get_queued_time());
case PI_PROFILING_INFO_COMMAND_START:
@@ -5486,6 +5487,35 @@ pi_result cuda_piTearDown(void *) {
return PI_SUCCESS;
}

pi_result cuda_piGetDeviceAndHostTimer(pi_device Device, uint64_t *DeviceTime,
uint64_t *HostTime) {
_pi_event::native_type event;
ScopedContext active(Device->get_context());

if (DeviceTime) {
PI_CHECK_ERROR(cuEventCreate(&event, CU_EVENT_DEFAULT));
PI_CHECK_ERROR(cuEventRecord(event, 0));
}
if (HostTime) {

using namespace std::chrono;
*HostTime =
duration_cast<nanoseconds>(steady_clock::now().time_since_epoch())
.count();
}

if (DeviceTime) {
PI_CHECK_ERROR(cuEventSynchronize(event));

float elapsedTime = 0.0f;
PI_CHECK_ERROR(
cuEventElapsedTime(&elapsedTime, _pi_platform::evBase_, event));
*DeviceTime = (uint64_t)(elapsedTime * (double)1e6);
}

return PI_SUCCESS;
}

const char SupportedVersion[] = _PI_CUDA_PLUGIN_VERSION_STRING;

pi_result piPluginInit(pi_plugin *PluginInit) {
@@ -5634,6 +5664,7 @@ pi_result piPluginInit(pi_plugin *PluginInit) {
_PI_CL(piextKernelSetArgSampler, cuda_piextKernelSetArgSampler)
_PI_CL(piPluginGetLastError, cuda_piPluginGetLastError)
_PI_CL(piTearDown, cuda_piTearDown)
_PI_CL(piGetDeviceAndHostTimer, cuda_piGetDeviceAndHostTimer)

#undef _PI_CL

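In the CUDA (and HIP) implementation, `DeviceTime` is derived from `cuEventElapsedTime`, which reports the interval between the shared base event `_pi_platform::evBase_` and a freshly recorded event as a `float` number of milliseconds (the CUDA documentation quotes a resolution of around 0.5 microseconds). Because the base event is recorded once, the elapsed value keeps growing for the life of the process, while a `float` has only 24 significand bits. The helpers below (hypothetical names) illustrate the conversion used in the plugin and how far apart adjacent `float` values become: about 61 ns near 1000 ms, and 0.25 ms near one hour (3.6e6 ms).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts a float millisecond interval (the type returned by
// cuEventElapsedTime / hipEventElapsedTime) to integer nanoseconds, mirroring
// the cast in the plugin code.
inline uint64_t demoElapsedMsToNs(float ElapsedMs) {
  assert(ElapsedMs >= 0.0f);
  return static_cast<uint64_t>(ElapsedMs * 1e6);
}

// Spacing in ns between ElapsedMs and the next representable float, i.e. the
// finest timestamp step available at that distance from the base event.
inline uint64_t demoFloatGranularityNs(float ElapsedMs) {
  float Next = std::nextafter(ElapsedMs, INFINITY);
  return static_cast<uint64_t>((static_cast<double>(Next) - ElapsedMs) * 1e6);
}
```

Widening to `double` before scaling, as the plugin does with `(double)1e6`, avoids adding further rounding, but it cannot recover precision already lost in the `float` result.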
sycl/plugins/cuda/pi_cuda.hpp (6 additions, 0 deletions)
@@ -88,6 +88,7 @@ struct _pi_device {
native_type cuDevice_;
std::atomic_uint32_t refCount_;
pi_platform platform_;
pi_context context_;

static constexpr pi_uint32 max_work_item_dimensions = 3u;
size_t max_work_item_sizes[max_work_item_dimensions];
@@ -103,6 +104,10 @@ struct _pi_device {

pi_platform get_platform() const noexcept { return platform_; };

void set_context(pi_context ctx) { context_ = ctx; };

pi_context get_context() { return context_; };

void save_max_work_item_sizes(size_t size,
size_t *save_max_work_item_sizes) noexcept {
memcpy(max_work_item_sizes, save_max_work_item_sizes, size);
@@ -178,6 +183,7 @@ struct _pi_context {
bool backend_owns = true)
: kind_{k}, cuContext_{ctxt}, deviceId_{devId}, refCount_{1},
has_ownership{backend_owns} {
deviceId_->set_context(this);
cuda_piDeviceRetain(deviceId_);
};

sycl/plugins/esimd_emulator/pi_esimd_emulator.cpp (6 additions, 0 deletions)
@@ -2049,6 +2049,12 @@ pi_result piTearDown(void *) {
return PI_SUCCESS;
}

pi_result piGetDeviceAndHostTimer(pi_device device, uint64_t *deviceTime,
uint64_t *hostTime) {
PiTrace(
"Warning : Querying device clock not supported under PI_ESIMD_EMULATOR");
return PI_SUCCESS;
}
const char SupportedVersion[] = _PI_ESIMD_PLUGIN_VERSION_STRING;

pi_result piPluginInit(pi_plugin *PluginInit) {
sycl/plugins/hip/pi_hip.cpp (46 additions, 8 deletions)
@@ -18,6 +18,7 @@

#include <algorithm>
#include <cassert>
#include <chrono>
#include <hip/hip_runtime.h>
#include <limits>
#include <memory>
@@ -605,15 +606,16 @@ pi_uint64 _pi_event::get_start_time() const {
assert(is_started());

PI_CHECK_ERROR(
-hipEventElapsedTime(&miliSeconds, context_->evBase_, evStart_));
+hipEventElapsedTime(&miliSeconds, _pi_platform::evBase_, evStart_));
return static_cast<pi_uint64>(miliSeconds * 1.0e6);
}

pi_uint64 _pi_event::get_end_time() const {
float miliSeconds = 0.0f;
assert(is_started() && is_recorded());

-PI_CHECK_ERROR(hipEventElapsedTime(&miliSeconds, context_->evBase_, evEnd_));
+PI_CHECK_ERROR(
+hipEventElapsedTime(&miliSeconds, _pi_platform::evBase_, evEnd_));
return static_cast<pi_uint64>(miliSeconds * 1.0e6);
}

@@ -1988,10 +1990,16 @@ pi_result hip_piContextCreate(const pi_context_properties *properties,
_pi_context::kind::user_defined, newContext, *devices});
}

-// Use default stream to record base event counter
-PI_CHECK_ERROR(
-hipEventCreateWithFlags(&piContextPtr->evBase_, hipEventDefault));
-PI_CHECK_ERROR(hipEventRecord(piContextPtr->evBase_, 0));
+static std::once_flag initFlag;
+std::call_once(
+initFlag,
+[](pi_result &err) {
+// Use default stream to record base event counter
+PI_CHECK_ERROR(
+hipEventCreateWithFlags(&_pi_platform::evBase_, hipEventDefault));
+PI_CHECK_ERROR(hipEventRecord(_pi_platform::evBase_, 0));
+},
+errcode_ret);

// For non-primary scoped contexts keep the last active on top of the stack
// as `cuCtxCreate` replaces it implicitly otherwise.
@@ -2021,8 +2029,6 @@ pi_result hip_piContextRelease(pi_context ctxt) {

std::unique_ptr<_pi_context> context{ctxt};

-PI_CHECK_ERROR(hipEventDestroy(context->evBase_));

if (!ctxt->is_primary()) {
hipCtx_t hipCtxt = ctxt->get();
// hipCtxSynchronize is not supported for AMD platform so we can just
@@ -3707,6 +3713,7 @@ pi_result hip_piEventGetProfilingInfo(pi_event event,
switch (param_name) {
case PI_PROFILING_INFO_COMMAND_QUEUED:
case PI_PROFILING_INFO_COMMAND_SUBMIT:
// Note: No users for this case
return getInfo<pi_uint64>(param_value_size, param_value,
param_value_size_ret, event->get_queued_time());
case PI_PROFILING_INFO_COMMAND_START:
@@ -5208,6 +5215,34 @@ pi_result hip_piTearDown(void *PluginParameter) {
return PI_SUCCESS;
}

pi_result hip_piGetDeviceAndHostTimer(pi_device Device, uint64_t *DeviceTime,
uint64_t *HostTime) {
_pi_event::native_type event;

ScopedContext active(Device->get_context());

if (DeviceTime) {
PI_CHECK_ERROR(hipEventCreateWithFlags(&event, hipEventDefault));
PI_CHECK_ERROR(hipEventRecord(event));
}
if (HostTime) {
using namespace std::chrono;
*HostTime =
duration_cast<nanoseconds>(steady_clock::now().time_since_epoch())
.count();
}

if (DeviceTime) {
PI_CHECK_ERROR(hipEventSynchronize(event));

float elapsedTime = 0.0f;
PI_CHECK_ERROR(
hipEventElapsedTime(&elapsedTime, _pi_platform::evBase_, event));
*DeviceTime = (uint64_t)(elapsedTime * (double)1e6);
}
return PI_SUCCESS;
}

const char SupportedVersion[] = _PI_HIP_PLUGIN_VERSION_STRING;

pi_result piPluginInit(pi_plugin *PluginInit) {
@@ -5350,10 +5385,13 @@ pi_result piPluginInit(pi_plugin *PluginInit) {
_PI_CL(piextKernelSetArgSampler, hip_piextKernelSetArgSampler)
_PI_CL(piPluginGetLastError, hip_piPluginGetLastError)
_PI_CL(piTearDown, hip_piTearDown)
_PI_CL(piGetDeviceAndHostTimer, hip_piGetDeviceAndHostTimer)

#undef _PI_CL

return PI_SUCCESS;
}

} // extern "C"

hipEvent_t _pi_platform::evBase_{nullptr};
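The HIP changes move the base event from a per-context member (`_pi_context::evBase_`, previously created in `hip_piContextCreate` and destroyed in `hip_piContextRelease`) to the static `_pi_platform::evBase_`, initialized exactly once through `std::call_once`. Every context and `hip_piGetDeviceAndHostTimer` then measure against the same origin, so elapsed times from different contexts are comparable; the trade-off is that the shared event now lives for the whole process. The sketch below shows the same once-only pattern with hypothetical names, using a host clock reading in place of a HIP event so it runs without a GPU. `std::call_once` forwards its extra arguments without copying them, so the lambda's `int &` parameter binds to the caller's variable, just as the plugin's lambda binds `pi_result &err` to `errcode_ret`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for _pi_platform::evBase_: one process-wide base
// timestamp shared by every context. The real plugin records a HIP event on
// the default stream instead of reading a host clock.
struct DemoPlatform {
  static inline uint64_t BaseNs = 0;
  static inline std::atomic<int> InitCount{0}; // how often the init body ran
};

// Called on every (simulated) context creation; only the first call records
// the base. Err is written only by the call that runs the init body, so
// callers should initialize it themselves.
inline void demoCreateContext(int &Err) {
  static std::once_flag InitFlag;
  std::call_once(
      InitFlag,
      [](int &ErrOut) {
        using namespace std::chrono;
        DemoPlatform::BaseNs =
            duration_cast<nanoseconds>(steady_clock::now().time_since_epoch())
                .count();
        DemoPlatform::InitCount.fetch_add(1);
        ErrOut = 0;
      },
      Err);
  // call_once returns only after the init body has completed (once, ever).
  assert(DemoPlatform::InitCount.load() == 1);
}
```

The same guarantee holds when several threads create contexts concurrently: exactly one of them runs the body, and the others block until it finishes.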
sycl/plugins/hip/pi_hip.hpp (8 additions, 4 deletions)
@@ -65,6 +65,7 @@ using _pi_stream_guard = std::unique_lock<std::mutex>;
/// when devices are used.
///
struct _pi_platform {
static hipEvent_t evBase_; // HIP event used as base counter
std::vector<std::unique_ptr<_pi_device>> devices_;
};

@@ -80,6 +81,7 @@ struct _pi_device {
native_type cuDevice_;
std::atomic_uint32_t refCount_;
pi_platform platform_;
pi_context context_;

public:
_pi_device(native_type cuDevice, pi_platform platform)
@@ -90,6 +92,10 @@ struct _pi_device {
pi_uint32 get_reference_count() const noexcept { return refCount_; }

pi_platform get_platform() const noexcept { return platform_; };

void set_context(pi_context ctx) { context_ = ctx; };

pi_context get_context() { return context_; };
};

/// PI context mapping to a HIP context object.
@@ -146,11 +152,9 @@ struct _pi_context {
_pi_device *deviceId_;
std::atomic_uint32_t refCount_;

-hipEvent_t evBase_; // HIP event used as base counter

_pi_context(kind k, hipCtx_t ctxt, _pi_device *devId)
-: kind_{k}, hipContext_{ctxt}, deviceId_{devId}, refCount_{1},
-evBase_(nullptr) {
+: kind_{k}, hipContext_{ctxt}, deviceId_{devId}, refCount_{1} {
deviceId_->set_context(this);
hip_piDeviceRetain(deviceId_);
};

sycl/plugins/level_zero/pi_level_zero.cpp (22 additions, 1 deletion)
@@ -5988,7 +5988,10 @@ pi_result piEventGetProfilingInfo(pi_event Event, pi_profiling_info ParamName,
}
case PI_PROFILING_INFO_COMMAND_QUEUED:
case PI_PROFILING_INFO_COMMAND_SUBMIT:
-// TODO: Support these when Level Zero supported is added.
+// Note: No users for this case
+// TODO: Implement command submission time when needed,
+// by recording device timestamp (using zeDeviceGetGlobalTimestamps)
+// before submitting command to device
return ReturnValue(uint64_t{0});
Contributor review comment:

Returning "0" is still not the right behavior. We should use zeDeviceGetGlobalTimestamps to record the time when commands are physically submitted to the device (inside the plugin). I am OK if you just add a TODO comment and don't fix it in this PR, since it will still be unused (also, please mention in the comment that there are currently no users of this).

default:
zePrint("piEventGetProfilingInfo: not supported ParamName\n");
@@ -9354,4 +9357,22 @@ pi_result _pi_buffer::free() {
return PI_SUCCESS;
}

pi_result piGetDeviceAndHostTimer(pi_device Device, uint64_t *DeviceTime,
uint64_t *HostTime) {
const uint64_t &ZeTimerResolution =
Device->ZeDeviceProperties->timerResolution;
const uint64_t TimestampMaxCount =
((1ULL << Device->ZeDeviceProperties->kernelTimestampValidBits) - 1ULL);
uint64_t DeviceClockCount, Dummy;

ZE_CALL(zeDeviceGetGlobalTimestamps,
(Device->ZeDevice, HostTime == nullptr ? &Dummy : HostTime,
&DeviceClockCount));

if (DeviceTime != nullptr) {

*DeviceTime = (DeviceClockCount & TimestampMaxCount) * ZeTimerResolution;
}
return PI_SUCCESS;
}
} // extern "C"
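The Level Zero implementation obtains a raw device clock count from `zeDeviceGetGlobalTimestamps`, keeps only the low `kernelTimestampValidBits` bits, and multiplies by `timerResolution`. The function below is a standalone sketch of that arithmetic with hypothetical names. It assumes `timerResolution` is in nanoseconds per tick, which is its meaning for the original `ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES` query (later versions of the Level Zero specification also define a cycles-per-second form). It also guards the mask computation, since `1ULL << 64` is undefined behavior in C++ should a device ever report 64 valid bits.

```cpp
#include <cassert>
#include <cstdint>

// Converts a raw Level Zero device clock count to nanoseconds, following the
// masking and scaling in piGetDeviceAndHostTimer. Assumes TimerResolutionNs is
// nanoseconds per tick.
inline uint64_t demoZeTicksToNs(uint64_t DeviceClockCount, uint32_t ValidBits,
                                uint64_t TimerResolutionNs) {
  assert(ValidBits > 0);
  // Keep only the low ValidBits bits; avoid the undefined 64-bit shift.
  const uint64_t Mask =
      ValidBits >= 64 ? ~0ULL : ((1ULL << ValidBits) - 1ULL);
  return (DeviceClockCount & Mask) * TimerResolutionNs;
}
```

For example, with 32 valid bits and a 52 ns tick, a count of `(1ULL << 32) + 5` wraps to 5 ticks, that is, 260 ns.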