Device global copy kernel implementation #65

Draft
wants to merge 12 commits into
base: main
Per kernel autodiscovery change
-----------------------------------------

Currently, each kernel receives the address and size of its device global through its own autodiscovery entry.

This is potentially undesirable. A better design would store device global information at the device level of the autodiscovery string and have kernels query it at runtime.
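
One way to picture the device-level alternative described above is the minimal sketch below. It is not part of this commit: `DeviceGlobalInfo`, `DeviceDef`, and `find_device_global` are hypothetical names, and keying the table by a device global name is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical types for illustration; none of these names exist in the runtime.
struct DeviceGlobalInfo {
  std::uint64_t address; // start address of the device global
  std::uint32_t size;    // size of the device global in bytes
};

struct DeviceDef {
  // Stored once per device instead of once per kernel.
  // Keying by name is an assumption made for this sketch.
  std::unordered_map<std::string, DeviceGlobalInfo> device_globals;
};

// Runtime lookup that a kernel-side code path could use instead of
// reading per-kernel fields. Returns nullptr when the name is unknown.
const DeviceGlobalInfo *find_device_global(const DeviceDef &dev,
                                           const std::string &name) {
  auto it = dev.device_globals.find(name);
  return it == dev.device_globals.end() ? nullptr : &it->second;
}
```

With a table like this, the read path would look the entry up when the copy is enqueued rather than carrying an address and size in every kernel definition.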
sherry-yuan committed Mar 4, 2022
commit ded93034813faf2601e2c110a34d7edf4e65b9e0
2 changes: 2 additions & 0 deletions include/acl.h
@@ -230,6 +230,8 @@ typedef struct {
fast_launch_depth; /* How many kernels can be buffered on the device, 0
means no buffering just one can execute*/
unsigned int is_sycl_compile; /* [1] SYCL compile; [0] OpenCL compile*/
unsigned int device_global_address; /* Address of kernel's device global*/
unsigned int device_global_size; /* Size of address space of device global used by this kernel*/
} acl_accel_def_t;

/* An ACL system definition.
24 changes: 24 additions & 0 deletions src/acl_auto_configure.cpp
@@ -873,6 +873,30 @@ bool acl_load_device_def_from_str(const std::string &config_str,
devdef.accel[i].is_sycl_compile, counters);
}

devdef.accel[i].device_global_address =
0; // Initializing for backward compatibility
std::cerr << result << std::endl;
std::cerr << (counters.back() > 0) << std::endl;
if (result && counters.back() > 0) {
std::cerr << "read dev global address" << std::endl;
result = read_uint_counters(config_str, curr_pos,
devdef.accel[i].device_global_address, counters);
}else {
std::cerr << "read dev global address fail" << std::endl;
}


devdef.accel[i].device_global_size =
0; // Initializing for backward compatibility
if (result && counters.back() > 0) {
std::cerr << "read dev global size" << std::endl;
result = read_uint_counters(config_str, curr_pos,
devdef.accel[i].device_global_size, counters);
}else {
std::cerr << "read dev global size fail" << std::endl;

}

// forward compatibility: bypassing remaining fields at the end of kernel
// description section
while (result && counters.size() > 0 &&
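
The hunk above reads each new field only when the kernel section still has fields left, and otherwise keeps a default of 0 so that older autodiscovery strings still parse. A minimal standalone sketch of that pattern follows. `read_optional_uint`, `KernelDeviceGlobal`, and `parse_device_global_fields` are hypothetical and are not the runtime's `read_uint_counters`; a plain remaining-field count stands in for the `counters` stack.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper for illustration, not the runtime's read_uint_counters.
// Reads one unsigned field if the section still has fields left; otherwise
// leaves `out` at its default so older strings keep parsing.
bool read_optional_uint(std::istringstream &in, int &fields_left,
                        unsigned int &out) {
  if (fields_left <= 0)
    return true; // field absent in an older format: keep the default
  if (!(in >> out))
    return false; // field expected but malformed
  --fields_left;
  return true;
}

struct KernelDeviceGlobal {
  unsigned int address = 0;
  unsigned int size = 0;
};

// Parses the two trailing fields of a kernel section, in order.
bool parse_device_global_fields(const std::string &section, int fields_left,
                                KernelDeviceGlobal &dg) {
  std::istringstream in(section);
  dg = KernelDeviceGlobal{}; // defaults for backward compatibility
  return read_optional_uint(in, fields_left, dg.address) &&
         read_optional_uint(in, fields_left, dg.size);
}
```

Here a missing field is not an error, while a field that is announced by the count but cannot be parsed is reported as a failure.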
6 changes: 4 additions & 2 deletions src/acl_mem.cpp
@@ -421,8 +421,10 @@ CL_API_ENTRY cl_int clEnqueueReadGlobalVariableINTEL(
}

Review comment from a contributor:

You can insert an assert(kernel) after checking the status to appease Klocwork.

// dev_addr_t dev_global_address =
// kernel->dev_bin->get_devdef().autodiscovery_def.?
uintptr_t dev_global_address = 0x4000000;
uintptr_t dev_global_address = kernel->accel_def->device_global_address;
assert(kernel->accel_def->device_global_address == 4096); // TODO: remove when merging
// uintptr_t dev_global_address = 0x4000000;
// TODO: add checks for whether the copy will be out of bound for device global
void *dev_global_ptr =
(void *)(dev_global_address + offset * 8); // 1 unit of offset is 8 bits
status = set_kernel_arg_mem_pointer_without_checks(kernel, 0, dev_global_ptr);
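
The TODO in the hunk above asks for a check that the copy stays inside the device global. A minimal sketch of such a check follows, assuming the offset, the copy size, and the region size are all in bytes. `device_global_copy_in_bounds` is a hypothetical helper and not part of the runtime; the caller would pass something like `device_global_size` as `region_size`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration. Returns true when the byte range
// [offset, offset + size) lies entirely inside a region of region_size bytes.
// The comparison is arranged so that offset + size is never computed, which
// avoids unsigned overflow for very large inputs.
bool device_global_copy_in_bounds(std::size_t offset, std::size_t size,
                                  std::size_t region_size) {
  return offset <= region_size && size <= region_size - offset;
}
```

A caller could reject the enqueue with an error status when this returns false, before any pointer arithmetic on the device global address takes place.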
20 changes: 15 additions & 5 deletions test/acl_auto_configure_test.cpp
@@ -96,24 +96,31 @@ TEST(auto_configure, simple) {
#define IS_SYCL_COMPILE " 1"
#define IS_NOT_SYCL_COMPILE " 0"

// device global information
#define KERNEL_DEVICE_GLOBAL_ADDRESS " 4096"
#define KERNEL_DEVICE_GLOBAL_SIZE " 2048"

int parsed;
std::string err_str;
ACL_LOCKED(
parsed = acl_load_device_def_from_str(
std::string(
std::string autodiscovery = std::string(
VERSIONIDTOSTR(ACL_AUTO_CONFIGURE_VERSIONID)
DEVICE_FIELDS RANDOM_HASH
" " BOARDNAME IS_NOT_BIG_ENDIAN MEM HOSTPIPE KERNEL_ARG_INFO_NONE
" 1 82 foo" KERNEL_CRA KERNEL_FAST_LAUNCH_DEPTH KERNEL_PERF_MON
" 1 84 foo" KERNEL_CRA KERNEL_FAST_LAUNCH_DEPTH KERNEL_PERF_MON // 84 = number of kernel fields
KERNEL_WORKGROUP_VARIANT KERNEL_WORKITEM_VARIANT
KERNEL_NUM_VECTOR_LANES1 KERNEL_PROFILE_SCANCHAIN_LENGTH
ARGS_LOCAL_GLOBAL_LONG_PROF KERNEL_PRINTF_FORMATSTRINGS
LD_1024 KERNEL_REQD_WORK_GROUP_SIZE_NONE
KERNEL_MAX_WORK_GROUP_SIZE_NONE
KERNEL_MAX_GLOBAL_WORK_DIM_NONE
KERNEL_USES_GLOBAL_WORK_OFFSET_ENABLED
IS_SYCL_COMPILE),
IS_SYCL_COMPILE KERNEL_DEVICE_GLOBAL_ADDRESS KERNEL_DEVICE_GLOBAL_SIZE);
std::cerr << autodiscovery << std::endl;
ACL_LOCKED(
parsed = acl_load_device_def_from_str(
autodiscovery,
m_device_def.autodiscovery_def, err_str));
std::cerr << err_str << std::endl;
CHECK_EQUAL(1, parsed);

CHECK_EQUAL(1, m_device_def.autodiscovery_def.num_global_mem_systems);
@@ -260,6 +267,9 @@ TEST(auto_configure, simple) {
CHECK_EQUAL(0,
(int)m_device_def.autodiscovery_def.accel[0].max_work_group_size);
CHECK_EQUAL(1, (int)m_device_def.autodiscovery_def.accel[0].is_sycl_compile);
CHECK_EQUAL(4096, (int)m_device_def.autodiscovery_def.accel[0].device_global_address);
CHECK_EQUAL(2048, (int)m_device_def.autodiscovery_def.accel[0].device_global_size);


// Check a second parsing.
// It should allocate a new string for the name.
33 changes: 19 additions & 14 deletions test/acl_globals_test.cpp
@@ -198,20 +198,25 @@ static std::vector<acl_accel_def_t> acltest_complex_system_device0_accel = {
{},
{32768, 0, 0},
1},
{14,
ACL_RANGE_FROM_ARRAY(acltest_devicelocal[11]),
acltest_kernels[14],
acltest_laspace_info,
{0, 0, 0},
0,
0,
1,
0,
32768,
3,
{},
{32768, 0, 0},
1},
{14, // id
ACL_RANGE_FROM_ARRAY(acltest_devicelocal[11]), // mem
acltest_kernels[14], // iface
acltest_laspace_info, // local_aspaces
{0, 0, 0}, // compile_work_group_size
0, // is_workgroup_invariant
0, // is_workitem_invariant
1, // num_vector_lanes
0, // profiling_words_to_readback
32768, // max_work_group_size
3, // max_global_work_dim
{}, // printf_format_info
{32768, 0, 0}, // max_work_group_size_arr
1, // uses_global_work_offset
0, // fast_launch_depth
1, // is_sycl_compile
4096, // device_global_address
2048, // device_global_size
},
{1,
ACL_RANGE_FROM_ARRAY(acltest_devicelocal[1]),
acltest_kernels[1],