Skip to content

Device global copy kernel implementation #65

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

sherry-yuan
Copy link
Contributor

@sherry-yuan sherry-yuan commented Jan 24, 2022

Currently the device global pointer is being mocked by passing an actual device allocation into the function. However this should be removed after the pointer is available through autodiscovery string.

Closes #56

@sherry-yuan sherry-yuan self-assigned this Jan 24, 2022
@sherry-yuan sherry-yuan added the enhancement New feature or request label Jan 24, 2022
@sherry-yuan sherry-yuan added this to the 2022.3 milestone Jan 24, 2022
@sherry-yuan sherry-yuan force-pushed the dev_global branch 2 times, most recently from a5a6abf to e3ca4d1 Compare January 28, 2022 16:25
@sherry-yuan sherry-yuan force-pushed the dev_global branch 2 times, most recently from fa8b338 to e45add1 Compare February 7, 2022 17:52
@artemrad
Copy link

What is the purpose of the number field in each at each index of the autodiscovery string?

I would also think about how to add fields containing information about properties associated with each device_global.

@sherry-yuan
Copy link
Contributor Author

sherry-yuan commented Feb 14, 2022

Thanks @artemrad for the comment!

What is the purpose of the number field in each at each index of the autodiscovery string?

Autodiscovery format in f89b759: num_dev_global [(num_dev_global_field_1, dev_global_name_1, dev_global_address_1, dev_global_size_1), (num_dev_global_field_2, dev_global_name_2, dev_global_address_2, dev_global_size_2), (...), ...]

By number field you mean the bolded part of above?

The num_dev_global_field keep track of the number of properties associated with each device global. We need this number because the autodiscovery parser relies counting number of remaining item to read per device global. Further, this setup ensure forward compatibility. i.e if we decided to add additional properties in the future for some specific device global, we can easily do so by increasing the num_dev_global_field (which tells the parser there are one more field to read for that specific device global).

If the number of attributes associated with each dev global is always going to be the same, then it probably make more sense to make num_dev_global_field not a per device global number. i.e the format will become

num_dev_global num_dev_global_field [(dev_global_name_1, dev_global_address_1, dev_global_size_1), (dev_global_name_2, dev_global_address_2, dev_global_size_2), (...), ...]

I would also think about how to add fields containing information about properties associated with each device_global.

I think the above mechanism makes sure we can add additional properties for each device global while maintaining forward compatibility? Currently the attributes include dev_global_name, dev_global_address, dev_global_size (there can always be more, let me know if I am missing something). Correct me if I understand attribute wrong.

---------------------------------

Currently the device global pointer is being mocked by passing an actual device allocation into the function. However this should be removed after the pointer is available through autodiscovery string.
-----------------------------------------

Currently each kernel will receive the info about device global address and size

This is potentially not desired, a better design would be to have device global at autodiscovery device level instead, and kernel query for such information during runtime.
----------------------------------------------------------

The autodiscovery specifies how many device global there are in the device

Specify for each device global how many attribute it has (currently 3)

The 3 attribute are: device global name, address, size

The device global name are used in runtime to get address of device global when given the name in clEnqueueReadGlobalVariableINTEL or clEnqueueWriteGlobalVariableINTEL
------------------------------------

Now the autodiscovery format is changed to

num_device_global num_field_in_device_global [(dev_global_name, dev_global_addr, devl_global_size), (dev_global_name, dev_global_addr, devl_global_size), ...]

This is because each device global will have same number of field anyways, so itis not necessary ot keep a number of field for each device global
@sherry-yuan
Copy link
Contributor Author

Update milestone to 2022.4 as that is the current target.

@sherry-yuan sherry-yuan modified the milestones: 2022.3, 2022.4 Mar 14, 2022
if (status != CL_SUCCESS) {
return status;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can insert an assert(kernel) after checking the status to appease Klocwork.

// Host USM example
// void* src_ptr = clHostMemAllocINTEL(m_context, NULL, strsize, 1, &status);
// CHECK_EQUAL(CL_SUCCESS, status);
// CHECK(src_ptr != NULL);
Copy link
Contributor

@pcolberg pcolberg Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented code should be removed; unless you intend to enable this before merge?

@sherry-yuan
Copy link
Contributor Author

Autodiscovery change is separated out into a separate PR (because the full implementation depend on updating upstream khronos header, which depend on the full implementation to be ready in sycl runtime). Autodiscovery change could be merged first

Autodiscovery change PR: #100

@pcolberg pcolberg assigned pcolberg and unassigned sherry-yuan Apr 30, 2022
@pcolberg pcolberg assigned sophimao and unassigned pcolberg Jul 6, 2022
@pcolberg pcolberg removed this from the 2023.0 milestone Oct 19, 2022
@sophimao sophimao changed the title Device global implementation Device global copy kernel implementation Dec 15, 2022
@sophimao sophimao assigned intel-liudean and unassigned sophimao Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Device Global Copy Kernel Support
5 participants