Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][USM] Improve USM allocator test and fix improper behavior. #1538

Merged
merged 2 commits into from
May 1, 2020

Conversation

jbrodman
Copy link
Contributor

Signed-off-by: James Brodman james.brodman@intel.com

Signed-off-by: James Brodman <james.brodman@intel.com>
@jbrodman jbrodman requested a review from bader April 16, 2020 19:10
@jbrodman jbrodman requested a review from a team as a code owner April 16, 2020 19:10
@jbrodman jbrodman requested a review from s-kanaev April 16, 2020 19:10
vec.resize(N);

int *res = &vec[0];
int *vals = &vec[0];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not int *res = vec.data()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No good reason.


auto e0 = q.submit([=](handler &h) {
h.single_task<class baz_init>([=]() {
res[0] = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confused by this, res and vals point to the same device vector data, so res[0] = 0 is redundant to the first loop iteration below where vals[i=0] = 0. This also means the res[0] == vals[0] is added to itself an extra time, which works because is happens to be 0, but again confusing.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see i=0 is skipped in the second loop so no double add. Anyway I know it's just a test, but having res and vals point at the same memory is still confusing to me, takes a minute to verify that it's not a problem in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The computation is sort of irrelevant.

@bd4
Copy link

bd4 commented Apr 17, 2020

Testing with inteloneapi v 2021.1-beta05, with usm_allocator.hpp header change manually applied. The test program works for me when SYCL_DEVICE_TYPE=HOST, but fails (nonzero exit code) with CPU and GPU. Added a debug print when there is a mismatch, only the usm::alloc::device case is failing, with value 0 instead of 28.

Note that I am also seeing this behavior in my own usm test programs, with usm_allocator and sycl::malloc_device: waiting on the event (or the queue) is not sufficient to ensure the kernel is complete and it's safe to copy memory back. For example: https://github.com/bd4/dpcpp-test/blob/master/usmmem.cxx#L65

Perhaps there is something I am missing from latest git commits? I will try to build from source and see if I can still reproduce.

@bd4
Copy link

bd4 commented Apr 17, 2020

Perhaps there is something I am missing from latest git commits? I will try to build from source and see if I can still reproduce.

Looks like this was already fixed in #1132, sorry for the noise. Looks like USM is still very much in active development and anyone wanting to use it would be advised to track develop.

@s-kanaev
Copy link
Contributor

@jbrodman What was wrong with previous version of the test?

@jbrodman
Copy link
Contributor Author

@jbrodman What was wrong with previous version of the test?

It was only testing the allocator with host allocations, not device/shared allocations - hence a method on device allocations wasn't working as intended.

@@ -30,35 +30,106 @@ int main() {
auto dev = q.get_device();
auto ctxt = q.get_context();

if (!dev.get_info<info::device::usm_host_allocations>())
if (!(dev.get_info<info::device::usm_host_allocations>() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test you expect that a single device/environment/runtime will provide you with all three types of allocations. Dunno if it's possible that, say, CPU device/runtime provide you with either two out of three allocation facilities. Anyway, I suggest splitting the cases into three distinct ones. One way I see how this can be achieved is to if test blocks with appropriate dev.get_info<info::device::usm_xxx_allocations>().

@s-kanaev
Copy link
Contributor

@jbrodman ping.

@jbrodman
Copy link
Contributor Author

Sorry, what exactly is the request again?

@s-kanaev
Copy link
Contributor

This one: #discussion_r411543369

Signed-off-by: James Brodman <james.brodman@intel.com>
@jbrodman jbrodman requested a review from s-kanaev April 28, 2020 22:33
Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bader
Copy link
Contributor

bader commented Apr 29, 2020

@jbrodman, @bd4, @Ruyk, I marked some comments as resolved, but there are a couple of them which might require attention. Are you okay to merge or you expect additional fixes?

@bader
Copy link
Contributor

bader commented May 1, 2020

@jbrodman, @bd4, @Ruyk, I marked some comments as resolved, but there are a couple of them which might require attention. Are you okay to merge or you expect additional fixes?

Ping.

@bd4
Copy link

bd4 commented May 1, 2020

@bader looks good to me. I ran the tests with some added debug prints against beta 05. With SYCL_DEVICE_TYPE=GPU, the usm::alloc::device case fails, other two pass. For HOST and CPU, all pass.

In beta06, which appears to have #1132 but not this, the tests almost pass but the feature not supported exception is thrown on device vector destructor. If I comment out the throw at /opt/intel/inteloneapi/compiler/2021.1-beta06/linux/include/sycl/CL/sycl/usm/usm_allocator.hpp line 88 and rebuild, then the test runs successfully.

Using my source build llvm install with this fix applied (and the already merged queue fix from #1132), all pass.

@bader
Copy link
Contributor

bader commented May 1, 2020

@bd4, thanks!

@bader bader merged commit bb8fce9 into intel:sycl May 1, 2020
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request May 6, 2020
…_docs

* origin/sycl: (6482 commits)
  [SYCL][NFC] Clean formatting in Markdown documents (intel#1635)
  [SYCL][Doc] Remove obsolete parens from README (intel#1637)
  [SYCL] Fix failing ABI tests when LLVM_LIBDIR_SUFFIX is set (intel#1605)
  [SYCL] Fix warnings in libdevice (intel#1630)
  [SYCL][CUDA] Triage and clean LIT (intel#1620)
  [SYCL][NFC] Fix GCC 8 compilation warnings (intel#1631)
  [SYCL] Minor fixes in LowerWGScope
  [SYCL] PI: correct default interoperability plugin selection
  [SYCL] Add faster reduction implementations using atomic or/and intel::reduce() (intel#1615)
  [SYCL] Add sycl-ls utility for listing devices discovered/selected by SYCL RT (intel#1575)
  [SYCL] Fix getDeviceFromHandler declarations (intel#1626)
  [SPIR-V] Correct/improve declaration of SPIR-V builtins (intel#1519)
  [SYCL][USM] Improve USM allocator test and fix improper behavior. (intel#1538)
  [SYCL] Fix failing ABI LITs (intel#1622)
  [SYCL] Add support for MSVC internal math functions in device library (intel#1441)
  [SYCL] Add runtime library versioning (intel#1604)
  [SYCL] Check weak symbols in ABI dumps (intel#1609)
  [NFC][SYCL] Improve kernel metadata test (intel#1610)
  Revert "[SYCL] XFAIL LIT test due to duplicate diagnostic" (intel#1460)
  [SYCL] Move the reduction command group funcs out of handler.hpp (intel#1602)
  ...
MartinWehking pushed a commit to MartinWehking/llvm that referenced this pull request May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants