This repository was archived by the owner on Mar 28, 2023. It is now read-only.

[SYCL] Added tests for atomics with various memory orders and scopes #534

Merged Feb 24, 2022 (33 commits)

Conversation


@t4c1 t4c1 commented Oct 26, 2021

Added tests for atomics with various memory orders and scopes. The reduction tests also have updated SM requirements: they call work-group atomics, which are now implemented and require a higher SM version than the device-scoped ones.

This adds tests for changes introduced in intel/llvm#4820 and intel/llvm#5192.


@steffenlarsen steffenlarsen left a comment


Generally speaking, I think the tests are good, but as mentioned in one of my comments, the new tests may generate kernels with code that is unsupported on a device. Even if the tests intend to skip these kernels, they may still be built together with the others, which can cause a failure at runtime.

This is not unique to the 64-bit atomic tests, however; the same could be the case for the various memory orders and memory scopes. This is also why there are so many variants of the atomic_memory_order_*.cpp tests.

Comment on lines 42 to 46
// Include long long tests if they are 64 bits wide
if constexpr (sizeof(long long) == 8) {
add_test_orders_scopes<long long>(q, N);
add_test_orders_scopes<unsigned long long>(q, N);
}


The device associated with the queue must support aspect::atomic64 for these tests to be valid. However, even if you check for that here the compiler may still try to generate the kernels for the 64-bit tests which it will try to build with the other kernels, even though it never intends to run them. This is the reason why many other tests have a _atomic64 test variant.

Author

To handle all possible combinations of supported aspects, each of these tests would have to be split into many files (one for every combination of atomic64 support, 3 memory scopes, and 4 memory orders). That would be a bit too many files in my opinion.

In practice, the memory orders are currently only supported on NVIDIA devices with sm_70 or higher. These also support all memory scopes and atomic64. That is why I decided to put all the new tests into a single file that requires sm_70.

If you have a better suggestion for how to handle this, I am happy to change it.


I agree, the number of combinations becomes quite a headache, but the limitations of order/scope/atomic64 may apply to other devices than just CUDA ones.

An option would be to guard the different combinations with macros that you then define in the different //RUN commands of the tests. This should ensure that kernels are only generated for the relevant test-case combinations, avoiding compilation of invalid kernels, and each RUN would then only execute its cases if the device supports that combination. Does that make sense?
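A sketch of what that macro-guarded layout could look like (the macro names and RUN lines below are hypothetical, for illustration only, and not taken from the actual patch):

```cpp
// One RUN line per test-case combination; each variant defines macros that
// gate which kernels get compiled, so kernels for unsupported combinations
// are never generated into that binary.
// RUN: %clangxx -fsycl -DTEST_SCOPE_DEVICE -DTEST_ORDER_RELAXED %s -o %t1.out
// RUN: %t1.out
// RUN: %clangxx -fsycl -DTEST_SCOPE_DEVICE -DTEST_ORDER_SEQ_CST -DTEST_ATOMIC64 %s -o %t2.out
// RUN: %t2.out

#ifdef TEST_ATOMIC64
  // Only instantiated for RUN variants that request 64-bit tests; the test
  // would still verify aspect::atomic64 at runtime before launching.
  add_test_orders_scopes<long long>(q, N);
  add_test_orders_scopes<unsigned long long>(q, N);
#endif
```

This is a non-runnable LIT skeleton; the real test would pair each RUN variant with the matching runtime support check.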

Author

That would still result in 24 RUN commands per file, but it does allow merging all the test files for each operation into just one.

Although I have to say I do not completely understand what mechanism skips the RUN lines for aspects that are unsupported on the hardware the test is running on.


That is a drawback of this approach; you cannot have e.g. an XFAIL for each, but since you have to check for the features at runtime anyway, the tests have to dispatch accordingly.

The benefit of splitting them up (even through the macro alternative) is that the tests will not load binaries with unsupported instructions for a given device. That is, the individual RUNs would start out by checking that the corresponding order/scope/atomic64 is supported on the given device, and if not, skip the test (with a printed message, but not an error).

Author

I still have to do runtime checks? That would result in these tests not being run until intel/llvm#4853 is merged.


You have to do the runtime checks to make sure the device actually supports the features. CUDA is special in that regard as the SM version controls whether or not a feature is supported, but that's not the case for all devices.

That would result in these tests not being run until intel/llvm#4853 is merged

Since the patch is up I don't see why we wouldn't wait for it. It would make the tests more robust from the get-go.

Author

I applied these changes to the tests for add. Can you check that this is what you had in mind before I do the rest?


That is exactly what I had in mind! It is quite a lot of RUN lines, but I suppose I got what was coming. Maybe it would make sense, just for good measure, to add a short comment to each RUN grouping noting which test configuration it runs.

@bader

bader commented Dec 25, 2021

@t4c1, could you resolve merge conflicts, please?

@t4c1 t4c1 requested a review from a team as a code owner January 4, 2022 11:56
bader pushed a commit to intel/llvm that referenced this pull request Jan 18, 2022
Adds implementations for atomic operations that do not have direct PTX equivalents. Sub is implemented using add; floating-point min and max use compare-exchange loops.

I pushed the tests for this into: intel/llvm-test-suite#534
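The compare-exchange-loop technique mentioned in the commit message can be sketched with std::atomic standing in for the SYCL/PTX atomics used in the actual patch:

```cpp
#include <algorithm>
#include <atomic>

// Emulate an atomic floating-point max via a CAS retry loop.
// Returns the value held before this update, like a fetch_max would.
float atomic_fetch_max(std::atomic<float>& obj, float val) {
  float old = obj.load();
  // If another thread changed obj between the load and the CAS, the CAS
  // fails and refreshes `old`, so we retry with the current value.
  while (!obj.compare_exchange_weak(old, std::max(old, val))) {
  }
  return old;
}
```

Note that std::atomic's compare_exchange compares bit patterns, so NaNs and signed zeros need extra care in a real implementation; this sketch only shows the retry loop.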
@t4c1

t4c1 commented Feb 9, 2022

Are any of the test failures related to the changes in this PR?

@t4c1 t4c1 requested a review from steffenlarsen February 14, 2022 07:40
// RUN: %HOST_RUN_PLACEHOLDER %t.out
// RUN: %GPU_RUN_PLACEHOLDER %t.out
// RUN: %CPU_RUN_PLACEHOLDER %t.out
// RUN: %ACC_RUN_PLACEHOLDER %t.out

// CUDA backend has had no support for the generic address space yet
// XFAIL: cuda || hip
// XFAIL: cuda || hip || level_zero || opencl


Comment only mentions why CUDA is disabled. Why are the other backends disabled? (Same applies to a handful of other tests)

Author

It is the same reason as for CUDA. Will update the comments.

Author

Actually ... I might be wrong about that. I need to look into this a bit more.


I have a similar concern. Today the atomic tests are split into two parts, with the generic-pointers part not supported on the HIP and CUDA backends.
This separation allows us:

  • to have full validation of all atomic features on the OpenCL and Level Zero backends
  • to make sure that everything except atomics for generic pointers works on HIP and CUDA
  • (minor) to test that atomics for generic pointers fail on HIP and CUDA

The new version doesn't have such a clear separation. Let's look at the "add" operation.

| Test | Current | New |
| --- | --- | --- |
| add.cpp | everything works everywhere | XFAIL on host, opencl and hip |
| add_generic.cpp | XFAIL on hip and cuda | XFAIL on hip, cuda, opencl and level_zero |
| add_atomic64.cpp | everything works everywhere | moved to add.cpp |
| add_atomic64_generic.cpp | XFAIL on hip and cuda | moved to add_generic.cpp |

With the new test distribution, we don't know whether anything is working on the OpenCL backends, even though all test cases are passing. This is because new tests for floating-point atomics, which are not supported by OpenCL, were added to all test files. In general, when a single test file has multiple checks and at least one of them fails, XFAIL hides the status of all the successful checks. That was the reason to move the tests for "generic" pointers into a separate test.
I think it makes sense to do the same for the new test cases which do not pass on all backends.

@t4c1, what do you think?

Author

@t4c1 t4c1 Feb 15, 2022

I believe I might be able to resolve some of these XFAILs. The ones for floating-point atomics (on add, sub, min and max) and for the missing barrier on host will stay.

This PR adds many more subsets of tests that could be put into separate files. Previously there were no tests for atomics in local memory, atomics with non-relaxed memory orders, or atomics with non-device scopes. If I followed the current convention and put every combination in a separate file, we would get a huge number of files (global/local generic/AS, 32b/64b, 3 scopes, 4 orders, totaling 64 files per operation).

We already discussed it here: #534 (comment), and as far as I understood, this was deemed an acceptable solution. If, however, you have a better suggestion for how to handle this, I am happy to change it.


I'm okay with adding new test cases to an existing test (e.g. add.cpp) as long as they don't change the test status, i.e. the test should still pass everywhere.

If I followed the current convention and put every combination in a separate file it would get us a huge number of files (global/local generic/AS, 32b/64b, 3 scopes, 4 orders, totaling 64 files per operation).

I think we can put tests for features with the same validation status into the same file. Following this logic, I don't think we need a huge number of files. I expect to have 1-2 files per operation. Right?

Author

Hmm, we would still need to split by the 3 address spaces and by floating-point operations, leaving us with 6 files (or 3 for operations without floating point). I guess that is still manageable.

@t4c1

t4c1 commented Feb 21, 2022

Can somebody copy the test results for me again?

@bader

bader commented Feb 21, 2022

/verify with intel/llvm#5620

@bader

bader commented Feb 21, 2022

Take a look at "Jenkins/verify-test-suite-logs".

@bader bader requested a review from steffenlarsen February 21, 2022 17:59
steffenlarsen previously approved these changes Feb 22, 2022

@steffenlarsen steffenlarsen left a comment


Wow! This has become quite the PR. It looks fine to me.

@alexbatashev - Ping for awareness. This adds a solid handful of tests. Do you have any CI-related concerns?

@bader

bader commented Feb 22, 2022

@t4c1, FYI.

SYCL/Basic/event_profiling_info.cpp:28: bool verifyProfiling(sycl::event): Assertion `Submit <= Start' failed.

@t4c1

t4c1 commented Feb 22, 2022

This PR does not modify SYCL/Basic/event_profiling_info.cpp test, so I doubt the failure is related to this PR.

@steffenlarsen

Event profiling in the CUDA backend has historically been very flaky. I agree with @t4c1 that this is unlikely to be related.

@t4c1 - Would you mind disabling it for CUDA and opening a ticket for it?

@alexbatashev

Wow! This has become quite the PR. It looks fine to me.

@alexbatashev - Ping for awareness. This adds a solid handful of tests. Do you have any CI-related concerns?

That's a lot of testing, but hopefully it'll be fine.

@bader bader merged commit 88ee9d1 into intel:intel Feb 24, 2022
@bader

bader commented Feb 24, 2022

@t4c1, it looks like some tests are not working on AMDGPU. See https://github.com/intel/llvm/runs/5319895067?check_suite_focus=true.
Could you fix them as soon as possible, please? We run these tests in pre-commit validation.

@bader

bader commented Feb 24, 2022

I've filed intel/llvm#8804 and disabled the failing tests with 08100b0.

@t4c1 t4c1 deleted the atomic_orders_scopes2 branch March 15, 2022 08:52
myler pushed a commit to myler/llvm-test-suite that referenced this pull request Apr 12, 2022
myler pushed a commit to myler/llvm-test-suite that referenced this pull request Jun 17, 2022
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023