[SYCL] Added tests for atomics with various memory orders and scopes #534
Conversation
Generally speaking I think the tests are good, but as mentioned in one of my comments, the new tests may generate kernels containing code that is unsupported on a device, and even if the tests intend to skip these kernels, they may still be built together with the others, which can cause a failure at runtime.
This is not unique to the 64-bit atomic tests, however; the same can be the case for the various memory orders and memory scopes. This is also why there are so many variants of the atomic_memory_order_*.cpp tests.
SYCL/AtomicRef/add_orders_scopes.cpp
Outdated
// Include long long tests if they are 64 bits wide
if constexpr (sizeof(long long) == 8) {
  add_test_orders_scopes<long long>(q, N);
  add_test_orders_scopes<unsigned long long>(q, N);
}
The device associated with the queue must support aspect::atomic64 for these tests to be valid. However, even if you check for that here, the compiler may still generate the kernels for the 64-bit tests and try to build them together with the other kernels, even though it never intends to run them. This is the reason why many other tests have a _atomic64 test variant.
To handle all possible combinations of supported aspects, each of these tests would have to be split into many files (one for every combination of atomic64 support, 3 memory scopes, and 4 memory orders). That would be a bit too many files, in my opinion.
In practice, memory orders are currently only supported on sm_70 or higher NVIDIA devices, and these also support all memory scopes and atomic64. That is why I decided to put all the new tests into a single file that requires sm_70.
If you have a better suggestion for how to handle this, I am happy to change it.
I agree, the number of combinations becomes quite a headache, but the order/scope/atomic64 limitations may apply to devices other than just CUDA ones.
One option would be to guard the different combinations with macros that you then define in the different // RUN commands of the test. This ensures that kernels are only generated for the combinations a given RUN line exercises, avoiding compilation of invalid kernels, and each resulting binary is only run if the device supports that combination. Does that make sense?
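The macro-guarding idea above can be sketched as follows. This is a minimal stand-alone illustration, not code from the actual tests; the `TEST_ORDER_*` macro names and the RUN line in the comment are hypothetical:

```cpp
// Sketch of the suggested macro-guarding pattern (hypothetical macro names).
// Each lit RUN line would compile this same file with a different -D define,
// e.g. something like:
//   RUN: %clangxx -fsycl -DTEST_ORDER_ACQ_REL %s -o %t.out
// so that kernels for unsupported combinations are never generated.
#include <string>

// Default to one combination so the file still compiles without any -D flag.
#if !defined(TEST_ORDER_ACQ_REL)
#define TEST_ORDER_RELAXED
#endif

// Reports which memory-order variant this compilation selected.
std::string selected_order() {
#if defined(TEST_ORDER_ACQ_REL)
  return "acq_rel";
#else
  return "relaxed";
#endif
}
```

In the real tests, each preprocessor branch would instantiate only the kernels for that one order/scope combination, so a device lacking a combination never loads its kernels.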
That would still result in 24 RUN commands per file, but it does allow merging all the test files for each operation into just one.
Although I have to say I do not completely understand what mechanism skips the RUN lines for aspects that are unsupported on the hardware the test is running on.
That is a drawback of this approach; you cannot have e.g. an XFAIL for each combination, but since you have to check for the features at runtime anyway, the tests would have to dispatch.
The benefit of splitting them up (even via the macro alternative) is that the tests will not load binaries with unsupported instructions for a given device. That is, each individual RUN would start by checking that the corresponding order/scope/atomic64 is supported on the given device, and if not it should skip the test (with a message printed, but not an error).
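The run-or-skip dispatch described here can be sketched as below. `device_has` is a stand-in stub for the real SYCL query (e.g. `q.get_device().has(sycl::aspect::atomic64)`); the `Feature` names and the stubbed capability set are assumptions for illustration only:

```cpp
#include <cstdio>

// Features a test variant may require; stand-ins for SYCL device aspects
// and memory-order/scope capability queries.
enum class Feature { Atomic64, OrderSeqCst, ScopeSystem };

// Stub: pretend this device supports everything except 64-bit atomics.
// A real test would query the SYCL device instead of hard-coding this.
bool device_has(Feature f) { return f != Feature::Atomic64; }

// Returns true if the test body ran, false if it was skipped.
// A skip prints a note but is not treated as an error.
bool run_if_supported(Feature f, const char *name) {
  if (!device_has(f)) {
    std::printf("Skipping %s: unsupported on this device\n", name);
    return false;
  }
  std::printf("Running %s\n", name);
  // ... the actual test body would go here ...
  return true;
}
```

Each RUN variant would call something like `run_if_supported` for its combination before touching any kernels, so unsupported configurations skip cleanly at runtime.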
I still have to do runtime checks? That would mean these tests are not run until intel/llvm#4853 is merged.
You have to do the runtime checks to make sure the device actually supports the features. CUDA is special in that regard, as the SM version controls whether or not a feature is supported, but that is not the case for all devices.

> That would result in these tests not being run until intel/llvm#4853 is merged

Since that patch is up, I don't see why we wouldn't wait for it. It would make the tests more robust from the get-go.
I applied this kind of change to the tests for add. Can you check that this is what you had in mind before I do the rest?
That is exactly what I had in mind! It is quite a lot of RUN lines, but I suppose I got what was coming. Maybe it would make sense, just for good measure, to add a short comment above each RUN grouping noting which test configuration it runs.
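Such a grouping comment might look like the sketch below. The compile command and macro names are hypothetical; only the `%GPU_RUN_PLACEHOLDER`-style substitutions appear in the actual tests:

```
// 32-bit types, device scope, relaxed order
// RUN: %clangxx -fsycl -DTEST_SCOPE_DEVICE -DTEST_ORDER_RELAXED %s -o %t1.out
// RUN: %GPU_RUN_PLACEHOLDER %t1.out

// 64-bit types, work-group scope, acq_rel order (requires aspect::atomic64)
// RUN: %clangxx -fsycl -DTEST_ATOMIC64 -DTEST_SCOPE_WORK_GROUP -DTEST_ORDER_ACQ_REL %s -o %t2.out
// RUN: %GPU_RUN_PLACEHOLDER %t2.out
```

A one-line header per group keeps the 24 RUN lines navigable without changing any test behavior.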
# Conflicts:
#	SYCL/AtomicRef/add.cpp
#	SYCL/AtomicRef/add.h
#	SYCL/AtomicRef/add_atomic64.cpp
#	SYCL/AtomicRef/compare_exchange.h
#	SYCL/AtomicRef/exchange.h
#	SYCL/AtomicRef/max.h
#	SYCL/AtomicRef/min.h
…test-suite into atomic_orders_scopes2
@t4c1, could you resolve merge conflicts, please?
Adds implementations for atomic operations that do not have direct PTX equivalents. Sub is implemented using add. Floating point min and max use compare-exchange loops. I pushed the tests for this into: intel/llvm-test-suite#534
Are any of the test failures related to the changes in this PR?
SYCL/AtomicRef/add_generic.cpp
Outdated
// RUN: %HOST_RUN_PLACEHOLDER %t.out
// RUN: %GPU_RUN_PLACEHOLDER %t.out
// RUN: %CPU_RUN_PLACEHOLDER %t.out
// RUN: %ACC_RUN_PLACEHOLDER %t.out

// CUDA backend has had no support for the generic address space yet
// XFAIL: cuda || hip
// XFAIL: cuda || hip || level_zero || opencl
The comment only mentions why CUDA is disabled. Why are the other backends disabled? (The same applies to a handful of other tests.)
It is the same reason as for CUDA. Will update the comments.
Actually ... I might be wrong about that. I need to look into this a bit more.
I have a similar concern. Today the atomic tests are split into two parts:
- a main part, supported everywhere
- a generic-pointers part, not supported on the HIP and CUDA backends

This separation allows us:
- to have full validation of all atomic features on the OpenCL and Level Zero backends
- to make sure that everything except atomics for generic pointers works on HIP and CUDA
- (minor) to test that atomics for generic pointers fail on HIP and CUDA
The new version doesn't have such a clear separation. Let's look at the "add" operation.

| Test | Current | New |
|---|---|---|
| add.cpp | everything works everywhere | XFAIL on host, opencl and hip |
| add_generic.cpp | XFAIL on hip and cuda | XFAIL on hip, cuda, opencl and level_zero |
| add_atomic64.cpp | everything works everywhere | moved to add.cpp |
| add_atomic64_generic.cpp | XFAIL on hip and cuda | moved to add_generic.cpp |
With the new test distribution, we don't know whether anything is working on the OpenCL backend, even though all of its test cases currently pass. This is because the new tests for floating point atomics, which are not supported by OpenCL, were added to all the test files. In general, with multiple checks in a single test file, if at least one check fails, XFAIL hides the status of all the successful checks. That was the reason for moving the tests for "generic" pointers to a separate file.
I think it makes sense to do the same for new test cases which do not pass on all backends.
@t4c1, what do you think?
I believe I might be able to resolve some of these XFAILs. The ones for floating point atomics (on add, sub, min and max) and for the missing barrier on host will stay.
This PR adds many more different subsets of tests that could be put into separate files. Previously there were no tests for atomics in local memory, atomics with non-relaxed memory orders, or atomics with non-device scopes. If I followed the current convention and put every combination in a separate file, it would give us a huge number of files (global/local generic/AS, 32b/64b, 3 scopes, 4 orders, totaling 64 files per operation).
We already discussed this here: #534 (comment), and as far as I understood, this was deemed an acceptable solution. If, however, you have a better suggestion for how to handle this, I am happy to change it.
I'm okay with adding new test cases to an existing test (e.g. add.cpp) as long as they don't change the test status, i.e. the test should still pass everywhere.

> If I followed the current convention and put every combination in a separate file it would get us a huge number of files (global/local generic/AS, 32b/64b, 3 scopes, 4 orders, totaling 64 files per operation).

I think we can put tests for features with the same validation status into the same file. Following this logic, I don't think we need a huge number of files. I expect 1-2 files per operation. Right?
Hmm, we would still need to split out the 3 address spaces and the floating point operations, leaving us with 6 files (or 3 for operations without floating point). I guess that is still manageable.
Can somebody copy the test results for me again?
/verify with intel/llvm#5620
Take a look at "Jenkins/verify-test-suite-logs".
Wow! This has become quite the PR. It looks fine to me.
@alexbatashev - ping for awareness. This adds a solid handful of tests. Do you have any CI-related concerns?
@t4c1, FYI.
This PR does not modify
Co-authored-by: Alexey Bader <alexey.bader@intel.com>
That's a lot of testing, but hopefully it'll be fine.
@t4c1, it looks like some tests are not working on AMDGPU. See https://github.com/intel/llvm/runs/5319895067?check_suite_focus=true.
I've filed intel/llvm#8804 and disabled the failing tests with 08100b0. |
Added tests for atomics with various memory orders and scopes. Reductions tests also have updated sm requirements, as they call work group atomics, which are now implemented and have higher sm requirements than device scoped ones.
This adds tests for changes introduced in intel/llvm#4820 and intel/llvm#5192.